Test executable fails only when run under bazel test

f9rocket commented 6 years ago

Description of the problem / feature request:

I have a unit test that calls certain functions in a DLL. When I run this unit test under bazel run or as a standalone executable, it passes. When I run it through bazel test, the test fails and only partial test output is produced.

Sample output:

> bazel run //lib/thing:my_test  OR  ..\bazel-out\...\my_test.exe
...
INFO: Running command line: .../my_test.exe
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from MyTest
[ RUN      ] MyTest.Test
[       OK ] MyTest.Test (3053 ms)
[----------] 1 test from MyTest (3054 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3058 ms total)
[  PASSED  ] 1 test.

But when I run it using bazel test:

INFO: Analysed target //lib/thing:MyTest (0 packages loaded).
INFO: Found 1 test target...
FAIL: //lib/thing:MyTest (see C:/.../my_test/test.log)
INFO: From Testing //lib/thing:my_test
==================== Test output for //lib/thing:my_test:
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from MyTest
[ RUN      ] MyTest.Test
================================================================================
...
//lib/thing:my_test                                            FAILED in 1.8s
  C:/.../my_test/test.log

I suspected gtest was behind this, so I removed all gtest code and assertions from the test, but it still fails. As far as I can tell, the only difference between bazel run and bazel test is that bazel run launches the test binary directly as a subprocess, while bazel test runs it through a Bash wrapper script.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I understand that with this information the problem is almost impossible to reproduce, but I am very limited in what code I can share. What I am looking for here is mostly:

1) Visibility: perhaps someone has had the same or a similar issue.
2) Debugging help: any tips or ideas on how to debug bazel test or its Bash wrapper.

What operating system are you running Bazel on?

Windows

What's the output of `bazel info release`?

development version

If `bazel info release` returns "development version" or "(@non-git)", tell us how you built Bazel.

I built it from the branch release-0.10.0.

What's the output of `git remote get-url origin ; git rev-parse master ; git rev-parse HEAD` ?

https://github.com/bazelbuild/bazel.git
96c654d43eb2906177325cbc2fc2b1e90dbcc792
22c2f9a7722e8c8b7fdf8f5d30a40f1c4118e993

Have you found anything relevant by searching the web?

I have not.

Any other information, logs, or outputs that you want to share?

See above.

f9rocket commented 6 years ago

I manually ran the test executable inside MSYS2 and it showed the same faulty behavior, so the issue seems to be inherent to MSYS2 or its Bash shell.

laszlocsomor commented 6 years ago

Thanks for reporting the bug and sharing as much as you could. I think we can try blind debugging.

My first thought was that perhaps your PATH differs between these scenarios and the test fails to load the DLL properly. Try printing PATH and comparing the values across the different ways of running the test.

To generalize on that idea, try replacing your entire test code with a simple main function that prints the environment, attempts to load the DLL, and calls a basic function in it. Compare the results under "bazel test", "bazel run", and direct execution.
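A minimal sketch of the environment-dump half of that suggestion, in C++ since the test itself is a C++ gtest binary. `SortedEnvironment`, `SplitPath`, and `DumpEnvironment` are hypothetical helper names, not part of Bazel or gtest. The sketch assumes the environment is read through the non-standard third `envp` parameter of `main`, which MSVC, GCC, and Clang accept. Sorting the entries lets the dumps from the three scenarios be compared line by line with an ordinary diff tool. Loading the DLL is left out because it depends on the Windows API and on the DLL's own exports.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Collects every "NAME=value" entry from an envp-style array (a
// null-terminated array of C strings, as passed to the third parameter of
// main) and sorts it, so dumps taken under `bazel test`, `bazel run` and
// direct execution line up for diffing.
std::vector<std::string> SortedEnvironment(char** envp) {
  std::vector<std::string> entries;
  for (char** e = envp; e != nullptr && *e != nullptr; ++e) {
    entries.emplace_back(*e);
  }
  std::sort(entries.begin(), entries.end());
  return entries;
}

// Splits a PATH-like value into its individual directories so that two PATH
// values can be compared entry by entry. Pass ';' for a Windows-style PATH.
std::vector<std::string> SplitPath(const std::string& value, char sep) {
  std::vector<std::string> dirs;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type end = value.find(sep, start);
    // When end == npos, substr takes the remainder of the string.
    dirs.push_back(value.substr(start, end - start));
    if (end == std::string::npos) break;
    start = end + 1;
  }
  return dirs;
}

// Writes the sorted environment to `out`, one entry per line.
void DumpEnvironment(char** envp, std::FILE* out) {
  for (const std::string& entry : SortedEnvironment(envp)) {
    std::fprintf(out, "%s\n", entry.c_str());
  }
}
```

For example, the pared-down binary could be just `int main(int argc, char** argv, char** envp) { DumpEnvironment(envp, stdout); }`; under bazel test its output ends up in the test.log file that the failure message points to.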

I'm curious to know the outcome.

f9rocket commented 6 years ago

@laszlocsomor Thanks for getting back to me. I suspected PATH at first, but the DLL resides under c:\windows\system32, so that should not be a problem. I confirmed this by printing the DLL path from inside the unit test (using GetModuleFileName): whether running under bazel test or directly from the command line, the loaded path is always c:\windows\system32\....dll.

In my previous comment I said that I ran the test inside msys2 and got the same result. That was slightly inaccurate and not very clear: I had invoked c:\msys2\usr\bin\bash.exe directly from the Windows command prompt (similar to what Bazel does, minus the test wrapper script), and when I ran the test from that shell it failed in the same way. However, I just ran the executable inside a proper MSYS2 session (opened via the MSYS2 MinGW 64-bit shortcut), and this time the test passed! So it seems that something in MSYS2's msys2_shell.cmd script affects the test.
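Because the outcome depends on how bash.exe was launched, a natural next step is to capture the environment in both kinds of shell (for example with a dump like the one laszlocsomor suggested) and compare them. A C++ sketch of that comparison follows; `ParseEnvironment`, `DiffEnvironments`, and `EnvDifference` are hypothetical names. It assumes each dump is a list of `NAME=value` lines and compares names case-sensitively, even though Windows itself treats environment variable names case-insensitively. Which variable actually matters is not established in this thread; MSYSTEM, which msys2_shell.cmd sets for the selected subsystem, is only one candidate.

```cpp
#include <map>
#include <string>
#include <vector>

// A variable whose presence or value differs between two environments.
struct EnvDifference {
  std::string name;
  std::string left;   // value in the first environment ("" if unset)
  std::string right;  // value in the second environment ("" if unset)
  bool in_left;       // true if the variable is set in the first environment
  bool in_right;      // true if the variable is set in the second environment
};

// Parses "NAME=value" lines into a sorted map. The search for '=' starts at
// index 1 so that names beginning with '=' (Windows uses entries such as
// "=C:=C:\work" for per-drive directories) keep their leading '='. Lines
// without '=' are recorded with an empty value.
std::map<std::string, std::string> ParseEnvironment(
    const std::vector<std::string>& lines) {
  std::map<std::string, std::string> env;
  for (const std::string& line : lines) {
    std::string::size_type eq = line.find('=', 1);
    if (eq == std::string::npos) {
      env[line] = "";
    } else {
      env[line.substr(0, eq)] = line.substr(eq + 1);
    }
  }
  return env;
}

// Reports variables set only in `a`, set in both with different values, or
// set only in `b`. Names are compared case-sensitively.
std::vector<EnvDifference> DiffEnvironments(const std::vector<std::string>& a,
                                            const std::vector<std::string>& b) {
  const std::map<std::string, std::string> left = ParseEnvironment(a);
  const std::map<std::string, std::string> right = ParseEnvironment(b);
  std::vector<EnvDifference> diffs;
  // Pass 1: variables present in `a` that are missing or different in `b`.
  for (const auto& kv : left) {
    auto it = right.find(kv.first);
    if (it == right.end()) {
      diffs.push_back({kv.first, kv.second, "", true, false});
    } else if (it->second != kv.second) {
      diffs.push_back({kv.first, kv.second, it->second, true, true});
    }
  }
  // Pass 2: variables present only in `b`.
  for (const auto& kv : right) {
    if (left.find(kv.first) == left.end()) {
      diffs.push_back({kv.first, "", kv.second, false, true});
    }
  }
  return diffs;
}
```

Feeding it the dump from the MSYS2 shortcut shell as `a` and the dump from the directly launched bash.exe as `b` narrows the search to the handful of variables that differ, which can then be set or unset one at a time before rerunning the test.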

laszlocsomor commented 6 years ago

What exactly is failing, the call into the DLL or the test framework? If the former, maybe the DLL is trying to do something that it can't do when the test runs as a Bash grandchild process?

I know this is very general and basic advice, but can you gradually remove all test cases except the one that fails, then gradually remove the body of that test method to find exactly what's failing? I often debug this way.
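When the test body is long, the manual pare-down can be sped up by bisecting it instead of removing one statement at a time; this is a swapped-in technique, not something proposed in the thread. The C++ sketch below assumes a hypothetical predicate `fails(k)` that runs only the first `k` steps of the test body and reports whether the failure reproduces, and that the failure is monotone (if a prefix fails, every longer prefix fails too). `FirstFailingPrefix` is likewise a hypothetical name.

```cpp
#include <cstddef>
#include <functional>

// Returns the smallest k in [1, n] such that running only the first k steps
// of the test body reproduces the failure. Assumes fails(n) is true and that
// the failure is monotone in k. The step at index k - 1 is then the first
// step after which the failure shows up.
std::size_t FirstFailingPrefix(std::size_t n,
                               const std::function<bool(std::size_t)>& fails) {
  std::size_t lo = 1;
  std::size_t hi = n;  // invariant: the answer lies in [lo, hi]
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (fails(mid)) {
      hi = mid;      // a prefix of length mid already fails
    } else {
      lo = mid + 1;  // the failing step lies beyond mid
    }
  }
  return hi;
}
```

In practice `fails` might rerun the pared-down binary under bazel test, passing `k` through `--test_arg`, and inspect the result; that driver is not shown. With 100 steps this needs at most 7 runs. Memory-corruption bugs are often detected some time after the step that causes them, so the step found this way is where the failure surfaces, not necessarily its root cause.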

f9rocket commented 6 years ago

I removed the test framework as you suggested and pared the test down to a single DLL call that fails. I was able to attach a debugger to the test binary while it ran under bazel test, and a "heap is corrupted" exception is being thrown inside winnt.dll. This exception is not thrown when running from the command line or with bazel run. Unfortunately, I can't tell what triggers the exception because I don't have symbols.

The DLL itself is part of a driver SDK and is used to configure special hardware. I don't know whether this is something that cannot be done from a Bash grandchild process, but it would be unfortunate if that were the case.

At this point I'm considering writing my own wrapper that uses bazel query to discover the affected unit tests and runs them outside of Bazel. I'd rather not, though, since it would split our unit tests into two categories: those runnable inside Bazel and those that are not.
