Closed hubertlepicki closed 3 years ago
I would need a mechanism to reproduce it so I can investigate. I know it may not be easy though. Something you can try is to run the suite with altered timings: `elixir --erl "+T 9" -S mix test`.
Alright @josevalim, let me try this. Currently I can't even seem to reproduce it locally; it only happens on CI, which is obviously slower and has fewer cores, so that may be related too.
I didn't know about this flag either, thanks for the tip.
@josevalim I gave up for now and just re-implemented the thing so we don't actually need to mock the module in every single test. Instead we do dependency injection with a process that stores events in its state, and match against that rather than use Mox.
I still think there may be some sort of race condition in the Mox / ExUnit combo, but I have no way to even reproduce it myself.
Maybe I'll come back to this problem as a weekend project at some point, but also maybe not. It's a dreadful task, but it may keep bothering me. Closing for now, and thanks for your help.
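For readers wondering what "dependency injection with a process that stores events" could look like, here is a minimal sketch. It is not the code from the project discussed above; `EventRecorder` and `Signup` are hypothetical names, and an `Agent` is just one possible choice for the state-holding process.

```elixir
defmodule EventRecorder do
  # Hypothetical helper: an Agent that keeps a list of recorded events.
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> [] end)

  # Prepend for O(1) writes; events/1 reverses to restore call order.
  def record(recorder, event), do: Agent.update(recorder, &[event | &1])

  def events(recorder), do: Agent.get(recorder, &Enum.reverse/1)
end

defmodule Signup do
  # Hypothetical code under test. The recorder pid is passed in explicitly
  # instead of being looked up through a globally configured mock.
  def register(email, recorder) do
    EventRecorder.record(recorder, {:user_registered, email})
    {:ok, email}
  end
end

# Each test starts its own recorder, so no state is shared between tests.
{:ok, recorder} = EventRecorder.start_link()
{:ok, _email} = Signup.register("a@example.com", recorder)
[{:user_registered, "a@example.com"}] = EventRecorder.events(recorder)
```

Because every test owns its recorder process, there is no global mode to switch and nothing left over for the next test to trip on.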
This issue still exists, we're bumping into it in our project.
We still need a way to reproduce it though so we can take a look. :)
I am also unable to reliably reproduce this issue; it mostly shows up in GitHub Actions CI. But here is my theory.
In our tests, we use `Mox.stub_with/2` in the setup block of every test, meaning that mock expectations are defined very early on in the test lifecycle. The tests are not asynchronous, and some of them put Mox into global mode.
1. Test A calls `Mox.set_mox_global/1`, which flags that test process as the owner of Mox global mode. The singleton `Mox.Server` process is now monitoring this test process via `Process.monitor`, so that it can later handle a `:DOWN` message, which will set Mox back to private mode when that test process ends.
2. Test A completes and its process ends, but for some reason the `:DOWN` message to `Mox.Server` is delayed, which means that Mox is currently still in global mode.
3. Test B begins, and it is expecting Mox to be in private mode. It attempts to set up a stub with `Mox.stub_with/2`, but `Mox.Server` is still in global mode with Test A's process as the owner, so it explodes.
The hacky(?) fix
Call `Mox.set_mox_private/1` at the very beginning of the `setup` block of the base test case, to ensure that the state of `Mox.Server` is settled before trying to set any expectations.
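A rough sketch of how that workaround might look in a shared case template. This is an assumption-laden illustration, not the commenter's actual code: it needs Mox as a test dependency, and `MyApp.MoxCase`, `MyApp.MailerMock` and `MyApp.MailerStub` are hypothetical names (the mock is assumed to be defined elsewhere, for example with `Mox.defmock/2` in `test_helper.exs`).

```elixir
defmodule MyApp.MoxCase do
  # Hypothetical base test case; requires Mox as a test dependency.
  use ExUnit.CaseTemplate

  setup context do
    # Workaround from the thread: force private mode first, so a global
    # owner left over from the previous test cannot affect the stubs below.
    Mox.set_mox_private(context)

    # Then pick the mode this test actually wants (global unless async: true).
    Mox.set_mox_from_context(context)
    Mox.verify_on_exit!(context)

    # Hypothetical mock and stub modules.
    Mox.stub_with(MyApp.MailerMock, MyApp.MailerStub)
    :ok
  end
end
```

Test modules would then `use MyApp.MoxCase` in place of `use ExUnit.Case`. The only part that comes from the thread is the ordering: `set_mox_private` runs before anything that sets expectations.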
@schrockwell I think you hit the jackpot. I believe there is no guarantee all DOWN messages are delivered at the same time. I have pushed a fix based on your assumption, please try master out!
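To make the timing argument concrete, here is a toy model of a mode server that relies on a monitor to fall back to private mode. It is deliberately not Mox's implementation; `ToyModeServer` and all of its functions are hypothetical. It only illustrates why an explicit reset is safe whether or not the `:DOWN` message has already arrived.

```elixir
defmodule ToyModeServer do
  # Toy stand-in for a mode-tracking server. NOT Mox's real code.
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, nil)
  def set_global(server, owner), do: GenServer.call(server, {:set_global, owner})
  def set_private(server), do: GenServer.call(server, :set_private)
  def mode(server), do: GenServer.call(server, :mode)

  @impl true
  def init(_), do: {:ok, %{mode: :private, owner: nil}}

  @impl true
  def handle_call({:set_global, owner}, _from, state) do
    # The server only learns that the owner died when :DOWN is delivered.
    Process.monitor(owner)
    {:reply, :ok, %{state | mode: :global, owner: owner}}
  end

  def handle_call(:set_private, _from, state) do
    {:reply, :ok, %{state | mode: :private, owner: nil}}
  end

  def handle_call(:mode, _from, state), do: {:reply, state.mode, state}

  @impl true
  def handle_info({:DOWN, _ref, :process, pid, _reason}, %{owner: pid} = state) do
    {:noreply, %{state | mode: :private, owner: nil}}
  end

  # A late :DOWN for an owner that was already reset is ignored.
  def handle_info(_msg, state), do: {:noreply, state}
end

{:ok, server} = ToyModeServer.start_link()
test_a = spawn(fn -> receive do :stop -> :ok end end)
:ok = ToyModeServer.set_global(server, test_a)
send(test_a, :stop)

# Here the mode may be :global or :private, depending on whether :DOWN
# has been delivered yet. An explicit reset removes that uncertainty.
:ok = ToyModeServer.set_private(server)
:private = ToyModeServer.mode(server)
```

In this model the window between the owner exiting and the server handling `:DOWN` is exactly where a following test could observe stale global state.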
I am also unable to reliably reproduce this issue; it mostly shows up in GitHub Actions CI. But here is my theory.
In our tests, we use `Mox.stub_with/2` in the setup block of every test, meaning that mock expectations are defined very early on in the test lifecycle. The tests are not asynchronous, and some of them put Mox into global mode.
1. Test A calls `Mox.set_mox_global/1`, which flags that test process as the owner of Mox global mode. The singleton `Mox.Server` process is now monitoring this test process via `Process.monitor`, so that it can later handle a `:DOWN` message, which will set Mox back to private mode when that test process ends.
2. Test A completes and its process ends, but for some reason the `:DOWN` message to `Mox.Server` is delayed, which means that Mox is currently still in global mode.
3. Test B begins, and it is expecting Mox to be in private mode. It attempts to set up a stub with `Mox.stub_with/2`, but `Mox.Server` is still in global mode with Test A's process as the owner, so it explodes.
The hacky(?) fix
Call `Mox.set_mox_private/1` at the very beginning of the `setup` block of the base test case, to ensure that the state of `Mox.Server` is settled before trying to set any expectations.
Thank you @schrockwell, your "hacky" approach literally solved our problem. We have a macro that calls `stub_with` for every module in our CI tests.
@josevalim I tried to replicate the problem as you suggested with `elixir --erl "+T 9" -S mix test`, but couldn't reproduce it. However, when I stressed my CPU enough before running the tests (using `stress --cpu 16`; I have an 8-core CPU with 16 threads), which pushed CPU usage really high (mine was about 100% :sweat_smile:), it did happen. Speculating here: depending on how the container is provisioned in GH Actions, CPU constraints can occur and affect async testing. That's why @schrockwell's approach solved our problem.
To the point: I will try your commit on master to see whether it solves the problem. Though maybe our approach to defining this macro for testing is wrong. Sorry, I started with Elixir this week and have a lot to learn :sweat_smile:
We've been using the HEAD version of Mox for a week or so and haven't seen this error since the upgrade. Thank you @schrockwell and @josevalim!
v1.0.1 released.
I have a large project that uses Mox for some light mocking here and there, and the test cases (it's an umbrella project) set up some stubs in a setup block to be globally available in all tests. For example:
Each app in the umbrella project has at least one such file that sets up ExUnit tests, often more, but they are all identical when it comes to setting up Mox: we `:set_mox_from_context`, `:verify_on_exit!` and then `stub_with` some modules.
The thing is, I am getting random test failures like this one, only on CI (GH Actions), never locally:
The test failures seem random but always happen with tests that have async: false.
Out of 900 tests, I get like one or two failures with the above, maybe once every 10 times the suite runs on CI. I can't seem to reproduce it locally.
What I suspect is happening is some sort of race condition in Mox, where `Mox.Server` believes its `global_owner_pid` still points to the previous test PID.
But this is weird, as `set_mox_from_context` is added as a setup block that I think should execute before the block that adds stubs, and in the current process.
Elixir 1.11.2 Erlang 23.1 Mox 1.0.0
Any ideas what may be wrong? I don't see anything wrong with Mox itself, although I think it's likely that something there is not quite right.