michallepicki closed this issue 3 years ago
The only process that we monitor in the manager is the owner process, so if you are going into that branch, it is because the owner process is terminating too.
For now, I think that makes sense, and here is why: a process exited while it was using the connection. Because of this, we have no idea what the connection state is. Maybe it wrote some bytes to it? Maybe there is data left over in the write buffer? We have no option other than to close the connection and kill its owner, because we can't do anything else with it reliably.
Right, so it's correct that the ref is used, makes sense, thank you! I should look into why the request-handling process exited. Unfortunately I don't see any reason in the logs, I hope it's not something like cowboy killing the process because the browser aborted the request...
Again, thanks!
> I hope it's not something like cowboy killing the process because the browser aborted the request...
It might be the case... but also note it has to happen while the client is actively using the connection, i.e. doing a query or inside a transaction. You can try doing a Process.flag(:trap_exit, true) in the request process and see how it changes things.
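For context, here is a standalone sketch of the :trap_exit "dance" suggested above, in plain Elixir with no Phoenix or Ecto involved: once a process traps exits, an exit signal from another process arrives as an {:EXIT, from, reason} message instead of killing it mid-work, so it gets a chance to finish cleanly.

```elixir
parent = self()

worker =
  spawn(fn ->
    # Convert incoming exit signals into ordinary messages.
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    receive do
      {:EXIT, _from, reason} ->
        # A real request process could check the DB connection back in
        # cleanly here before stopping.
        send(parent, {:trapped, reason})
    end
  end)

receive do
  :ready -> :ok
end

# Simulate an abrupt shutdown, e.g. cowboy terminating the handler.
Process.exit(worker, :shutdown)

trapped =
  receive do
    {:trapped, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(trapped)
# => :shutdown
```

Without the flag, the worker would die immediately on the exit signal; with it, the reason is observable and cleanup code can run.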
It does look like this is the case. I added additional slow SQL queries to the GraphQL query during which the request-handling process was usually exiting, and I can now reliably make the test fail. So to "fix" it I probably need to wait for the page to become stable and finish receiving responses for the non-critical GraphQL requests as well.
Integration testing is hard. Maybe there's room for improvement here, e.g. configuring cowboy so that it doesn't kill the process for an aborted request so abruptly, but that seems outside the ecto ecosystem.
Hi @michallepicki, I found this issue after I filed a very similar report: https://github.com/elixir-ecto/db_connection/issues/247
@josevalim has now committed a fix to Phoenix.Ecto.SQL.Sandbox which does the :trap_exit dance automatically: https://github.com/phoenixframework/phoenix_ecto/commit/1d8d28a82eef1ff496851bd2fd53661398d38b99
I have a Phoenix + Absinthe application with a React + Apollo frontend that we run integration tests on using ExUnit and Wallaby. Tests are unfortunately "flaky" because of a random DBConnection.OwnershipError, and I think this could be an issue in db_connection. Because we haven't yet managed to set up all processes to automatically check out properly in the sandbox (like e.g. some Absinthe PubSub processes), all integration tests are async: false, so we're using shared mode with start_owner! in setup, as documented in phoenix master:

I logged the test's self() PID, which is #PID<0.2936.0>, and the owner (pid in the above snippet) is #PID<0.2937.0>.
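The setup snippet referred to above is not shown in this extract. For context, a minimal version along the lines of the shared-mode setup documented in phoenix master looks roughly like the following (MyApp.Repo is a placeholder, and this may differ from the reporter's actual setup):

```elixir
# In the test case's setup: start a sandbox owner process for the test
# and, for non-async tests, share its connection with all processes.
setup tags do
  pid =
    Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo,
      shared: not tags[:async]
    )

  # Stop the owner (and check the connection back in) when the test ends.
  on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)
  :ok
end
```

With `shared: true`, any process that touches the repo during the test uses the owner's sandboxed connection, which is why the owner going down (or appearing to) affects every client.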
There's a lot happening when the test is clicking quickly through the app, and sometimes this log shows up, which is (I think?) harmless:
or at least I don't see any other error that would suggest we have a bug in our app that causes it. The Ecto sandbox docs explain that the owner exiting could cause problems, but here it is the client that exits. This seems fine, but after that error I found that the DBConnection.Ownership.Manager receives this message:

so it runs this code, ending up in this code, and switches from mode {:shared, #PID<0.2937.0>} to :manual, despite the fact that it's not the owner process that was just downed. Afterwards this log is shown for some next request:
and the Wallaby test fails while trying to parse the Phoenix error page as JSON.
Question
When the mode is shared, shouldn't the unshare pattern match be based on the owner process PID rather than on mode_ref?
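To make the question concrete, here is a purely illustrative sketch, NOT the actual db_connection source: the idea is that the DOWN clause which flips shared mode back to :manual would check the downed pid against the shared owner's pid, instead of matching only on the mode ref.

```elixir
# Hypothetical shape of the manager's DOWN handling; state, field names,
# and clauses are invented for illustration only.
defmodule SketchManager do
  use GenServer

  # Assumed state: %{mode: :manual | {:shared, owner_pid}, ...}
  def handle_info({:DOWN, _ref, :process, pid, _reason},
                  %{mode: {:shared, owner}} = state) do
    if pid == owner do
      # Only the shared owner going down should revert everyone
      # to :manual mode.
      {:noreply, %{state | mode: :manual}}
    else
      # A client process going down should not unshare the
      # connection for all other clients.
      {:noreply, state}
    end
  end
end
```

Whether this matches the manager's real monitoring model (the maintainer notes above that only the owner is monitored there) is exactly what the thread goes on to clarify.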