Clarify error message when DB adapter app isn't running?

henrik commented 8 years ago

I got this very cryptic (to me) error because I missed starting postgrex: http://stackoverflow.com/questions/38010058/shutdown-failed-to-start-child-dbconnection-ownership-manager-after-updating

José got this error, seemingly for similar reasons: https://github.com/elixir-ecto/ecto/issues/1491

Do you think it would be technically feasible and reasonable to improve the error messages in situations like this? As an Elixir newbie, I feel cryptic errors are one of the rougher edges of the ecosystem.

EDIT: Or is this maybe something that Postgrex could/should improve, rather than db_connection? I guess what's happening is roughly that some Postgrex function is called and exits cryptically in the absence of certain running Postgrex processes?

josevalim commented 8 years ago

Paging @fishcakez, he would know best. :)

fishcakez commented 8 years ago

This error is because :db_connection has not been started. Starting Postgrex will start DBConnection.

The "no process" error crops up all the time. I am really unsure which part of the linked error isn't clear. I don't mean that in a patronising way; it's just that I have seen the error so many times that I know exactly what it means. I think I was even involved in creating these errors. If these errors aren't clear enough, this needs to be solved in Exception.format/3 etc.

First line: the error in the current process. #PID<0.46.0> exited because it (a Supervisor) failed to start the child with id DBConnection.Ownership.Manager:

** (EXIT from #PID<0.46.0>) shutdown: failed to start child: DBConnection.Ownership.Manager

Second line: why the child process failed to start. The exit is marked as occurring during the function call GenServer.call(DBConnection.Ownership.PoolSupervisor, ..) with the arguments shown:

** (EXIT) exited in: GenServer.call(DBConnection.Ownership.PoolSupervisor, {:start_child, [#PID<0.175.0>, Postgrex.Protocol, [pool: DBConnection.Poolboy, types: true, hostname: "localhost", types: true, otp_app: :ectoo, repo: Ectoo.Repo, adapter: Ecto.Adapters.Postgres, database: "ectoo_test", username: "henrik", pool_timeout: 5000, timeout: 15000, adapter: Ecto.Adapters.Postgres, database: "ectoo_test", username: "henrik", extensions: [{Ecto.Adapters.Postgres.DateTime, []}, {Postgrex.Extensions.JSON, [library: nil]}], port: 5432]]}, :infinity)

Third line: the reason for the exit. "no process" means this process is dead or no process is associated with the name. If it wasn't "no process" but another error, this error could be occurring in a third process.

** (EXIT) no process

@henrik, your help in showing us how to improve this would be greatly appreciated. Is it just "no process" that doesn't make sense? We can expand it to a longer phrase such as "no process: either the process is dead or a process is not associated with a given name". The only information we have to create this line is the atom :noproc, that is just how Erlang/OTP does things.
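
To make the three lines above concrete, here is a minimal sketch that reproduces the same kind of exit outside of Ecto. The registered name is hypothetical (nothing is started under it), and the snippet only assumes the standard GenServer.call/3 and Exception.format_exit/1 functions:

```elixir
# The name below is hypothetical; no process is registered under it.
name = :hypothetical_unregistered_name

reason =
  try do
    GenServer.call(name, :ping)
  catch
    # GenServer.call/3 exits (it does not raise) when the name has no process.
    :exit, reason -> reason
  end

# The reason carries the atom :noproc plus the {module, function, args} of the call.
{:noproc, {GenServer, :call, [^name, :ping, 5000]}} = reason

# Exception.format_exit/1 turns the reason into the "exited in: ..." text
# followed by the "no process" line.
IO.puts(Exception.format_exit(reason))
```

The exact wording printed depends on the Elixir version, but the shape matches the second and third lines of the error discussed here: the function call first, then the reason derived from :noproc.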

If it is a case of information overload, the error is always going to contain A LOT of information because the error is occurring a few processes down the tree. I am not sure there is anything we would want to take away.

I am also a bit confused by the statement that Postgrex is only used when running tests. If using the postgres Ecto adapter then isn't Postgrex always being used?

henrik commented 8 years ago

@fishcakez I'll think about this (hopefully tomorrow) and try to pinpoint what parts I found non-obvious and see if I can think of any way to address that.

Until then, I just wanted to say how much I appreciated your reply! Thank you so much. Having the backtrace explained line by line in this way was fantastically helpful, and it's something I'm going to try to do more of from now on both as a learner and as a teacher.

henrik commented 8 years ago

So I think part of what made it harder to understand for me is that I don't see the path to my own code in the backtrace. I'm used to reading Ruby backtraces where I will typically see that my_file.rb line 123 called some_lib.rb line 456 and so on, with a line-by-line chain from the code I ran to the error.

In this case, the topmost thing I see is

** (EXIT from #PID<0.46.0>) shutdown: failed to start child: DBConnection.Ownership.Manager

which is some distance away from my code – it mentions not one of my modules, and not a dependency I added explicitly, but an indirect dependency of my app.

So if the error message could connect that chain, I think that could help. Something very roughly along the lines of:

test/test_helper.exs:1: Ectoo.Repo.start_link()
Ectoo.Repo starts Postgrex
Postgrex starts DBConnection
some specific supervisor in file X (or in module Y) failed to start child: DBConnection.Ownership.Manager
…

But preferably more detailed, mentioning specific files or modules, and functions. Maybe this isn't feasible or idiomatic – just trying to put my finger on what made/makes it hard to figure out for me. I can also see how there would be a trade-off of too much vs. too little detail. Maybe some sort of verbosity flag could help with that.

Also, it took a number of re-reads until it clicked for me that GenServer.call(DBConnection.Ownership.PoolSupervisor, …) means it's trying to communicate with a process named DBConnection.Ownership.PoolSupervisor, and that's what the "no process" error is about. That's just lack of familiarity with the language and patterns on my part. Though if the error message could somehow say "no process named Blah.Blah – it's dead or has another name" that would have helped enormously. But I take it from your comment that we don't have access to the name?

If "no process" errors are commonly due to forgetting to list something in applications in mix.exs, I wonder if it could help to say so. "no process (sometimes this is due to forgetting to list an app under blabla in mix.exs, or to start it manually)" Long but possibly helpful. Could also be misleading if the error often happens for other reasons, though.

(About Postgrex only being used when running tests: that's because this is a convenience library on top of Ecto, so I need some specific DB adapter to integration test it, but I want the user of the library to be able to pick another adapter.)
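
For a library in this situation, one way to "start it manually" is to do so in the test helper. This is only a sketch: it assumes :postgrex is declared as a test-only dependency in mix.exs and that the file is test/test_helper.exs, neither of which is shown in this thread.

```elixir
# test/test_helper.exs (sketch; assumes :postgrex is a test-only dependency)
# Starts :postgrex and every application it depends on, including :db_connection,
# so DBConnection.Ownership.PoolSupervisor exists before the repo is started.
{:ok, _started_apps} = Application.ensure_all_started(:postgrex)

ExUnit.start()
```

Application.ensure_all_started/1 returns {:ok, apps} with the list of applications it had to start, so the match above fails loudly if Postgrex cannot be started.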

fishcakez commented 8 years ago

** (EXIT from #PID<0.46.0>) <- this means that #PID<0.46.0> got an exit signal and stopped abruptly, so we don't have access to anything in #PID<0.46.0> because it has just died.

GenServer.call(DBConnection.Ownership.PoolSupervisor, …): this information is given as {mod, fun, args}, but we can't make any assumptions about what the args are. The name is visible in the function call, but we can't repeat that name argument anywhere else in the error.

So we can't do "no process named Blah.Blah", but we can add some information.

"no process (sometimes this is due to forgetting to list an app under blabla in mix.exs, or to start it manually)"

However, mix.exs may not be the correct suggestion, and it would be an invalid warning in a lot of cases, such that it could do more harm than good. We could do something like:

"no process: the process is dead or no process is associated with the given name, possibly because its application isn't started"

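The constraint described above can be sketched as follows. ExitMessageSketch is a hypothetical module written for illustration, not Elixir's actual formatting code; it only relies on the standard Exception.format_mfa/3, which accepts a list of arguments in place of an arity. The formatter sees the atom :noproc and an opaque {mod, fun, args} tuple, so it can lengthen the reason but cannot know which argument, if any, is a process name:

```elixir
defmodule ExitMessageSketch do
  # Hypothetical helper for illustration only.

  # All we know here is the atom :noproc, so the text has to stay generic.
  def describe(:noproc) do
    "no process: the process is dead or no process is associated with " <>
      "the given name, possibly because its application isn't started"
  end

  # The call is reported as {mod, fun, args}. The args are printed as-is,
  # without assuming that any of them is a process name.
  def describe({reason, {mod, fun, args}})
      when is_atom(mod) and is_atom(fun) and is_list(args) do
    "exited in: " <>
      Exception.format_mfa(mod, fun, args) <>
      "\n    ** (EXIT) " <> describe(reason)
  end

  # Anything else is shown in its raw form.
  def describe(other), do: inspect(other)
end

IO.puts(ExitMessageSketch.describe({:noproc, {GenServer, :call, [SomeName, :ping, 5000]}}))
```

SomeName is a placeholder. The name shows up only because it happens to be among the printed arguments, which is why the longer phrase cannot say "no process named SomeName".
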
josevalim commented 8 years ago

@fishcakez I agree, the minimum we can do at this point is to improve the no process exit message.

henrik commented 8 years ago

Thank you again for taking the time to discuss this.

I agree as well. Seems like a good idea to tweak that message. If none of you get to it first (feel free) I might give it a shot when I find some time.

fishcakez commented 8 years ago

@henrik we eagerly await your PR to elixir :wink:

henrik commented 8 years ago

Opened a PR! Closing this ticket. Thanks so much again, @fishcakez!

elixir-ecto / db_connection

Clarify error message when DB adapter app isn't running? #48