florinpatrascu / bolt_sips

Neo4j driver for Elixir
Apache License 2.0
256 stars 49 forks source link

Bolt.Sips.Protocol disconnected: ** (DBConnection.ConnectionError) #83

Closed meerifan89 closed 4 years ago

meerifan89 commented 4 years ago

Hey guys!

We have a quite heavy query and we are running in a timeout.

[error] Bolt.Sips.Protocol (#PID<0.394.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.627.0> timed out because it queued and checked out the connection for longer than 15000ms

Further we get this message:

Port #Port<0.7> is closed

We did set the timeout in the config to 60s, but this did not solve the issue. We checked and we could confirm that the timeout is loaded correctly when starting the app.

We also wrote a test query to simulate our heavy query.

Here it is:

MERGE (u:Stuff) FOREACH(i IN range(0, 99999) | SET u.stuff = i) DELETE (u)

We run into the same issue when running it from iex and we noticed that it actually takes around 15s to break.

When running this query in the neo4j browser it will just do as its told until we stop the query. Thats why we came to the conclusion that something is going wrong on how we are using your library. Can someone of you help us clarifying what we might missed?!

Cheerio!

dominique-vassard commented 4 years ago

Hi, thanks for your interest in bolt_sips.

Quick check: Are you using the last version of bolt_sips(2.0.4) as a recent PR was about setting timeout in options. -> https://github.com/florinpatrascu/bolt_sips/pull/82

If you are using this version, then we have a bug and will happily fix it :)

kristofka commented 4 years ago

Could you try passing the timeout option to the query eg : Bolt.Sips.query!(conn, "MERGE (u:Stuff) FOREACH(i IN range(0, 99999) | SET u.stuff = i) DELETE (u)",%{}, timeout: 60_000) with the latest version?

@florinpatrascu , @dominique-vassard : I'd like to investigate this if it's ok with you (I'll have a little time in the next few days).

florinpatrascu commented 4 years ago

absolutely @kristofka, thank you for help!

meerifan89 commented 4 years ago

Thanks for your really fast reply. Jep, we are using this version.

# mix.exs
{:bolt_sips, "~> 2.0.4"}
meerifan89 commented 4 years ago

So I tried your suggestion with the timeout given into the query function. It does not have any effect. It takes nearly the same amount of time until it results in a timeout.

Maybe I should add, that we are using Bolt.Sips.transaction to execute the query in our code base. This might has an effect on how to approach the issue.

Thanks for your help!

kristofka commented 4 years ago

I believe I have identified the issue, I need to do a bit more work and testing and will hopefully have a pull request ready in a couple of days.

kristofka commented 4 years ago

@meerifan89 Could you try against master, passing the timeout to your query eg Bolt.Sips.query!(conn, "MERGE (u:Stuff) FOREACH(i IN range(0, 99999) | SET u.stuff = i) DELETE (u)",%{}, timeout: 60_000) ? @florinpatrascu has merged a request that should fix the issue. Thank you.

florinpatrascu commented 4 years ago

@meerifan89 - you can use it directly from Hex.pm

This release is introducing a series of optimizations that are useful beyond the fix of this particular case, many thanks to @kristofka, for this!

But yes, please test first and then feel free to close this issue.

Thank you for reporting the bug and for your feedback!

meerifan89 commented 4 years ago

works perfectly fine! Thanks a lot for the quick fix!

sascha-wolf commented 4 years ago

I still seem to run in said issue - on the latest master version - using this query:

MERGE (u:Stuff) FOREACH(i IN range(0, 999999999) | SET u.stuff = i) DELETE (u)

Error

[error] Bolt.Sips.Protocol (#PID<0.1650.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.1695.0> timed out because it queued and checked out the connection for longer than 15000ms

#PID<0.1695.0> was at location:

    :prim_inet.recv0/3
    (bolt_sips) lib/bolt_sips/internals/bolt_protocol_helper.ex:83: Bolt.Sips.Internals.BoltProtocolHelper.do_receive_data/3
    (bolt_sips) lib/bolt_sips/internals/bolt_protocol_helper.ex:56: Bolt.Sips.Internals.BoltProtocolHelper.receive_data/5
    (bolt_sips) lib/bolt_sips/internals/bolt_protocol_v3.ex:94: Bolt.Sips.Internals.BoltProtocolV3.run/7
    (bolt_sips) lib/bolt_sips/internals/bolt_protocol_v3.ex:144: Bolt.Sips.Internals.BoltProtocolV3.run_statement/7
    (bolt_sips) lib/bolt_sips/protocol.ex:169: Bolt.Sips.Protocol.execute/4
    (db_connection) lib/db_connection/holder.ex:316: DBConnection.Holder.holder_apply/4
    (db_connection) lib/db_connection.ex:1255: DBConnection.run_execute/5
    (db_connection) lib/db_connection.ex:1342: DBConnection.run/6
    (db_connection) lib/db_connection.ex:595: DBConnection.execute/4
    (bolt_sips) lib/bolt_sips/query.ex:146: Bolt.Sips.Query.send!/5
    (bolt_sips) lib/bolt_sips/query.ex:135: Bolt.Sips.Query.tx!/4
    (bolt_sips) lib/bolt_sips/query.ex:111: Bolt.Sips.Query.query_commit/4
    (bolt_sips) lib/bolt_sips/query.ex:58: Bolt.Sips.Query.query!/4
    (medics) lib/medics/graph.ex:80: Medics.Graph.do_run/4
    (medics) lib/medics/graph.ex:63: Medics.Graph.run/3
    (stdlib) erl_eval.erl:680: :erl_eval.do_apply/6
    (elixir) src/elixir.erl:275: :elixir.eval_forms/4
    (iex) lib/iex/evaluator.ex:257: IEx.Evaluator.handle_eval/5
    (iex) lib/iex/evaluator.ex:237: IEx.Evaluator.do_eval/3

** (Bolt.Sips.Exception) Port #Port<0.2182> is closed
    (elixir) lib/process.ex:767: Process.info/2
    (bolt_sips) lib/bolt_sips/query.ex:117: Bolt.Sips.Query.query_commit/4
    (bolt_sips) lib/bolt_sips/query.ex:58: Bolt.Sips.Query.query!/4
    (medics) lib/medics/graph.ex:80: Medics.Graph.do_run/4
    (medics) lib/medics/graph.ex:63: Medics.Graph.run/3
sascha-wolf commented 4 years ago

After drilling down into this for over an hour I was able to "fix" this by explicitly passing a timeout to Bolt.Sips.query which then gets dragged along to DBConnection.execute.

I'm quite confused though that the options passed to DBConnection.execute are empty by default and in no way impacted by what configuration was set using config :bolt_sips, Bolt.

In my opinion this is either a bug, or a glaring oversight in the documentation. I tend to bug, because my expectation certainly was that my configured timeout applies to queries.

kristofka commented 4 years ago

I’m sorry about this. The timeout configuration is the time after which a connection attempt will fail. I thought that changing the language of the configuration guide ( https://hexdocs.pm/bolt_sips/configuration.html#content ) to say timeout : connection timeout and documenting the behavior of Bolt.Sips.query/4 ( https://hexdocs.pm/bolt_sips/Bolt.Sips.html#query/4 ) would be explicit enough, obviously I was wrong. @florinpatrascu maybe we should add a connect_timeout and timeout option to make things more explicit. I can probably do that by the end of the week if you think it may help.

florinpatrascu commented 4 years ago

The timeout param from BoltSips is passed to the underlying DBConnection lib, when we start. That timeout should be the maximum time that the caller is allowed to perform the operation (default: 15_000)

The reason, the query options are empty, is not a bug. They were introduced to allow the user to override the driver initial config, with a per request values ie overriding the timeout value. They are empty because DBConnection was already initialized with the values specified in BoltSips’ config - hence no need for redundancy.

@kristofka - we can influence the settings DBConnection is using, by declaring its accepted values. Anything that’s recognized as a config param by the DBConnection can also be declared in BoltSips, in the config. So I don’t think we should create our own definitions, as they’ll just confuse the user. Aren’t DBConnection’s config params sufficient? Unless we have a bug and they re not probably propagated?!

Trying to understand what is the best course of actions here.