florinpatrascu / bolt_sips

Neo4j driver for Elixir
Apache License 2.0

Outgoing SSL connection hangs, cannot be dropped and re-established #95

Closed ekobi closed 2 years ago

ekobi commented 3 years ago

Precheck

Environment

Current behavior

My app loses its SSL connection to my GrapheneDB server instance and is then unable to re-establish it. I first saw the failure from a Google Cloud-based Elixir hosting service (gigalixir), and again while running a local Elixir deployment on my workstation at around 5pm EST today. A typical error log message is appended below.

One strong possibility is that the connection is getting dropped by AWS after being idle for 10 minutes. I sniffed for TCP keepalives, and although it's hard to make a generalized statement, the db_connection driver seems to send them out after the first 600s or so, which would be right at the threshold. Q: is it possible to directly configure the idle_timeout via Bolt-Sips? Any other thoughts on what's happening? Cheers,

Error log / Stack Trace

  19:47:31.105 [error] GenServer #PID<0.467.0> terminating
  ** (MatchError) no match of right hand side value: %Bolt.Sips.Internals.Error{code: nil, connection_id: nil, function: :goodbye, message: "goodbye: Unknown failure: \"Can't close port\"\n", type: :protocol_error}
      (bolt_sips 2.0.8) lib/bolt_sips/protocol.ex:85: Bolt.Sips.Protocol.disconnect/2
      (db_connection 2.2.2) lib/db_connection/connection.ex:136: DBConnection.Connection.disconnect/2
      (connection 1.0.4) lib/connection.ex:767: Connection.disconnect/3
      (stdlib 3.13) gen_server.erl:680: :gen_server.try_dispatch/4
      (stdlib 3.13) gen_server.erl:756: :gen_server.handle_msg/6
      (stdlib 3.13) proc_lib.erl:236: :proc_lib.wake_up/3

  Last message: {:"$gen_cast", {:disconnect, #Reference<0.2335942036.437125121.151593>, %Bolt.Sips.Internals.Error{code: nil, connection_id: nil, function: :receive_data, message: "Port {:sslsocket, {:gen_tcp, #Port<0.11>, :tls_connection, :undefined}, [#PID<0.490.0>, #PID<0.480.0>]} is closed", type: :connection_error}, %Bolt.Sips.Protocol.ConnData{bolt_version: 3, configuration: [socket: :ssl, port: 24787, routing_context: %{}, schema: "bolt", hostname: "*************.graphenedb.com", pool_index: 1, name: {:via, Registry, {:bolt_sips_registry, "default_direct@***************.graphenedb.com:24787"}}, role: :direct, max_overflow: 0, timeout: 15000, with_etls: false, prefix: :default, url: "bolt://db-xyjg0kingtu1pztbekyz.graphenedb.com:24787", ssl: true, basic_auth: [username: '*********', password: "***********"], pool_size: 10, server_version: %{"connection_id" => "bolt-1006", "server" => "Neo4j/3.5.14"}], sock: {:sslsocket, {:gen_tcp, #Port<0.11>, :tls_connection, :undefined}, [#PID<0.490.0>, #PID<0.480.0>]}}}}

Expected behavior

florinpatrascu commented 3 years ago

Q: is it possible to directly configure the idle_timeout via Bolt-Sips?

Yes. Any config parameters you add to the driver that are also supported by the underlying library, DBConnection, will be used to initialize the pool. For example:

iex» {:ok, neo} = Sips.start_link(url: "bolt://neo4j:test@localhost", idle_interval: 2_000)

In brief: when :idle_interval is set and no requests are received during that interval, the pool pings all idle connections, each of which can then ping the database to keep its connection alive.
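As a rough sketch of how this could be applied to a setup like the one in the report (SSL, pool of 10), the options below are passed straight to `Bolt.Sips.start_link/1`. The hostname, credentials, and the 60-second value are placeholders chosen for illustration, not values taken from this thread; the interval is simply picked to sit well below the suspected 10-minute idle cutoff. Whether the resulting pings show up as traffic on the wire depends on how the driver implements its ping, so treat this as something to experiment with rather than a confirmed fix.

```elixir
# Sketch only: hostname and credentials are hypothetical placeholders.
opts = [
  url: "bolt://example-host.graphenedb.com:24787",
  basic_auth: [username: "user", password: "secret"],
  ssl: true,
  pool_size: 10,
  # Handed through to DBConnection: how often (in ms) the pool checks for
  # idle connections and asks them to ping the server.
  idle_interval: 60_000
]

{:ok, _neo} = Bolt.Sips.start_link(opts)

# Quick sanity check that the pool hands out a working connection.
conn = Bolt.Sips.conn()
Bolt.Sips.query!(conn, "RETURN 1 AS n")
```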

Please try and let us know if it works. It's been a while since I've worked with this setting :)

HTH

ekobi commented 3 years ago

Thanks for the hint, Florin.

Unfortunately, the effect of applying the idle_interval setting is not readily obvious to me (from staring at Wireshark traces for far too long :). I don't actually see keepalives emanating from my end, even though the connections remain stable for longer than before. Very puzzling. I'll let you know if I figure this out. Cheers,

kobi
