JuliaDatabases / LibPQ.jl

A Julia wrapper for libpq
MIT License
216 stars 51 forks source link

Problems with connection timeouts and multihreadding #247

Open agcmoser opened 2 years ago

agcmoser commented 2 years ago

Hello,

I have a code where I write data into a Postgres db (using LibPQ.jl) which runs in parallel task. I had general issues with multithreading but hose could be solved, see https://discourse.julialang.org/t/no-real-multithreading-possible/77920.

Now I can observe that if there is a connection timeout to my Postgres-db make a libpq call taking longer, the thread calling libpq library is blocking all other threads. I am wondering if multithreading is only working correctly with pure julia code or if this is an issue with LibPQ

Any hint is welcome. TIA, agcm

iamed2 commented 2 years ago

There are a number of things that can cause blocking; a traceback from an interrupt when the blocking is happening would be useful to narrow it down. I'm also not sure how I would tell the difference between a blocking C library call and monopolizing a thread (which sounds like it can happen from the linked discussions?), without instrumenting LibPQ (which you could try on your machine).

One thing is if you're using a hostname, then DNS lookups can block: https://www.postgresql.org/docs/10/libpq-connect.html#LIBPQ-PARAMKEYWORDS

I'm afraid I haven't kept up with multithreading in Julia so I'm not sure if there's a difference between running on the "main thread" or another thread, and I don't know whether async code can monopolize a thread or something. I can say that, as is, LibPQ.jl connections should yield as appropriate while connecting as long as the restrictions are met and the standard library's poll_fd yields properly.

agcmoser commented 2 years ago

@iamed2: Thanks for the fast reply and especially for the hint with the DNS lookup because: we actually use two different Postgres Servers: one, which is accessible via the internet using and DNS and one accessible via a VPN extranet and its VPN IP. The one access via VPN and its IP so fare never had the described problem. The DNS one does. The two servers are not the same one and hosted by different provides (most probably also not the same version of Postgres server application) so we first though it be rather an issue of the different servers and not the DNS lookup. I will try accessing the DNS based server using its IP to verify that it is actually that ... but it seems like so.

iamed2 commented 2 years ago

To deal with this you can perform the DNS lookup in advance and cache it using the stdlib:

julia> using Sockets  # stdlib

julia> getalladdrinfo("google.com")  # all IPs, useful to handle failover
2-element Vector{IPAddr}:
 ip"142.250.191.110"
 ip"2607:f8b0:4009:817::200e"

julia> getaddrinfo("google.com")  # first IP
ip"142.250.191.110"

Then pass that as hostaddr instead of host in the DSN.

agcmoser commented 2 years ago

@iamed2: Good idea. I will incorporate that in my code and give feedback once it's active again and runs without errors (or if the error still exists)

agcmoser commented 2 years ago

I think I still have the same issue, also when using the hostaddr instead of the host. I rather think it's a timeout problem. In the documentation here I saw that the default value of the connection parameter connect_timeout means wait indefinitely ... I have now set connect_timeout=10 in the LibPQ connection string and will evaluate if things are better now.