Open johantandy opened 4 months ago
Hey, @johantandy!
I understand the problem and I think we can do something to handle it... (although, I am not a DB expert)
The only perplexity is about using a different package: psycopg_pool
.
Any other possible ideas? Or a draft of how you imagine the change?
(@silvanocerza)
Facing the same issue atm, but even for 5mins timeout.
Could be potentially use ? https://github.com/supabase/vecs
cc: @anakin87
Hey @anakin87 @silvanocerza just checking if this would worked on by the team or are you guys open to a PR
Sorry, I missed the discussion.
To be fair I don't see the benefit of a connection pool in this case, nor how it would solve the problem.
As of now the integration has at most one open connection and no more so why we need a pool of multiple connections?
The problem we have is that we open a single connection and keep it open as long as the Document Store exists. What we should do in my opinion is handle the connection context correctly, something that right we completely ignore.
If anyone wants to take care of that feel free to open a PR.
@octalpixel @johantandy any of you willing to take care of this using the connection context properly?
I would suggest taking a look at this first 👇 https://www.psycopg.org/psycopg3/docs/basic/usage.html#connection-context
This is up for grabs by anyone that want to fix how the connection is handled, see my above comment. ☝
@silvanocerza @julian-risch I would like to take up this issue.
Is your feature request related to a problem? Please describe. Yes, i'm currently using haystack pipeline & pgvector with FastAPI to serve embedding search but seems the current implementation of psycopg doesn't handle the reconnection after a long idle time Eg. overnight without any calls to API the exception that thrown is
psycopg.OperationalError: the connection is lost
Describe the solution you'd like i'd like to implement psycopg_pool to make it better in handling lost connection also for anyone who serve it via API this is more scaleable to use a connection pool.
Describe alternatives you've considered Increasing DB Connection max lifetime but seems not a good practice, make a retry call to PG also doesn't work with current implementation, re-initiate the connection are impossible because of the connection var never be None once it initialized https://github.com/deepset-ai/haystack-core-integrations/blob/60c666624cfb20a3600a28be074697fdf4cf2553/integrations/pgvector/src/haystack_integrations/document_stores/pgvector/document_store.py#L169-L173
Additional context I'd be happy to make a PR for this myself, just want to make sure that I'm not missing anything before working on this. open for discussion & also correct me if i'm wrong 🙇