Issue Description:
The save_load_state.py script in danswer/backend/scripts could be made faster and more efficient. Three areas stand out:
1. Postgres Snapshot Handling:
Current Approach: Uses subprocess to execute shell commands for saving/loading the Postgres snapshot.
Suggested Change: Use a direct database connection with psycopg2 for more efficient interaction with Postgres, eliminating shell overhead.
2. Vespa Snapshot Handling:
Current Approach: Fetches documents sequentially with a continuation token.
Suggested Change: Implement batch processing to fetch multiple documents per request, reducing the number of HTTP requests.
3. Multi-threading for Vespa Operations:
Current Approach: Sequential processing of documents.
Suggested Change: Use multi-threading to handle document loading and saving in parallel, speeding up the process.
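A rough sketch of how items 2 and 3 could fit together, assuming the Vespa documents are available as an iterable of dicts. The names `batched`, `process_in_parallel`, and `fake_feed` are hypothetical and do not exist in save_load_state.py. `fake_feed` only stands in for whatever HTTP call would send one batch to Vespa, so the batch size and worker count here are placeholders, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(items: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def process_in_parallel(
    documents: Iterable[dict],
    handle_batch: Callable[[list[dict]], int],
    batch_size: int = 100,
    max_workers: int = 8,
) -> int:
    """Run handle_batch on each batch in a thread pool; return the total handled."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map keeps batch order and re-raises any worker exception here.
        return sum(executor.map(handle_batch, batched(documents, batch_size)))


def fake_feed(batch: list[dict]) -> int:
    # Hypothetical stand-in for the real request that would send one batch to Vespa.
    return len(batch)


docs = [{"id": i} for i in range(250)]
total = process_in_parallel(docs, fake_feed, batch_size=100, max_workers=4)
```

Threads suit this case because the work is I/O-bound (waiting on HTTP responses). Note that `executor.map` submits every batch up front, so a very large snapshot might need a bounded queue instead.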
Please assign me this issue so that I can start working on it.