Optimize `save_load_state.py` for Performance and Efficiency Improvements

danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

https://danswer.ai



Optimize `save_load_state.py` for Performance and Efficiency Improvements #2806

Status: Open. Mefisto04 opened this issue 1 month ago.

Mefisto04 commented 1 month ago

Issue Description:
The current implementation of `save_load_state.py` in `danswer/backend/scripts` can be made faster and more efficient. Below are three key areas that need optimization:

1. Postgres Snapshot Handling:

Current Approach: Uses `subprocess` to run shell commands that save and load the Postgres snapshot.
Suggested Change: Connect to Postgres directly with psycopg2, avoiding the overhead of spawning shell processes.
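A minimal sketch of the psycopg2 approach, assuming the snapshot only needs table row data: each table is streamed to and from a CSV file with `COPY ... TO STDOUT` / `COPY ... FROM STDIN` through psycopg2's `cursor.copy_expert`. The function names `save_tables` and `load_tables`, the table list, and the one-CSV-per-table layout are hypothetical and not taken from the existing script. Table names are assumed to be unqualified (no schema prefix).

```python
import os
from typing import Iterable, List, Protocol


class CopyCursor(Protocol):
    """The subset of a psycopg2 cursor this sketch relies on."""

    def copy_expert(self, sql: str, file, size: int = 8192) -> None: ...


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def save_tables(cursor: CopyCursor, tables: Iterable[str], out_dir: str) -> List[str]:
    """Hypothetical helper: dump each table's rows to <out_dir>/<table>.csv."""
    os.makedirs(out_dir, exist_ok=True)
    written: List[str] = []
    for table in tables:
        path = os.path.join(out_dir, f"{table}.csv")
        with open(path, "w", encoding="utf-8", newline="") as f:
            # COPY streams rows straight from the server into the file.
            cursor.copy_expert(
                f"COPY {quote_ident(table)} TO STDOUT WITH (FORMAT csv, HEADER)", f
            )
        written.append(path)
    return written


def load_tables(cursor: CopyCursor, tables: Iterable[str], in_dir: str) -> None:
    """Hypothetical helper: reload rows from <in_dir>/<table>.csv into each table."""
    for table in tables:
        path = os.path.join(in_dir, f"{table}.csv")
        with open(path, "r", encoding="utf-8", newline="") as f:
            cursor.copy_expert(
                f"COPY {quote_ident(table)} FROM STDIN WITH (FORMAT csv, HEADER)", f
            )
```

With a real database, pass a cursor obtained from `psycopg2.connect(...)` and commit the connection after loading. Note that `COPY` moves rows only: unlike `pg_dump`, it does not capture schema, sequences, or constraints, so the target schema must already exist (for example via migrations), and tables with foreign keys have to be loaded in dependency order.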

2. Vespa Snapshot Handling:

Current Approach: Fetches documents sequentially, following a continuation token from one request to the next.
Suggested Change: Fetch more documents per request (batch processing), reducing the total number of HTTP requests.
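A sketch of batched visiting through Vespa's `/document/v1` API, assuming the script reads documents via that endpoint: the `wantedDocumentCount` parameter asks Vespa to return more documents per response, and the loop keeps following `continuation` until no token is returned. The namespace, document type, `batch_size` default, and the injectable `fetch_json` parameter (added so the loop can be exercised without a live Vespa) are illustrative assumptions.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode


def http_get_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def visit_documents(
    base_url: str,
    namespace: str,
    doc_type: str,
    batch_size: int = 500,
    fetch_json: Callable[[str], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield every document of doc_type, asking for batch_size documents per call."""
    continuation: Optional[str] = None
    while True:
        params: Dict[str, object] = {"wantedDocumentCount": batch_size}
        if continuation:
            params["continuation"] = continuation
        url = f"{base_url}/document/v1/{namespace}/{doc_type}/docid?{urlencode(params)}"
        page = fetch_json(url)
        # Each response carries a "documents" list for this batch.
        yield from page.get("documents", [])
        # No continuation token means the visit is complete.
        continuation = page.get("continuation")
        if not continuation:
            break
```

`wantedDocumentCount` is a best-effort hint, so a page may contain fewer documents than requested; the loop relies only on the continuation token to decide when to stop.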

3. Multi-threading for Vespa Operations:

Current Approach: Documents are processed sequentially, one at a time.
Suggested Change: Use multi-threading to load and save documents in parallel. Because each operation mostly waits on network I/O, threads can overlap those waits and speed up the process.
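A sketch of parallel loading with the standard library's `ThreadPoolExecutor`, assuming each document is written by one blocking HTTP call. `put_document` is a hypothetical callable (for instance, one that POSTs a document's fields to Vespa's `/document/v1/<namespace>/<document-type>/docid/<id>` endpoint), and `max_workers=8` is an arbitrary default, not a tuned value. Failures are collected instead of aborting the whole load.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def feed_documents_parallel(
    docs: Iterable[dict],
    put_document: Callable[[dict], None],
    max_workers: int = 8,
) -> Tuple[int, List[Tuple[str, Exception]]]:
    """Run put_document for every doc on a thread pool.

    Returns (number of successful writes, list of (doc id, exception) failures).
    """
    succeeded = 0
    failures: List[Tuple[str, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its document so failures can be reported by id.
        futures = {pool.submit(put_document, doc): doc for doc in docs}
        # Results are consumed on this thread only, so the counters need no lock.
        for future in as_completed(futures):
            doc = futures[future]
            try:
                future.result()
                succeeded += 1
            except Exception as exc:  # keep going; report failures at the end
                failures.append((str(doc.get("id", "<unknown>")), exc))
    return succeeded, failures
```

This version submits every document up front, so for very large snapshots it may be worth submitting in fixed-size chunks to bound memory. On the save side, Vespa's visit API also accepts `slices` and `sliceId` parameters, which would let several threads each visit a disjoint part of the corpus.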

Please assign this issue to me so that I can start working on it.

Mefisto04 commented 1 month ago

Hey @pablodanswer, please review this and assign the issue to me.