Currently, the vector database is built once from all files in the project/git repository. It should be possible to update it incrementally based on the latest git changes: only the vectors that belong to a file that has changed since the last vector creation should be updated.
Possible solution:
Store in the cache, for every file, the commit hash of its last change. On sync, compare the stored commit hash with the file's current hash. If the hashes differ, delete every vector related to that file from the database and insert the newly created vectors. Save the filename in the vector metadata and, if possible, query by filename to find the vectors to delete. If that is not possible, maintain for every file a list of FAISS ids; when the file has a new git hash, delete those ids from FAISS and insert the new vectors.
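The comparison step could look roughly like the sketch below. It is only an illustration: `SyncPlan`, `last_commit_hash`, and `plan_sync` are hypothetical names, and it assumes the per-file hash is obtained with `git log -1 --format=%H -- <path>`. Files that disappeared from the repository are reported separately so their vectors can be deleted without re-embedding.

```python
import subprocess
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    changed: List[str]  # new or modified files: delete old vectors, then re-embed
    removed: List[str]  # files gone from the repo: delete their vectors only


def last_commit_hash(path: str, repo_dir: str = ".") -> str:
    """Return the hash of the last commit touching `path` ("" if none exists)."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%H", "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def plan_sync(cached: Dict[str, str], current: Dict[str, str]) -> SyncPlan:
    """Compare cached hashes (filename -> hash) with the current ones."""
    # A file counts as changed if it is new or its hash differs from the cache.
    changed = sorted(p for p, h in current.items() if cached.get(p) != h)
    # A file counts as removed if the cache knows it but the repo no longer does.
    removed = sorted(p for p in cached if p not in current)
    return SyncPlan(changed, removed)
```

A sync run would build `current` by calling `last_commit_hash` for each tracked file, pass it to `plan_sync` together with the cached mapping, and then only touch the vectors of the files listed in the plan.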
Cache: JSON
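One possible shape for such a JSON cache is sketched below, assuming it keeps the last commit hash and the list of FAISS ids per file as proposed above. The layout and the names `FileEntry`, `load_cache`, `save_cache`, and `replace_file_entry` are assumptions made for illustration, not an existing format.

```python
import json
from pathlib import Path
from typing import Dict, List, TypedDict


class FileEntry(TypedDict):
    commit: str     # hash of the last commit that changed the file
    ids: List[int]  # FAISS ids of the vectors created from the file


Cache = Dict[str, FileEntry]  # keyed by filename


def load_cache(path: Path) -> Cache:
    """Read the cache file; a missing file means nothing has been indexed yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_cache(path: Path, cache: Cache) -> None:
    """Write the cache as human-readable JSON."""
    path.write_text(json.dumps(cache, indent=2, sort_keys=True), encoding="utf-8")


def replace_file_entry(
    cache: Cache, filename: str, commit: str, new_ids: List[int]
) -> List[int]:
    """Record the new state of `filename` and return the stale ids to delete."""
    old = cache.get(filename)
    stale = list(old["ids"]) if old else []
    cache[filename] = {"commit": commit, "ids": list(new_ids)}
    return stale
```

The ids returned by `replace_file_entry` are the ones that would have to be removed from the index (for FAISS, for example, via `remove_ids` on an index type that supports it) before or after the new vectors are added.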
https://github.com/langchain-ai/langchain/issues/2699#issuecomment-1618163649