ksadov opened 4 months ago
Hey, thanks for flagging
@jlscheerer @Anmol6, would you have a minute to take a look? It seems like there's some sort of indexing issue somewhere in the CRUD functionality in ColBERT.
🤔
Seems like it could actually be an issue in the ColBERT repo; https://github.com/stanford-futuredata/ColBERT/issues/261 looks related.
As I mentioned in stanford-futuredata/ColBERT#261, the problem is that IndexUpdater.persist_to_disk updates only the embedding vectors. ColBERT's index folder also has a collection.json file in which all the documents are saved. IndexUpdater.persist_to_disk should update that collection as well, and after the update, the index searcher should be refreshed with the latest collection.
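The fix described above can be sketched in a few lines. This is a toy model under stated assumptions, not ColBERT's or RAGatouille's real API: ToyIndex, ToySearcher, the include_collection flag, and the embeddings.json file are all hypothetical. Persisting writes both the vectors and collection.json, and the searcher caches the index in memory, so it has to be reloaded before it can see newly added documents.

```python
import json
import os
import tempfile
from typing import List


class ToyIndex:
    """Hypothetical stand-in for an index folder holding embeddings plus collection.json."""

    def __init__(self, index_dir: str) -> None:
        self.index_dir = index_dir
        self.embeddings: List[List[float]] = []
        self.collection: List[str] = []

    def add(self, docs: List[str], embeddings: List[List[float]]) -> None:
        if len(docs) != len(embeddings):
            raise ValueError("need exactly one embedding per document")
        self.embeddings.extend(embeddings)
        self.collection.extend(docs)

    def persist_to_disk(self, include_collection: bool = True) -> None:
        # Hypothetical flag: include_collection=False mimics writing only the vectors.
        os.makedirs(self.index_dir, exist_ok=True)
        with open(os.path.join(self.index_dir, "embeddings.json"), "w") as f:
            json.dump(self.embeddings, f)
        if include_collection:
            # The step the comment above says is missing: persist the documents too.
            with open(os.path.join(self.index_dir, "collection.json"), "w") as f:
                json.dump(self.collection, f)


class ToySearcher:
    """Hypothetical searcher that caches the on-disk index until reload() is called."""

    def __init__(self, index_dir: str) -> None:
        self.index_dir = index_dir
        self.reload()

    def reload(self) -> None:
        with open(os.path.join(self.index_dir, "embeddings.json")) as f:
            self.embeddings = json.load(f)
        with open(os.path.join(self.index_dir, "collection.json")) as f:
            self.collection = json.load(f)
        # Toy consistency check: every embedding must map to a stored document.
        if len(self.embeddings) != len(self.collection):
            raise ValueError("embeddings and collection.json are out of sync")

    def search(self, query: List[float]) -> str:
        # Score each document by dot product and return the best-scoring text.
        scores = [sum(q * e for q, e in zip(query, emb)) for emb in self.embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.collection[best]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        index = ToyIndex(d)
        index.add(["doc A"], [[1.0, 0.0]])
        index.persist_to_disk()
        searcher = ToySearcher(d)

        index.add(["doc B"], [[0.0, 1.0]])
        index.persist_to_disk()  # writes embeddings.json AND collection.json
        searcher.reload()        # refresh the searcher with the latest collection
        print(searcher.search([0.0, 1.0]))  # doc B
```

In this toy, calling persist_to_disk(include_collection=False) after the second add leaves collection.json one document short, and reload() then raises because the vectors and the stored documents no longer line up.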
After pulling in the most recent change, when I run the following script:
The first add_to_index() call in the loop successfully indexes and retrieves the given document. However, the second call results in the following error:
This is with Python 3.9.13 and faiss-gpu 1.7.3, though it seems to me like the error is caused by a failure to update some mapping or search struct internal to the RAGatouille library and not one of its dependencies.