AnswerDotAI / byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Apache License 2.0
634 stars 62 forks

Error: "Document ID 0 with page ID 1 already exists in the index" #31

Open Leflak opened 2 months ago

Leflak commented 2 months ago

Hi, the error "Document ID 0 with page ID 1 already exists in the index" happens when I create an index with the same files as a previous one, even with overwrite=True.

DeepSeek helped me find a workaround. I added the following lines right after line 317 of colpali.py, which is "shutil.rmtree(index_path)":

    self.embed_id_to_doc_id = {}
    self.indexed_embeddings = []
    self.doc_ids_to_file_names = {}
    self.doc_id_to_metadata = {}
    self.highest_doc_id = -1

This seems to clear the existing index from memory as well, instead of only deleting its folder on disk.
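To illustrate the behavior described above, here is a minimal, self-contained sketch. It is not byaldi's code: `ToyIndex`, `build`, `add`, `_reset_state`, and the `reset_memory` flag are hypothetical names made up for this example, and what each attribute stores is an assumption. Only the five attribute names and the error message come from the report. The sketch shows how deleting the folder alone can leave stale entries in memory, so that re-adding the same pages fails.

```python
import shutil
import tempfile
from pathlib import Path


class ToyIndex:
    """Hypothetical stand-in for an index object; NOT byaldi's actual class."""

    def __init__(self):
        self._reset_state()

    def _reset_state(self):
        # The five attributes named in the report; their contents are assumed.
        self.embed_id_to_doc_id = {}
        self.indexed_embeddings = []
        self.doc_ids_to_file_names = {}
        self.doc_id_to_metadata = {}
        self.highest_doc_id = -1

    def add(self, doc_id, page_id, file_name):
        # Reject a (doc_id, page_id) pair that is already held in memory.
        entry = {"doc_id": doc_id, "page_id": page_id}
        if entry in self.embed_id_to_doc_id.values():
            raise ValueError(
                f"Document ID {doc_id} with page ID {page_id} "
                "already exists in the index"
            )
        embed_id = len(self.indexed_embeddings)
        self.indexed_embeddings.append(None)  # placeholder for an embedding
        self.embed_id_to_doc_id[embed_id] = entry
        self.doc_ids_to_file_names[doc_id] = file_name
        self.highest_doc_id = max(self.highest_doc_id, doc_id)

    def build(self, index_path, pages, overwrite=False, reset_memory=True):
        index_path = Path(index_path)
        if index_path.exists():
            if not overwrite:
                raise FileExistsError(f"{index_path} already exists")
            shutil.rmtree(index_path)  # removes the folder on disk only
            if reset_memory:
                self._reset_state()    # also drops the in-memory entries
        index_path.mkdir(parents=True)
        for doc_id, page_id, file_name in pages:
            self.add(doc_id, page_id, file_name)


# Usage: rebuilding over the same path with the same pages.
demo_root = Path(tempfile.mkdtemp())
demo_pages = [(0, 1, "a.pdf"), (0, 2, "a.pdf")]

index = ToyIndex()
index.build(demo_root / "idx", demo_pages, overwrite=True)
index.build(demo_root / "idx", demo_pages, overwrite=True)  # succeeds
```

With `reset_memory=False`, the second `build` call in this toy model raises the same "already exists" error, because the folder is gone but the in-memory mappings still hold the old pages. Whether the real library behaves this way for the same reason is not confirmed here.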

Sorry if this is not the proper way to raise this. I am not a developer and am not familiar with GitHub.