AnswerDotAI / byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Apache License 2.0

index() corruption #65

Open: declanraj opened this issue 6 days ago


Hi, I have been running indexing on a set of 80 PDF documents (~150 pages each) by submitting batch jobs. Because indexing took longer than expected (8 hours), my session ended abruptly, and now I get "ValueError: Expected object or value" when I try to load the index with from_index().
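"Expected object or value" is the message ujson (and therefore pandas.read_json) typically raises on empty or truncated input, so one plausible cause is that the interrupted job left a half-written JSON file in the index directory. Below is a minimal sketch for spotting such files. It assumes, without having confirmed it against byaldi's source, that the index keeps its bookkeeping in `.json` or `.json.gz` files; `find_unreadable_json` is a hypothetical helper, not part of byaldi.

```python
import gzip
import json
import zlib
from pathlib import Path


def find_unreadable_json(index_dir):
    """Return paths of JSON files under index_dir that fail to parse.

    Assumption (hypothetical layout): index metadata is stored as plain
    ``.json`` or gzip-compressed ``.json.gz`` files.
    """
    bad = []
    for path in sorted(Path(index_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.name.endswith(".json.gz"):
            opener = gzip.open
        elif path.name.endswith(".json"):
            opener = open
        else:
            continue  # not a JSON file, skip it
        try:
            with opener(path, "rt", encoding="utf-8") as f:
                json.load(f)
        except (ValueError, OSError, EOFError, zlib.error):
            # Empty, truncated, or otherwise malformed file.
            bad.append(path)
    return bad
```

Running it on the index folder would at least show which files the interrupted run left incomplete.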

I don't see any way to discard the partially indexed document and resume from the last valid state of the index, which means I would have to start over from the beginning for another 8+ hours. Would it be possible to add functionality to handle this situation?
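Without a built-in resume option, one workaround is to index documents one at a time and record each finished document in a small manifest file, so a restarted job skips work that already completed. A minimal sketch follows. `index_one` is a hypothetical callback that adds a single document and persists it before returning (byaldi's `add_to_index` might fill this role, but check its signature in your installed version), and the manifest format is an assumption.

```python
import json
import os


def load_done(manifest_path):
    """Return the set of document paths already recorded as indexed."""
    try:
        with open(manifest_path, encoding="utf-8") as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()  # first run: nothing indexed yet


def save_done(manifest_path, done):
    """Rewrite the manifest via a temp file so a crash never leaves it half-written."""
    tmp = f"{manifest_path}.tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, manifest_path)  # atomic rename on POSIX filesystems


def index_resumably(pdf_paths, index_one, manifest_path):
    """Call index_one(path) for every PDF not yet listed in the manifest.

    index_one is a hypothetical callback, not a byaldi API.
    """
    done = load_done(manifest_path)
    for path in pdf_paths:
        key = str(path)
        if key in done:
            continue  # already indexed in an earlier run
        index_one(path)
        done.add(key)
        save_done(manifest_path, done)  # checkpoint after every document
    return done
```

Writing to a temporary file and then calling `os.replace` keeps the checkpoint itself from being corrupted by a crash. It does not protect the index files; for that, a copy of the index directory could be taken after each checkpoint.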