AnswerDotAI / byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Apache License 2.0

Improved performance and lower memory usage during PDF indexing #24

Closed: velaia closed this 2 months ago

velaia commented 2 months ago

This is an updated version of PR #19. Besides running pdftoppm in parallel across CPU cores, it now buffers the rendered page images in temporary files (via tempfile) instead of holding them in memory. For large PDFs, I measured significantly lower memory usage during indexing (8 GB instead of 16 GB).

More context is in #19.
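A minimal sketch of the disk-buffering idea, not the code from this PR. It assumes a hypothetical render_page callable standing in for pdftoppm: a thread pool renders pages in parallel, each result is written to its own file in a temporary directory, and only the file paths stay in memory. Pages are then read back one at a time, so peak memory is roughly one page image rather than the whole document. Threads are a reasonable fit when the heavy work runs in an external process such as pdftoppm. For reference, pdf2image's convert_from_path exposes thread_count and output_folder parameters that cover the same two ideas, though this comment does not say whether the PR uses them.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


def buffer_pages_to_disk(
    num_pages: int,
    render_page: Callable[[int], bytes],
    workdir: str,
    max_workers: int = os.cpu_count() or 1,
) -> List[str]:
    """Render pages in parallel and spill each result to its own file.

    `render_page` is a hypothetical stand-in for a rasterizer such as
    pdftoppm: it takes a 0-based page index and returns image bytes.
    """

    def _render_to_file(index: int) -> str:
        path = os.path.join(workdir, f"page_{index:05d}.img")
        with open(path, "wb") as fh:
            fh.write(render_page(index))
        # Only the path is returned; the image bytes can be garbage-collected.
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so paths come back in page order
        # even if pages finish rendering out of order.
        return list(pool.map(_render_to_file, range(num_pages)))


def iter_pages(paths: List[str]) -> Iterator[bytes]:
    """Load one page at a time so only a single image is in memory."""
    for path in paths:
        with open(path, "rb") as fh:
            yield fh.read()


if __name__ == "__main__":
    # Usage with a fake renderer; the directory and its files are deleted on exit.
    with tempfile.TemporaryDirectory() as tmp:
        paths = buffer_pages_to_disk(3, lambda i: f"page {i}".encode(), tmp)
        for page in iter_pages(paths):
            print(page)
```

Using tempfile.TemporaryDirectory as a context manager means the buffered images are removed automatically once indexing finishes, even if an exception interrupts it.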

velaia commented 2 months ago

Great. Thank you! 😃