Georgetown-IR-Lab / OpenNIR

An end-to-end neural ad-hoc ranking pipeline.
https://opennir.net
MIT License

Indexing COVID dataset painfully slow by end #10

Open · seanmacavaney opened this issue 4 years ago

seanmacavaney commented 4 years ago

Introduced in #8

Indexing starts out at a decent rate (once the files are downloaded and extracted) but slows down heavily over time. I suspect this is because the MultifieldSqliteDocstore has to perform frequent inserts. It may be time to replace this way of storing document fields with something more robust.
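If frequent small inserts are the bottleneck, one common mitigation with SQLite is to group many rows into a single transaction instead of committing each row separately. The sketch below only illustrates that idea using Python's standard `sqlite3` module. The `fields` table, its schema, the `insert_batched` function, and the `batch_size` parameter are all hypothetical. They are not the actual MultifieldSqliteDocstore implementation or API, and batching may or may not fix the slowdown described here.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: one row per (doc_id, field, value).
# This is NOT OpenNIR's MultifieldSqliteDocstore schema; it only illustrates batching.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS fields ("
    "doc_id TEXT, field TEXT, value TEXT, PRIMARY KEY (doc_id, field))"
)
INSERT = "INSERT OR REPLACE INTO fields VALUES (?, ?, ?)"


def insert_batched(conn: sqlite3.Connection,
                   rows: Iterable[Tuple[str, str, str]],
                   batch_size: int = 10_000) -> int:
    """Insert rows in batches, committing once per batch rather than once per row."""
    conn.execute(SCHEMA)
    batch: List[Tuple[str, str, str]] = []
    total = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            with conn:  # opens one transaction for the whole batch; commits on exit
                conn.executemany(INSERT, batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush any remaining rows
        with conn:
            conn.executemany(INSERT, batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    docs = ((f"doc{i}", "title", f"Title {i}") for i in range(25_000))
    print(insert_batched(conn, docs))  # 25000
```

The batch size is a tuning knob: larger batches mean fewer commits (and fewer disk syncs on a file-backed database) at the cost of holding more rows in memory before each write.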