namespace-Pt closed this issue 2 years ago
You're probably running out of memory. Since the `SearchCollection` implementation is multi-threaded, it keeps the hits in memory until all the queries are processed, and then writes them out to disk all at once. This simplifies thread synchronization.
Try running on smaller batches of queries.
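To make the batching concrete: with 55k queries and `hits=1000`, roughly 55 million hits would sit in memory before anything is written. One way to cap that is to split the query file into smaller files and run retrieval once per file. Below is a minimal sketch, assuming the queries are stored one per line (e.g. `qid<TAB>text`); the function names, the output file naming, and the batch size of 5000 are all hypothetical and not part of Anserini.

```python
from pathlib import Path


def split_queries(src, out_dir, batch_size=5000):
    """Split a one-query-per-line file into files of at most batch_size lines.

    Returns the list of batch file paths, in order.
    (Hypothetical helper; not part of Anserini.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    batch = []
    with open(src, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # Normalize so every stored line ends with a newline.
            batch.append(line if line.endswith("\n") else line + "\n")
            if len(batch) == batch_size:
                paths.append(_flush(batch, out_dir, len(paths)))
                batch = []
    if batch:  # write the final, possibly smaller, batch
        paths.append(_flush(batch, out_dir, len(paths)))
    return paths


def _flush(batch, out_dir, index):
    """Write one batch to disk; the file name pattern is an arbitrary choice."""
    path = out_dir / f"queries.part{index:03d}.tsv"
    path.write_text("".join(batch), encoding="utf-8")
    return path
```

Each resulting file can then be passed to a separate retrieval run, and the per-batch run files concatenated afterwards, so that only one batch's hits are held in memory at a time.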
I tried the BM25 baseline for MS MARCO passage ranking and it worked. According to the terminal output, retrieval takes about `0.001s/query` when `hits=1000`. But when I tried to run more queries (all `55k` training queries) against the exact same index, retrieval got slower and slower until the whole program got stuck at `49.13%` of the queries. Why does this happen? I don't think it's reasonable for retrieval speed to drop just because there are more queries.