Closed: lintool closed this 2 years ago.
We can now do this in Pyserini:
```python
>>> from pyserini.search import SimpleSearcher
>>> searcher = SimpleSearcher.from_prebuilt_index('trec45')
Downloading index at https://git.uwaterloo.ca/jimmylin/anserini-indexes/raw/master/index-robust04-20191213.tar.gz...
index-robust04-20191213.tar.gz: 1.70GB [00:50, 36.4MB/s]
Extracting /Users/jimmylin/.cache/pyserini/indexes/index-robust04-20191213.tar.gz into /Users/jimmylin/.cache/pyserini/indexes/index-robust04-2019121315f3d001489c97849a010b0a4734d018...
>>> searcher
<pyserini.search._searcher.SimpleSearcher object at 0x7fee58547ac8>
>>> hits = searcher.search('hubble space telescope')
>>>
>>> # Print the first 10 hits:
... for i in range(0, 10):
...     print(f'{i+1:2} {hits[i].docid:15} {hits[i].score:.5f}')
...
 1 LA071090-0047   16.85690
 2 FT934-5418      16.75630
 3 FT921-7107      16.68290
 4 LA052890-0021   16.37390
 5 LA070990-0052   16.36460
 6 LA062990-0180   16.19260
 7 LA070890-0154   16.15610
 8 FT934-2516      16.08950
 9 LA041090-0148   16.08810
10 FT944-128       16.01920
```
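The log suggests how the feature avoids manual downloads: the tarball is fetched once into `~/.cache/pyserini/indexes` and extracted into a directory whose name appears to be the tarball name with its MD5 checksum appended. Below is a minimal standard-library sketch of that download-once-and-cache pattern. The helper names `fetch_prebuilt_index` and `md5_of`, their parameters, and the checksum-suffix naming are assumptions inferred from the log, not Pyserini's actual implementation.

```python
import hashlib
import os
import shutil
import tarfile
import urllib.request


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def fetch_prebuilt_index(url, md5, cache_dir):
    """Hypothetical helper: download an index tarball once, then reuse it.

    The extracted directory is named <tarball stem><md5>, mirroring the
    directory name in the Pyserini log (an inferred convention).
    """
    os.makedirs(cache_dir, exist_ok=True)
    tarball_name = url.rsplit('/', 1)[-1]
    if tarball_name.endswith('.tar.gz'):
        stem = tarball_name[:-len('.tar.gz')]
    else:
        stem = tarball_name
    index_dir = os.path.join(cache_dir, stem + md5)
    if os.path.isdir(index_dir):
        return index_dir  # Cache hit: no download needed.

    tarball_path = os.path.join(cache_dir, tarball_name)
    urllib.request.urlretrieve(url, tarball_path)
    if md5_of(tarball_path) != md5:
        os.remove(tarball_path)
        raise ValueError(f'checksum mismatch for {tarball_name}')

    # Extract to a temporary sibling directory and rename it at the end, so an
    # interrupted extraction is never mistaken for a complete cached index.
    partial_dir = index_dir + '.partial'
    shutil.rmtree(partial_dir, ignore_errors=True)
    with tarfile.open(tarball_path, 'r:gz') as tar:
        # Use the safer 'data' extraction filter where this Python supports it.
        if hasattr(tarfile, 'data_filter'):
            tar.extractall(partial_dir, filter='data')
        else:
            tar.extractall(partial_dir)
    os.rename(partial_dir, index_dir)
    os.remove(tarball_path)
    return index_dir
```

A second call with the same URL and checksum returns the cached directory without touching the network, which is what makes downloading indexes by hand unnecessary.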
Should we take advantage of this feature instead of downloading the indexes by hand?
cc/ @MXueguang @qguo96
Sure!
I did something similar in Bertserini, but it may be better to let Pyserini do this now.
Here is a PR to solve this: https://github.com/rsvp-ai/bertserini/pull/10