castorini / pygaggle

a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini
http://pygaggle.ai/
Apache License 2.0

reproduced results and updated pygaggle/docs/experiments-msmarco-passage-subset.md #309

Open farazkh80 opened 1 year ago

farazkh80 commented 1 year ago

Successfully reproduced the numerical results in pygaggle/docs/experiments-msmarco-passage-subset.md in a Colab environment with a T4 GPU.

Encountered a small issue with the Python dependencies when running the monoBERT evaluation:

python -um pygaggle.run.evaluate_passage_ranker --split dev \
                                                --method seq_class_transformer \
                                                --model castorini/monobert-large-msmarco \
                                                --dataset data/msmarco_ans_small/ \
                                                --index-dir indexes/index-msmarco-passage-20191117-0ed488 \
                                                --task msmarco \
                                                --output-file runs/run.monobert.ans_small.dev.tsv

The error log was:

2022-12-26 02:37:05.453924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-12-26 02:37:08 [INFO] utils: NumExpr defaulting to 2 threads.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/pygaggle/pygaggle/run/evaluate_passage_ranker.py", line 13, in <module>
    from pygaggle.rerank.base import Reranker
  File "/content/pygaggle/pygaggle/rerank/base.py", line 5, in <module>
    from pyserini.search import JLuceneSearcherResult
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/__init__.py", line 19, in <module>
    from .lucene import JLuceneSearcherResult, LuceneSimilarities, LuceneFusionSearcher, LuceneSearcher
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/__init__.py", line 18, in <module>
    from ._impact_searcher import JImpactSearcherResult, LuceneImpactSearcher
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/_impact_searcher.py", line 28, in <module>
    from pyserini.encode import QueryEncoder, TokFreqQueryEncoder, UniCoilQueryEncoder, \
  File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/__init__.py", line 17, in <module>
    from ._base import DocumentEncoder, QueryEncoder, JsonlCollectionIterator,\
  File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/_base.py", line 19, in <module>
    import faiss
ModuleNotFoundError: No module named 'faiss'

Fix:

pip install faiss-cpu
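
A quick way to catch this before launching the evaluation is to check whether the module can be found at all. This is only a minimal sketch: `missing_modules` is a hypothetical helper, not part of pygaggle or Pyserini, and it only handles top-level module names such as `faiss`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter.

    Hypothetical helper for illustration; not part of pygaggle or Pyserini.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'faiss' is the module the traceback above reports as missing.
    missing = missing_modules(["faiss"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("For faiss, try: pip install faiss-cpu")
    else:
        print("All checked modules are importable.")
```

Running this in the same environment as the evaluation command should report `faiss` as missing until `faiss-cpu` is installed.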

farazkh80 commented 1 year ago

Added "What's going on?" toggle blocks to illustrate the effect of re-ranking on the relevance of the top hit for a given qid.

For each "What's going on?" toggle block:

  1. Show the head of each generated run file.
  2. Take the first line of the run file.
  3. Grep for the qid and docid to show the corresponding query and passage text.
  4. Check actual relevance by looking up the qrels file and confirming that the qid and docid appear as a match.
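
The same steps can be sketched in Python as a rough illustration. The file layouts here are assumptions: a tab-separated run file of `qid`, `docid`, `rank`; two-column TSV files mapping an id to its text; and a four-column qrels file of `qid`, an unused field, `docid`, and a relevance label. The function names and the toy ids are hypothetical and do not come from the docs.

```python
def first_hit(run_text):
    """Return (qid, docid, rank) from the first line of a run file.

    Assumes tab-separated columns: qid, docid, rank.
    """
    qid, docid, rank = run_text.splitlines()[0].split("\t")[:3]
    return qid, docid, int(rank)


def load_id_to_text(tsv_text):
    """Map id -> text for a two-column TSV (queries or passages)."""
    return dict(line.split("\t", 1) for line in tsv_text.splitlines() if line)


def is_relevant(qrels_text, qid, docid):
    """Check whether (qid, docid) is judged relevant in the qrels.

    Assumes whitespace-separated columns: qid, unused, docid, label.
    """
    for line in qrels_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == qid and fields[2] == docid:
            return int(fields[3]) > 0
    return False


if __name__ == "__main__":
    # Toy data with made-up ids, only to show how the pieces fit together.
    run = "q1\td7\t1\nq1\td3\t2\n"
    queries = load_id_to_text("q1\texample query text\n")
    passages = load_id_to_text("d3\tsome passage\nd7\tanother passage\n")
    qrels = "q1\t0\td7\t1\n"

    qid, docid, rank = first_hit(run)
    print("query:", queries[qid])
    print("top passage:", passages[docid])
    print("relevant:", is_relevant(qrels, qid, docid))
```

With real files, the strings passed to these functions would be read from the run file, the query and collection TSVs, and the qrels file used in the docs.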

rodrigonogueira4 commented 1 year ago

Thanks for doing this! Could you please also add pip install faiss-cpu to the instructions?

farazkh80 commented 1 year ago

Added faiss-cpu installation!