castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

Improve reporting of BEIR results #1888

lintool closed this issue 2 months ago

lintool commented 2 months ago

Run this:

python -m pyserini.2cr.beir --all --display-commands --dry-run

And you'll get this:

                              BM25-flat          BM25-mf             SPLADE           Contriever     Contriever-msmarco  BGE-base-en-v1.5   cohere (en-v3.0)
                          nDCG@10   R@100    nDCG@10   R@100    nDCG@10   R@100    nDCG@10   R@100    nDCG@10   R@100    nDCG@10   R@100    nDCG@10   R@100    
                           --------------     --------------     --------------     --------------     --------------     --------------     --------------
trec-covid                  0.595   0.109      0.656   0.114      0.727   0.128      0.273   0.037      0.596   0.091      0.781   0.141      0.818   0.159
bioasq                      0.522   0.769      0.465   0.715      0.498   0.739      0.302   0.541      0.383   0.607      0.415   0.632      0.457   0.679
nfcorpus                    0.322   0.246      0.325   0.250      0.347   0.284      0.317   0.294      0.328   0.301      0.373   0.337      0.386   0.351
nq                          0.305   0.751      0.329   0.760      0.538   0.930      0.254   0.771      0.498   0.925      0.541   0.942      0.616   0.956
hotpotqa                    0.633   0.796      0.603   0.740      0.687   0.818      0.481   0.705      0.638   0.777      0.726   0.873      0.707   0.823
fiqa                        0.236   0.539      0.236   0.539      0.347   0.631      0.245   0.562      0.329   0.656      0.406   0.742      0.421   0.736
signal1m                    0.330   0.370      0.330   0.370      0.301   0.340      0.234   0.257      0.278   0.322      0.289   0.311      0.263   0.283
trec-news                   0.395   0.447      0.398   0.422      0.415   0.441      0.348   0.423      0.428   0.492      0.442   0.499      0.504   0.543
robust04                    0.407   0.375      0.407   0.375      0.468   0.385      0.316   0.276      0.473   0.392      0.444   0.351      0.541   0.417
arguana                     0.397   0.932      0.414   0.943      0.520   0.974      0.379   0.901      0.446   0.977      0.636   0.992      0.540   0.982
webis-touche2020            0.442   0.582      0.367   0.538      0.247   0.471      0.167   0.374      0.204   0.442      0.257   0.487      0.326   0.516
cqa                         0.302   0.580      0.299   0.606      0.334   0.650      0.284   0.614      0.345   0.663      0.424   0.762      0.415   0.745
quora                       0.789   0.973      0.789   0.973      0.834   0.986      0.835   0.987      0.865   0.994      0.889   0.997      0.887   0.996
dbpedia-entity              0.318   0.468      0.313   0.398      0.437   0.562      0.292   0.453      0.413   0.541      0.407   0.530      0.434   0.536
scidocs                     0.149   0.348      0.158   0.356      0.159   0.373      0.149   0.360      0.165   0.378      0.217   0.496      0.203   0.451
fever                       0.651   0.918      0.753   0.931      0.788   0.946      0.682   0.936      0.758   0.949      0.863   0.972      0.890   0.965
climate-fever               0.165   0.425      0.213   0.436      0.230   0.521      0.155   0.442      0.237   0.575      0.312   0.636      0.259   0.581
scifact                     0.679   0.925      0.665   0.908      0.704   0.935      0.649   0.926      0.677   0.947      0.741   0.967      0.718   0.963
                           --------------     --------------     --------------     --------------     --------------     --------------     --------------
avg                         0.424   0.586      0.429   0.576      0.477   0.618      0.353   0.548      0.448   0.613      0.509   0.648      0.521   0.649

cqadupstack-android         0.380   0.683      0.371   0.689      0.390   0.740      0.377   0.744      0.425   0.750      0.508   0.845      0.500   0.832
cqadupstack-english         0.345   0.576      0.332   0.584      0.408   0.695      0.357   0.644      0.433   0.694      0.486   0.759      0.491   0.757
cqadupstack-gaming          0.482   0.765      0.442   0.757      0.496   0.813      0.460   0.809      0.528   0.848      0.597   0.904      0.605   0.900
cqadupstack-gis             0.290   0.612      0.290   0.646      0.315   0.632      0.241   0.579      0.302   0.627      0.413   0.768      0.392   0.744
cqadupstack-mathematica     0.202   0.488      0.205   0.521      0.238   0.580      0.184   0.513      0.235   0.573      0.316   0.692      0.304   0.667
cqadupstack-physics         0.321   0.633      0.325   0.649      0.360   0.720      0.343   0.701      0.416   0.762      0.472   0.808      0.438   0.784
cqadupstack-programmers     0.280   0.559      0.296   0.619      0.340   0.658      0.303   0.640      0.357   0.719      0.424   0.786      0.437   0.789
cqadupstack-stats           0.271   0.534      0.279   0.572      0.299   0.589      0.248   0.527      0.309   0.586      0.373   0.673      0.352   0.643
cqadupstack-tex             0.224   0.469      0.209   0.495      0.253   0.516      0.154   0.433      0.221   0.498      0.311   0.649      0.308   0.624
cqadupstack-unix            0.275   0.542      0.279   0.572      0.317   0.621      0.264   0.588      0.326   0.616      0.422   0.780      0.406   0.754
cqadupstack-webmasters      0.306   0.582      0.301   0.610      0.317   0.636      0.288   0.648      0.339   0.703      0.407   0.777      0.407   0.749
cqadupstack-wordpress       0.248   0.515      0.256   0.553      0.273   0.595      0.191   0.536      0.253   0.577      0.355   0.705      0.343   0.694
                           --------------     --------------     --------------     --------------     --------------     --------------     --------------
avg                         0.302   0.580      0.299   0.606      0.334   0.650      0.284   0.614      0.345   0.663      0.424   0.762      0.415   0.745
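
As a quick sanity check on the table above, the `cqa` row in the main table appears to be the unweighted mean of the twelve `cqadupstack-*` rows. The sketch below recomputes it for the BM25-flat column only, using the rounded values printed above. The names `cqadupstack_bm25_flat` and `macro_average` are made up for illustration and are not part of Pyserini; because the inputs are already rounded to three decimals, other columns recomputed this way could differ in the last digit.

```python
from statistics import mean

# BM25-flat scores copied from the cqadupstack-* rows above: (nDCG@10, R@100).
# Hypothetical variable name; this is not how Pyserini stores its results.
cqadupstack_bm25_flat = {
    "android": (0.380, 0.683),
    "english": (0.345, 0.576),
    "gaming": (0.482, 0.765),
    "gis": (0.290, 0.612),
    "mathematica": (0.202, 0.488),
    "physics": (0.321, 0.633),
    "programmers": (0.280, 0.559),
    "stats": (0.271, 0.534),
    "tex": (0.224, 0.469),
    "unix": (0.275, 0.542),
    "webmasters": (0.306, 0.582),
    "wordpress": (0.248, 0.515),
}


def macro_average(scores):
    """Return the unweighted mean of each metric, rounded to three decimals."""
    ndcg = mean(s[0] for s in scores.values())
    recall = mean(s[1] for s in scores.values())
    return round(ndcg, 3), round(recall, 3)


avg_ndcg, avg_recall = macro_average(cqadupstack_bm25_flat)
print(f"cqa (BM25-flat): nDCG@10={avg_ndcg:.3f} R@100={avg_recall:.3f}")
# prints: cqa (BM25-flat): nDCG@10=0.302 R@100=0.580
```

The result matches both the `avg` row of the CQADupStack breakdown and the `cqa` row of the main table for this column.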

thakur-nandan commented 2 months ago

LGTM.