@lintool #1907
I tried to encode the BEIR, Mr. TyDi, and MIRACL datasets with the Contriever and mContriever models for indexing, but pyserini.encode throws the following error:
__main__.py encoder: error: argument --encoder-class: invalid choice: 'contriever' (choose from 'dpr', 'bpr', 'tct_colbert', 'ance', 'sentence-transformers', 'auto')
I then used auto for --encoder-class, but the scores I reproduced were much worse. I found that although contriever uses the AutoDocumentEncoder class, here it uses a different pooling operation from the auto class.
Once I cloned the repository and modified the code to add a contriever option for --encoder-class, everything worked and I was able to reproduce the scores.
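To show why the pooling operation matters so much, here is a minimal toy sketch. It assumes the mismatch is between CLS pooling and mean pooling; the description above only says that the pooling differs, so treat that pairing as an assumption. The function names `cls_pool` and `mean_pool` are hypothetical and are not part of the Pyserini API.

```python
def cls_pool(token_embeddings, attention_mask):
    """Return the embedding of the first ([CLS]) token."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens (mask == 1)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy sequence: three real tokens plus one padding token, 2-dim embeddings.
# The values are made up for illustration only.
tokens = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

cls_vec = cls_pool(tokens, mask)    # uses only the first token
mean_vec = mean_pool(tokens, mask)  # averages the three unmasked tokens
print(cls_vec, mean_vec)
```

The same token embeddings give two different document vectors. If a model was trained with one pooling strategy and is encoded with the other, the index vectors no longer match what the model learned to produce, which would be consistent with the much worse scores.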
I can make a pull request soon for this issue.