capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval
https://capreolus.ai
Apache License 2.0

Implementing a new neural reranker module #153

Closed Tooba-ts1700550 closed 3 years ago

Tooba-ts1700550 commented 3 years ago

I am trying to implement a simple neural reranker module following the example. What are the different Dependency and ConfigOption choices available? Also, if I want to use a tokenizer from the modules provided in Capreolus, how can I include it in the new reranker module? Is it possible to share some more examples of implementing a neural reranker using different options? Thank you.

andrewyates commented 3 years ago

The ConfigOptions are something you define to use in your reranker. They can be named anything you want.

For dependencies, it is normal to use (1) a pytorch or tensorflow trainer (depending on which framework you want to implement your reranker in) and (2) either the embedtext extractor for non-BERT models (see KNRM for an example of what gets passed to the reranker) or one of the BERT passage extractors (see PARADE for an example).
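
For concreteness, here is a rough skeleton of how those pieces fit together, loosely following the structure of KNRM. The class name, option names, and defaults below are made up, and the exact hooks (e.g. `build_model`) may differ depending on what your reranker needs:

```python
# Hypothetical skeleton of a new reranker module (names and defaults are placeholders).
from capreolus import ConfigOption, Dependency
from capreolus.reranker import Reranker


@Reranker.register
class MyReranker(Reranker):
    module_name = "myreranker"

    # A trainer (pytorch or tensorflow) plus an extractor; embedtext feeds
    # embedding-based input to non-BERT models, as in KNRM.
    dependencies = [
        Dependency(key="extractor", module="extractor", name="embedtext"),
        Dependency(key="trainer", module="trainer", name="pytorch"),
    ]

    # ConfigOptions are whatever knobs your model needs: (name, default, description).
    config_spec = [
        ConfigOption("hiddendim", 128, "hidden layer size"),
        ConfigOption("dropout", 0.1, "dropout probability"),
    ]

    def build_model(self):
        # Construct the underlying pytorch model here, reading hyperparameters
        # from self.config and input details from self.extractor.
        ...
```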

I'd guess that you're using a non-BERT reranker, because you asked about tokenizers and BERT rerankers require a specific tokenizer. The tokenizer is a dependency of the extractor, so you can change it or set options under `reranker.extractor.tokenizer`. For example: `reranker.extractor.tokenizer.name=anserini reranker.extractor.tokenizer.stemmer=krovetz`
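
On the command line, those options are supplied alongside the rest of the pipeline config. A sketch, where the task, benchmark, and reranker names are placeholders for whatever you are actually running:

```bash
# Hypothetical invocation; substitute your own task, benchmark, and reranker.
capreolus rerank.traineval with \
    benchmark.name=robust04 \
    reranker.name=KNRM \
    reranker.extractor.tokenizer.name=anserini \
    reranker.extractor.tokenizer.stemmer=krovetz
```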