Closed ali-abz closed 3 years ago
No worries, we're happy to help with questions like this. The preprocess method is called from the rerank task here: https://github.com/capreolus-ir/capreolus/blob/master/capreolus/task/rerank.py#L58 (this is also where topics is passed in).
I'm not sure why preprocess wasn't called for your custom extractor, though. Could a different extractor have been running? My first thought is to double-check that reranker.extractor.name=YourCustomOne was set somewhere, because it may be defaulting to a different extractor.
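To make the call path concrete, here is a minimal, self-contained sketch of the pattern described above: a task object owns the extractor and is the one that calls preprocess, handing it the query ids, document ids, and topics. Every class here (ToyExtractor, ToyBenchmark, ToyRerankTask) is a hypothetical stand-in written for illustration, not Capreolus code, and the argument names are assumptions modeled on this thread.

```python
# Toy stand-ins only: none of these classes come from Capreolus.


class ToyExtractor:
    def __init__(self):
        self.preprocess_calls = 0
        self.qid2toks = {}

    def preprocess(self, qids, docids, topics):
        # Count the call and tokenize the query text the task handed over.
        self.preprocess_calls += 1
        self.qid2toks = {qid: topics[qid].lower().split() for qid in qids}


class ToyBenchmark:
    # Assumed layout: topic field -> {qid: query text}.
    topics = {"title": {"q1": "Neural Ranking", "q2": "BERT passage retrieval"}}


class ToyRerankTask:
    def __init__(self, extractor, benchmark):
        self.extractor = extractor
        self.benchmark = benchmark

    def run(self, search_run):
        # search_run maps qid -> {docid: score}, like a first-stage run.
        docids = sorted({d for docs in search_run.values() for d in docs})
        # The task, not the extractor, decides when preprocess runs
        # and which topics it receives.
        self.extractor.preprocess(
            qids=list(search_run),
            docids=docids,
            topics=self.benchmark.topics["title"],
        )


extractor = ToyExtractor()
calls_before = extractor.preprocess_calls  # instantiation alone calls nothing

task = ToyRerankTask(extractor, ToyBenchmark())
task.run({"q1": {"d1": 2.0, "d2": 1.0}, "q2": {"d2": 3.0}})
calls_after = extractor.preprocess_calls
```

In this sketch, creating the extractor on its own leaves preprocess_calls at zero; the count only changes once the task runs, which is the same behavior as an extractor that is instantiated outside the pipeline.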
I see, thanks a lot.
That explains why preprocess was not called: I don't have a re-ranker yet, and I was testing the extractor by instantiating an object directly rather than running it through the pipeline.
I have to say, Capreolus is very well designed and written. Thanks for such a great tool.
Hi there, I hope I'm not bothering you guys/gals with my novice questions. I am trying to create a BERT-based re-ranking model, and I cannot understand how the preprocess method in the Extractor module works. The documentation says that id2vec needs to be provided for an extractor. I investigated the textbert and bertpassage extractors, and id2vec depends on dictionaries like self.docid2toks that are created by methods such as preprocess and _build_vocab.
This part is a bit magical to me, since I cannot tell which class or module calls preprocess or exactly what arguments the caller provides. I did a bit of testing with my extractor, and preprocess was not invoked.
Also, self.index.get_doc and topics are used to create those dictionaries. I understand that self.index.get_doc can be provided via dependencies, but I don't understand who provides topics for us! I tested self.benchmark.topics['title'].get() instead and it works just fine, but using just topics is really neat. I would appreciate any comments, thanks.
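The dependency described in the question, where id2vec relies on dictionaries that preprocess and _build_vocab fill in, can be sketched with a toy extractor. This is an illustrative sketch under stated assumptions, not the textbert or bertpassage implementation: ToyIndex, ToyExtractor, the stoi vocabulary, and the simplified id2vec(qid, posid) signature are all hypothetical.

```python
# Toy stand-ins only: none of these classes come from Capreolus.


class ToyIndex:
    def __init__(self, docs):
        self._docs = docs

    def get_doc(self, docid):
        return self._docs[docid]


class ToyExtractor:
    def __init__(self, index):
        self.index = index
        self.qid2toks = None
        self.docid2toks = None
        self.stoi = {"<pad>": 0}

    def preprocess(self, qids, docids, topics):
        # Build the lookup tables that id2vec reads later.
        self.qid2toks = {qid: topics[qid].lower().split() for qid in qids}
        self.docid2toks = {
            docid: self.index.get_doc(docid).lower().split() for docid in docids
        }
        self._build_vocab()

    def _build_vocab(self):
        # Assign each unseen token the next free integer id.
        for toks in list(self.qid2toks.values()) + list(self.docid2toks.values()):
            for tok in toks:
                self.stoi.setdefault(tok, len(self.stoi))

    def id2vec(self, qid, posid):
        if self.docid2toks is None:
            raise RuntimeError("call preprocess() before id2vec()")
        return {
            "query": [self.stoi[t] for t in self.qid2toks[qid]],
            "posdoc": [self.stoi[t] for t in self.docid2toks[posid]],
        }


index = ToyIndex({"d1": "BERT ranks passages"})
topics = {"q1": "bert ranking"}
extractor = ToyExtractor(index)

try:
    extractor.id2vec("q1", "d1")
    failed_before_preprocess = False
except RuntimeError:
    failed_before_preprocess = True  # the tables do not exist yet

extractor.preprocess(qids=["q1"], docids=["d1"], topics=topics)
vec = extractor.id2vec("q1", "d1")
```

When testing an extractor outside the pipeline, calling preprocess by hand with the query ids, document ids, and topics, as in the last lines above, plays the role the task would otherwise play.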