chaalic opened this issue 2 years ago
Runtime depends on the number of queries and the number of docs per query. If you have 8k queries with 100 docs each, the evaluator must encode 800k texts, which takes quite some time.
Thank you for your answer. However, I only have one query per document in the evaluation set, so I am not sure I understand the reason behind this. I have one other question please, is there a way to view the loss during the training of the model?
Thank you once again :)
Loss during training is not supported yet. But you could create your own loss class that prints the loss.
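A minimal sketch of that idea: wrap the existing loss in a class that prints the value each time it is computed. The wrapper below is deliberately library-independent; in practice you would wrap the actual loss module (e.g. `MultipleNegativesRankingLoss`) the same way, but the class and parameter names here are illustrative, not part of the sentence-transformers API.

```python
class LoggingLoss:
    """Wraps any callable loss and prints its value periodically.

    `loss_fn` can be any callable returning a scalar loss; in
    sentence-transformers you would pass the real loss module here.
    """

    def __init__(self, loss_fn, log_every=100):
        self.loss_fn = loss_fn
        self.log_every = log_every  # print every N steps to avoid log spam
        self.step = 0

    def __call__(self, *args, **kwargs):
        loss = self.loss_fn(*args, **kwargs)  # delegate to the wrapped loss
        self.step += 1
        if self.step % self.log_every == 0:
            print(f"step {self.step}: loss = {float(loss):.4f}")
        return loss  # unchanged, so training is unaffected
```

You would then pass the wrapped loss to `model.fit` in place of the original one (assuming your version of the library accepts any callable with the same signature).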
One query per document does not make sense for the RerankingEvaluator.
The RerankingEvaluator expects a query and a list of candidates, e.g. 20 candidates that are related to the query. It will then re-rank these 20 candidates and check at which position the relevant document is.
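For concreteness, the input shape described above looks roughly like this; the `query`/`positive`/`negative` field names follow the sentence-transformers docs, but verify them against the version you have installed:

```python
# One evaluation sample: a query plus candidate documents to re-rank.
# The evaluator encodes the query and all candidates, ranks the
# candidates by similarity, and checks where the positives land.
samples = [
    {
        "query": "how do I reset my password?",
        "positive": ["To reset your password, open Settings ..."],
        "negative": [
            "Our opening hours are 9am to 5pm.",
            "Shipping usually takes 3-5 business days.",
        ],
    }
]
```

With only one candidate per query there is nothing to re-rank, which is why this evaluator is not a good fit for one query/document pair.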
@chaalic Probably the InformationRetrievalEvaluator might work better for your case?
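A sketch of the inputs the InformationRetrievalEvaluator takes, as I understand its interface (id-to-text mappings plus the set of relevant corpus ids per query; double-check the parameter names against your installed version):

```python
# Queries and corpus are id -> text mappings; relevant_docs maps each
# query id to the set of corpus ids that count as correct hits.
queries = {"q1": "how do I reset my password?"}
corpus = {
    "d1": "To reset your password, open Settings ...",
    "d2": "Our opening hours are 9am to 5pm.",
}
relevant_docs = {"q1": {"d1"}}
# evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
```

Because the corpus is encoded once and shared across all queries, this is usually much cheaper than encoding a full candidate list per query.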
Hi,
I am currently working on finetuning "distiluse-base-multilingual-cased-v1", using MultipleNegativesRankingLoss and RerankingEvaluator, over a dataset of 700k (query, sentence) pairs. I'm currently facing a problem with the evaluator: it takes too much time to evaluate approximately 8000 unique evaluation pairs. I am using a GPU for the task. Is this normal behaviour?
Thank you for your help !