castorini / pygaggle

a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini
Apache License 2.0
329 stars 97 forks

MonoT5 Reranker assigning same score to every document #325

Open Spongeorge opened 1 year ago

Spongeorge commented 1 year ago

When I use the MonoT5 reranker class to rerank results from a LuceneSearcher, it assigns the same score to every document. I have not worked out why.

I was able to fix this problem by changing the final loop in the rescore() method of pygaggle.rerank.transformer.MonoT5 from

    for doc, score in zip(batch.documents, batch_log_probs):
        doc.score = score

to

    for i, doc in enumerate(batch.documents):
        score = batch_log_probs[i]
        texts[i].score = score
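One caveat with the index-based version, offered as a hedged sketch rather than a confirmed diagnosis: if traverse_query_document yields more than one batch, `i` restarts at 0 for every batch, so `texts[i]` would keep addressing the first batch's slots. The toy example below is not pygaggle code. `Text`, `batches`, and `assign_scores` are hypothetical stand-ins, and it assumes that batches arrive in order and hold copies of the documents (one possible, unverified reason why writing to `doc.score` had no visible effect). It keeps a running offset so that each score lands on the matching element of `texts`.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Text:
    """Hypothetical stand-in for a rerankable document."""
    text: str
    score: float = 0.0


def batches(texts: List[Text], batch_size: int) -> Iterator[List[Text]]:
    """Yield consecutive batches of copies (an assumption made for this sketch)."""
    for start in range(0, len(texts), batch_size):
        yield deepcopy(texts[start:start + batch_size])


def assign_scores(texts: List[Text], all_scores: List[float], batch_size: int) -> List[Text]:
    texts = deepcopy(texts)
    offset = 0  # position of the current batch's first document within `texts`
    for batch in batches(texts, batch_size):
        # Stand-in for the model output of this batch.
        batch_scores = all_scores[offset:offset + len(batch)]
        for i, score in enumerate(batch_scores):
            # Write to the list that is returned, not to the batch's copy.
            texts[offset + i].score = score
        offset += len(batch)
    return texts


docs = [Text("a"), Text("b"), Text("c"), Text("d"), Text("e")]
reranked = assign_scores(docs, [-0.1, -2.3, -0.7, -1.5, -0.2], batch_size=2)
scores = [t.score for t in reranked]
```

With a single batch the offset stays at 0 and this reduces to the loop shown above, which may be why the simpler version appeared to work.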

Full method for reference:

    def rescore(self, query: Query, texts: List[Text]) -> List[Text]:
        texts = deepcopy(texts)
        batch_input = QueryDocumentBatch(query=query, documents=texts)
        for batch in self.tokenizer.traverse_query_document(batch_input):
            with torch.cuda.amp.autocast(enabled=self.use_amp):
                input_ids = batch.output['input_ids'].to(self.device)
                attn_mask = batch.output['attention_mask'].to(self.device)
                _, batch_scores = greedy_decode(self.model,
                                                input_ids,
                                                length=1,
                                                attention_mask=attn_mask,
                                                return_last_logits=True)

                batch_scores = batch_scores[:, [self.token_false_id, self.token_true_id]]
                batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
                batch_log_probs = batch_scores[:, 1].tolist()
            for doc, score in zip(batch.documents, batch_log_probs):
                doc.score = score

        return texts
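
For anyone following the scoring step, here is a minimal pure-Python sketch of what the three lines after the greedy_decode call compute: a log-softmax over just the "false" and "true" token logits, keeping the log-probability of "true". The function name `true_log_prob` and the example logits are made up for illustration; the real method operates on torch tensors.

```python
import math
from typing import List, Tuple


def true_log_prob(false_logit: float, true_logit: float) -> float:
    """Log-softmax over [false, true], returning the entry for "true"."""
    m = max(false_logit, true_logit)  # subtract the max for numerical stability
    log_norm = m + math.log(math.exp(false_logit - m) + math.exp(true_logit - m))
    return true_logit - log_norm


# Made-up (false_logit, true_logit) pairs, one per document.
logit_pairs: List[Tuple[float, float]] = [(2.0, -1.0), (0.0, 0.0), (-3.0, 4.0)]
log_probs = [true_log_prob(f, t) for f, t in logit_pairs]
```

The result depends only on the difference between the two logits, so printing `batch_log_probs` inside the loop is a quick way to tell whether the model itself produces identical scores or whether distinct scores are computed and then lost when they are written back.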