castorini / pygaggle

a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini
http://pygaggle.ai/
Apache License 2.0

MonoT5 Reranker assigning same score to every document #325

Open Spongeorge opened 1 year ago

Spongeorge commented 1 year ago

When I use the MonoT5 reranker class to rerank results from a LuceneSearcher, it assigns the same score to every document, and I don't know why.

I was able to fix this problem by changing a line in the rescore() method in pygaggle.rerank.transformer.MonoT5 from

    for doc, score in zip(batch.documents, batch_log_probs):
        doc.score = score

to

    for i, doc in enumerate(batch.documents):
        score = batch_log_probs[i]
        texts[i].score = score

Full method for reference:

def rescore(self, query: Query, texts: List[Text]) -> List[Text]:
    # Work on a copy so the caller's Text objects are not mutated.
    texts = deepcopy(texts)
    batch_input = QueryDocumentBatch(query=query, documents=texts)
    for batch in self.tokenizer.traverse_query_document(batch_input):
        with torch.cuda.amp.autocast(enabled=self.use_amp):
            input_ids = batch.output['input_ids'].to(self.device)
            attn_mask = batch.output['attention_mask'].to(self.device)
            # Decode one token (length=1) and request the last-step logits.
            _, batch_scores = greedy_decode(self.model,
                                            input_ids,
                                            length=1,
                                            attention_mask=attn_mask,
                                            return_last_logits=True)

            # Keep only the logits at token_false_id and token_true_id,
            # normalize over those two, and use the log-probability at
            # index 1 (token_true_id) as the score.
            batch_scores = batch_scores[:, [self.token_false_id, self.token_true_id]]
            batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
            batch_log_probs = batch_scores[:, 1].tolist()
        # Original loop: scores are written to the batch's documents.
        for doc, score in zip(batch.documents, batch_log_probs):
            doc.score = score

    return texts
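
One caveat about the workaround: `i` restarts at 0 for every batch, so if `traverse_query_document` yields more than one batch, `texts[i]` would overwrite the scores of the first few documents and leave later ones unscored. Below is a minimal sketch of an offset-aware variant. It is not pygaggle code: `Text`, `iter_batches` and `assign_scores` are hypothetical stand-ins, and the assumption that batches contain copies rather than the original objects is made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Text:
    # Hypothetical stand-in for pygaggle's Text; only the fields used here.
    text: str
    score: Optional[float] = None


def iter_batches(items: List[Text], batch_size: int) -> Iterator[List[Text]]:
    # Assumption for illustration: each batch holds copies, so writing a
    # score onto a batch element would not reach the original list.
    for start in range(0, len(items), batch_size):
        yield [Text(t.text) for t in items[start:start + batch_size]]


def assign_scores(texts: List[Text],
                  batch_size: int,
                  score_fn: Callable[[Text], float]) -> List[Text]:
    offset = 0  # position of the current batch's first document in `texts`
    for batch in iter_batches(texts, batch_size):
        batch_scores = [score_fn(doc) for doc in batch]
        for i, score in enumerate(batch_scores):
            # Index into the full list, not just the current batch.
            texts[offset + i].score = score
        offset += len(batch)
    return texts
```

In `rescore()`, the same idea would amount to keeping a running offset across batches and writing to `texts[offset + i]` instead of `texts[i]`.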