reaganrewop opened 5 years ago
Maybe some references for text/passage relevance:
arXiv.org:
> This paper studies the performances and behaviors of BERT in ranking tasks. We explore several different ways to leverage the pre-trained BERT and fine-tune it on two ranking tasks: MS MARCO passage reranking and TREC Web Track ad hoc document ranking. Experimental results on MS MARCO demonstrate the strong effectiveness of BERT in question-answering focused passage ranking tasks, as well as the fact that BERT is a strong interaction-based seq2seq matching model. Experimental results on TREC show the gaps between the BERT pre-trained on surrounding contexts and the needs of ad hoc document ranking. Analyses illustrate how BERT allocates its attentions between query-document tokens in its Transformer layers, how it prefers semantic matches between paraphrase tokens, and how that differs with the soft match patterns learned by a click-trained neural ranker.
🏆 SOTA for Passage Re-Ranking on MS MARCO (MRR metric)
Based on the above results, the relevance between two out-of-domain sentences gives inconsistent results even with our similarity metric (NSP + Cosine), and the model/similarity is not able to differentiate between subtopics.
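For reference, one way the "NSP + Cosine" metric could be combined is a weighted mix of the next-sentence-prediction probability and the cosine similarity of the two sentence embeddings. This is only a sketch: the weighting scheme (`alpha`) and the function names are assumptions for illustration, not the exact metric used here.

```python
# Hypothetical sketch of an "NSP + Cosine" relevance score.
# Assumptions: nsp_prob comes from a BERT NSP head, emb_a/emb_b are
# sentence embedding vectors, and the two signals are mixed with a
# simple convex weight alpha (not necessarily the scheme used in this issue).
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def nsp_cosine_score(nsp_prob: float,
                     emb_a: np.ndarray,
                     emb_b: np.ndarray,
                     alpha: float = 0.5) -> float:
    """Convex combination of an NSP probability and cosine similarity.

    alpha=1.0 uses only the NSP signal, alpha=0.0 only the cosine signal.
    """
    return alpha * nsp_prob + (1.0 - alpha) * cosine_similarity(emb_a, emb_b)
```

For two near-identical in-domain sentences both terms are high, so the score is high; for unrelated out-of-domain pairs the two signals can disagree, which is consistent with the inconsistency observed above.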
While the above issues are taken care of in the current PIMs approach (because the comparison always involves one in-domain sentence, and ranking within topics hasn't yet been our focus), it is not guaranteed to work on other tasks such as community detection.