NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.82k stars, 898 forks

How to prepare triples or negative docs for a doc ranking problem? #778

Closed. guotong1988 closed this issue 4 years ago.

guotong1988 commented 4 years ago

A triple in a doc ranking problem is (query, positive_doc, negative_doc).

If I already have query-positive_doc pair data, how do I prepare the negative doc data?

Random sampling is the baseline policy.
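A minimal sketch of that random baseline in plain Python, assuming the data is a list of (query, positive_doc) string pairs plus a corpus of candidate doc strings. The function `random_negative_triples` is a hypothetical helper written for illustration, not a MatchZoo API. It only rejects docs that are known positives for the same query:

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (query, positive_doc, negative_doc)


def random_negative_triples(
    pairs: Sequence[Tuple[str, str]],
    corpus: Sequence[str],
    num_negatives: int = 1,
    seed: int = 0,
) -> List[Triple]:
    """Build (query, positive_doc, negative_doc) triples by random sampling.

    Docs that are known positives for a query are never used as its negatives.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible

    # Collect every known positive per query.
    positives: Dict[str, Set[str]] = {}
    for query, pos in pairs:
        positives.setdefault(query, set()).add(pos)

    triples: List[Triple] = []
    for query, pos in pairs:
        candidates = [doc for doc in corpus if doc not in positives[query]]
        if not candidates:
            continue  # every doc is a positive for this query; nothing to sample
        k = min(num_negatives, len(candidates))
        for neg in rng.sample(candidates, k):  # k distinct negatives
            triples.append((query, pos, neg))
    return triples
```

For example, `random_negative_triples([("q1", "d1"), ("q1", "d2"), ("q2", "d3")], ["d1", "d2", "d3", "d4", "d5"], num_negatives=2)` yields two triples per pair, and "q1" is never paired with "d1" or "d2" as a negative. Filtering the whole corpus per pair is simple but O(corpus size); on a large corpus, rejection sampling (draw, then retry if the draw is a positive) is usually cheaper.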

With a human in the loop, I can use BM25 to retrieve candidate docs and then label each of them.

I would prefer to do it without a human.

Thank you very much.

faneshion commented 4 years ago

Random sampling is the usual approach, though it produces weak (easy) negative docs. If you already have the positive docs, you can use an existing model to retrieve docs for each query and take the retrieved docs that are not in the positive set as negative docs.
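A sketch of that suggestion without human labeling, assuming BM25 stands in for the "existing model" (any trained ranker could take its place) and that docs are whitespace-tokenizable strings. The `BM25` class and `hard_negative_triples` are hypothetical helpers written for illustration, not MatchZoo APIs; the scorer is a plain Okapi BM25 with the common `k1=1.5`, `b=0.75` defaults:

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (query, positive_doc, negative_doc)


class BM25:
    """Minimal Okapi BM25 over lowercased, whitespace-tokenized docs."""

    def __init__(self, docs: Sequence[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(doc.lower().split()) for doc in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / max(len(self.lens), 1)
        df: Counter = Counter()
        for tf in self.tfs:
            df.update(tf.keys())  # document frequency: each term counted once per doc
        n = len(self.tfs)
        # The "+1" inside the log keeps idf non-negative for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def scores(self, query: str) -> List[float]:
        """Return one BM25 score per doc, in corpus order."""
        terms = query.lower().split()
        out = []
        for tf, dl in zip(self.tfs, self.lens):
            s = 0.0
            for t in terms:
                f = tf.get(t, 0)
                if f == 0:
                    continue  # term absent from this doc contributes nothing
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[t] * f * (self.k1 + 1) / (f + norm)
            out.append(s)
        return out


def hard_negative_triples(
    pairs: Sequence[Tuple[str, str]], corpus: Sequence[str], top_k: int = 2
) -> List[Triple]:
    """Use the top-k BM25 hits that are not known positives as negatives."""
    bm25 = BM25(corpus)
    positives: Dict[str, Set[str]] = {}
    for query, pos in pairs:
        positives.setdefault(query, set()).add(pos)

    triples: List[Triple] = []
    for query, pos in pairs:
        scores = bm25.scores(query)
        # Highest score first; sorted() is stable, so ties keep corpus order.
        ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
        hard = [
            corpus[i]
            for i in ranked
            if scores[i] > 0 and corpus[i] not in positives[query]
        ][:top_k]
        triples.extend((query, pos, neg) for neg in hard)
    return triples
```

Retrieved-but-unlabeled docs are not guaranteed to be irrelevant, so some of these "hard" negatives may be unlabeled positives (false negatives). A common mitigation is to skip the very top hits or to mix these negatives with randomly sampled ones.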