JULIELab / trec-pm

Support code and resources for participation at the TREC Precision Medicine Track (TREC-PM)
http://trec-cds.appspot.com
MIT License

Punish empty papers about genes #75

Closed by michelole 5 years ago

michelole commented 5 years ago

E.g. https://www.ncbi.nlm.nih.gov/pubmed/16521281

khituras commented 5 years ago

What do you mean by "on genes"? I experimented a bit and found that this works quite well:

{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "title": "RYR1"
        }
      },
      "negative": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "abstract"
            }
          }
        }
      },
      "negative_boost": 0.5
    }
  }
}

One can see very nicely how documents without abstracts are pushed down with negative_boost < 1 and pulled up with negative_boost > 1.
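To make the effect concrete, here is a minimal sketch in plain Python of what a boosting query does to a ranking: a document that matches the negative clause (here, one with no abstract) has its original score multiplied by negative_boost. The function name `rerank`, the document IDs, and the scores are all made up for illustration; nothing here talks to Elasticsearch, and the value above 1 is included only because the comment mentions it, not because a given Elasticsearch version is known to accept it.

```python
from typing import Any, Dict, List


def rerank(docs: List[Dict[str, Any]], negative_boost: float) -> List[str]:
    """Return document IDs ordered by score after applying the penalty.

    Hypothetical helper: a document whose "abstract" is missing (None)
    matches the negative clause, so its score is multiplied by
    negative_boost. All other scores are left unchanged.
    """
    rescored = []
    for doc in docs:
        score = doc["score"]
        if doc.get("abstract") is None:
            score *= negative_boost
        rescored.append((doc["id"], score))
    # Highest score first, as in a search result list.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in rescored]


# Made-up hits for a title match on a gene symbol.
hits = [
    {"id": "A", "score": 2.0, "abstract": None},
    {"id": "B", "score": 1.5, "abstract": "Some abstract text."},
    {"id": "C", "score": 1.2, "abstract": None},
]

pushed_down = rerank(hits, 0.5)  # A and C are halved, so B moves to the top
unchanged = rerank(hits, 1.0)    # a factor of 1 leaves the order as it was
pulled_up = rerank(hits, 2.0)    # A and C are doubled, so B drops to the bottom
```

Note that the penalty is multiplicative, so a paper without an abstract can still outrank one with an abstract if its original score is high enough.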

michelole commented 5 years ago

"on genes": pseudo-papers like the example that pretend to talk about a specific gene.

So far I have been filtering these out with must_not "gene symbol" (pending evaluation), but if there is such an "exists" clause, it should be the way to go.

Not sure if this should be negatively-boosted or a hard requirement though.
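For comparison, the two options could be sketched as query bodies like the ones below. This is only an illustration: the function names are hypothetical, the field names `title` and `abstract` are taken from the query earlier in the thread, and the hard-requirement variant (a bool query with an exists filter) is one possible way to express it, not something the thread has settled on.

```python
import json
from typing import Any, Dict


def soft_penalty_query(gene: str, negative_boost: float = 0.5) -> Dict[str, Any]:
    """Boosting query: papers without an abstract stay in the results
    but have their score multiplied by negative_boost."""
    return {
        "query": {
            "boosting": {
                "positive": {"match": {"title": gene}},
                "negative": {
                    "bool": {"must_not": {"exists": {"field": "abstract"}}}
                },
                "negative_boost": negative_boost,
            }
        }
    }


def hard_requirement_query(gene: str) -> Dict[str, Any]:
    """Bool query: papers without an abstract are excluded entirely.
    The filter clause does not contribute to the score."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"title": gene}},
                "filter": {"exists": {"field": "abstract"}},
            }
        }
    }


soft = soft_penalty_query("RYR1")
hard = hard_requirement_query("RYR1")

# Serialise to the JSON body that would be sent to the search endpoint.
soft_body = json.dumps(soft, indent=2)
hard_body = json.dumps(hard, indent=2)
```

The soft variant keeps empty papers retrievable when nothing better matches, while the hard variant removes them regardless of how few results remain; which is preferable would have to be decided by evaluation.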