biocaddie / prototype_issues

Used to report and track bioCADDIE prototype issues

Low precision on query for mouse red blood cell gene expression studies #151

Open ianfore opened 7 years ago

ianfore commented 7 years ago

Entered "mouse red blood cell" as query text. Clicked on the "Gene expression" data type facet: 18 results.

Only 7 of these could really be said to be concerned with gene expression in red blood cells (or their progenitor cells). Precision = 38%.

See the attached sheet for judgments on which results are false positives and some analysis of why they were matched: mouse red blood cell.xlsx
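For reference, the precision figure is the number of relevant results divided by the number of results returned. The snippet below is only a minimal sketch of that arithmetic; the function name `precision` is hypothetical and not part of DataMed. Note that 7/18 is about 0.389, which the report gives as 38%.

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Fraction of retrieved results that were judged relevant."""
    if retrieved <= 0:
        raise ValueError("retrieved must be a positive count")
    return relevant_retrieved / retrieved


# Counts taken from the report: 7 relevant out of 18 returned.
p = precision(7, 18)
print(f"{p:.1%}")  # 38.9%
```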

naturalbeau commented 7 years ago

Actually, 38% is not a low precision for web retrieval tasks. In the TREC 2014 CDS track, the top-ranked results had P@10 = 0.3633, where P@10 is the precision over the first 10 returned results. We are currently trying to improve performance by using an NLP server and exploring different strategies on the benchmark datasets. If we can achieve better performance, we will update DataMed accordingly.
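To make the P@10 definition concrete, here is a minimal sketch of precision at rank k. The function name `precision_at_k` and the judgment list are hypothetical and purely illustrative; they are not taken from the attached spreadsheet or from TREC data. The sketch assumes the common convention of dividing by k even when fewer than k results are returned.

```python
from typing import Sequence


def precision_at_k(judgments: Sequence[bool], k: int) -> float:
    """Fraction of the first k ranked results judged relevant.

    `judgments` is in rank order; True means the result was relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = judgments[:k]
    # Divide by k (not len(top_k)) so that missing results count as misses.
    return sum(top_k) / k


# Illustrative judgments only (assumed, not real data): 4 of the top 10 relevant.
example = [True, False, True, True, False, False, True, False, False, False]
print(precision_at_k(example, 10))  # 0.4
```

Because P@10 looks only at the top of the ranking, it is not directly comparable to precision over all 18 returned results; a like-for-like comparison would compute both on the same cutoff.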