datadryad / hive-mrc

Helping Interdisciplinary Vocabulary Engineering (HIVE)
BSD 3-Clause "New" or "Revised" License

matching algorithm for concept browser #14

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The concept browser consistently ranks exact matches lower than partial 
matches. Review the algorithm that performs matching, and adjust it.

Original issue reported on code.google.com by rscherle on 17 Feb 2011 at 6:57

GoogleCodeExporter commented 9 years ago
Had the chance to look at this briefly in the context of the plural/singular 
problem:

HIVE relies on Lucene for indexing and search, and Lucene indexes both the 
preferred and alternate labels. "Noxious mammals" has the alternate labels 
"Harmful mammals" and "Injurious mammals", which also contain the search term 
and appear to be increasing its score.
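
To illustrate the suspected effect, here is a toy model in Python. It is not Lucene's actual scoring formula; it only counts how many indexed labels contain the search term. The function name toy_score is hypothetical, and the sketch assumes "Mammals" has no alternate labels containing the term.

```python
# Toy model only: NOT Lucene's real scoring formula.
# It shows how indexing altLabels alongside the prefLabel can give a
# concept more matching labels than the exact-match concept has.

def toy_score(term, labels):
    """Count how many labels contain the term as a whole word."""
    term = term.lower()
    return sum(1 for label in labels if term in label.lower().split())

# Labels per concept: prefLabel first, then altLabels.
# Assumption: "Mammals" has only its preferred label.
concepts = {
    "Mammals": ["Mammals"],
    "Noxious mammals": ["Noxious mammals", "Harmful mammals", "Injurious mammals"],
}

scores = {name: toy_score("mammals", labels) for name, labels in concepts.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy model "Noxious mammals" matches on three labels and "Mammals" on one, so the partial match ranks first, which is the same ordering seen in the test case below.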

Test case:
1. Select vocabulary: Agrovoc
2. Search term: "mammals"
3. Results: 
  Noxious mammals (9.35)
  Mammals (6.64)
  Game mammals (4.15)
  Aquatic mammals (4.15)

Expected results:
  Mammals should appear first. We still need to decide how to order the remaining results, perhaps by factoring in relationships (whether a concept is narrower or broader than the exact match), whether the phrase appears in an altLabel, etc.
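
One possible approach, sketched in Python as a post-processing step over the search hits: put exact prefLabel matches first, then exact altLabel matches, then everything else in the original score order. This is only a sketch, not the HIVE or Lucene implementation. The function name rerank and the tuple layout are hypothetical, and the empty altLabel lists for three of the concepts are an assumption.

```python
def rerank(query, hits):
    """Re-order hits so that exact label matches come first.

    hits: list of (pref_label, alt_labels, score) tuples.
    Tier 0: prefLabel equals the query (case-insensitive).
    Tier 1: some altLabel equals the query.
    Tier 2: everything else.
    Within a tier, higher original scores come first.
    """
    q = query.strip().lower()

    def sort_key(hit):
        pref, alts, score = hit
        if pref.lower() == q:
            tier = 0
        elif any(alt.lower() == q for alt in alts):
            tier = 1
        else:
            tier = 2
        return (tier, -score)

    return sorted(hits, key=sort_key)

# Scores are taken from the test case above.
# Assumption: the concepts other than "Noxious mammals" have no altLabels.
hits = [
    ("Noxious mammals", ["Harmful mammals", "Injurious mammals"], 9.35),
    ("Mammals", [], 6.64),
    ("Game mammals", [], 4.15),
    ("Aquatic mammals", [], 4.15),
]

reranked = [pref for pref, _alts, _score in rerank("mammals", hits)]
print(reranked)
```

With the test-case data, "Mammals" moves to the top and the other results keep their original relative order. Broader/narrower relationships are not considered here and would need an additional tier or weight.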

Original comment by craig.wi...@unc.edu on 18 Apr 2011 at 5:24