Kurrawong / fair-ease-matcher

Apache License 2.0

A term from a specific vocab is found but not another one - why? #62

Open gwemon opened 9 months ago

gwemon commented 9 months ago

Selected source: EurOBIS

Selected dataset: Weight of Copepoda in the Southern Bight of the North Sea in 1971 and 1974

Link to XML: https://gs-service-production.geodab.eu/gs-service/services/essi/csw?service=CSW&version=2.0.2&request=GetRecordById&id=45AA230327D43CE0EB4A3FB99AA3AB51A3BE5B81&outputschema=http://www.isotc211.org/2005/gmi&elementSetName=full

Bug: Why is "Length" from S06 not found? "Dry weight biomass" is found, and "Length" is found in other vocabs, all as exact matches, but "Length" is not found from S06 even though it is there. The search is a basic search (i.e. unfiltered for Instruments/Platforms/Parameters).

recalcitrantsupplant commented 9 months ago

There is a lot of noise in the full text search specifically for the term "Length":

image

These results rank higher in the search results than a plain match of "Length", presumably because they contain the word "Length" many times.

Increasing the number of results per search term will eventually yield the result from S06. For example, increasing the limit to 30 (the current production/development value is 10) gives the desired result:

Cut down version of query
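Separately from the actual query, the effect described above can be shown with a toy Python sketch. It assumes a ranking based purely on how often the term occurs, which is only a guess at the behaviour ("presumably" above); real full-text indexes use more involved scoring. The labels and the `rank` and `exact_matches` helpers are invented for illustration and are not taken from the project.

```python
def rank(labels, term, limit):
    """Rank labels by how many times `term` occurs (case-insensitive), keep the top `limit`."""
    needle = term.lower()
    scored = [(label.lower().count(needle), label) for label in labels]
    hits = [pair for pair in scored if pair[0] > 0]
    # Higher occurrence count first; Python's sort is stable, so ties keep input order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in hits[:limit]]


def exact_matches(ranked, term):
    """Keep only labels that equal the term, ignoring case."""
    return [label for label in ranked if label.lower() == term.lower()]


# Invented data: 12 noisy labels that mention the term twice, plus the exact label.
noisy = [f"Length {i} to length {i + 1} ratio" for i in range(12)]
labels = noisy + ["Length"]

with_limit_10 = exact_matches(rank(labels, "Length", limit=10), "Length")
with_limit_30 = exact_matches(rank(labels, "Length", limit=30), "Length")

print(with_limit_10)  # the exact label is cut off by the limit
print(with_limit_30)  # the exact label survives
```

With a limit of 10 the twelve noisy labels fill every slot, so the exact label never reaches the exact-match step; with a limit of 30 it does.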

Solutions:

Option 1: Increase the limit (as demonstrated above). As the Exact and Wildcard/Proximity matches are retrieved using the same query, this will:

Option 2: Separate out the queries for Exact and Wildcard Matches, and increase the limit on the Exact Match query only. (They would be almost identical, with the Exact Match query having a FILTER clause where the Wildcard/Proximity query does not).

This will increase the number of queries sent to the triplestore, so searches will take longer. It should not take more than twice the current query time, and may take less depending on how the limits are set and on other factors.
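A minimal sketch of Option 2 in Python, assuming the two queries are built from one shared template. The template, the `build_queries` helper, the use of `skos:prefLabel`, and the limits of 30 and 10 are all assumptions for illustration; the project's real full-text search clause is not shown in this thread and is represented here only by a placeholder comment.

```python
# Hypothetical shared template; the real query and its text-search clause differ.
QUERY_TEMPLATE = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  # placeholder: the project's full-text search clause would go here
  ?concept skos:prefLabel ?label .
  {extra}
}}
LIMIT {limit}
"""


def escape_literal(term):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return term.replace("\\", "\\\\").replace('"', '\\"')


def build_queries(term, exact_limit=30, wildcard_limit=10):
    """Return (exact_query, wildcard_query); only the exact query carries a FILTER."""
    literal = escape_literal(term.lower())
    exact_filter = f'FILTER(LCASE(STR(?label)) = "{literal}")'
    exact_query = QUERY_TEMPLATE.format(extra=exact_filter, limit=exact_limit)
    wildcard_query = QUERY_TEMPLATE.format(extra="", limit=wildcard_limit)
    return exact_query, wildcard_query


exact_query, wildcard_query = build_queries("Length")
print(exact_query)
print(wildcard_query)
```

Because the limits are now independent, the exact-match limit can be raised without also returning more Wildcard/Proximity results.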