gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

Extend the lookup library to return the level of confidence of a match #96

Open marcos-lg opened 3 years ago

marcos-lg commented 3 years ago

When there are no matches for a certain value, the lookup library should try to find a fuzzy match by stripping some characters or words from the value.

For example, in a Month vocabulary, a lookup for "January?" would not return any match, but if we strip the "?" it would return January.

When doing this, the lookup library would return whether the match was Exact or Fuzzy.
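A minimal sketch of the idea, not the lookup library's actual API: every name here (FuzzyLookupSketch, LookupResult, MatchType, addLabel) is hypothetical. It assumes that "?" is the only character stripped and that case-insensitive, trimmed comparison still counts as Exact; both are open questions in this issue.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the vocabulary lookup library.
class FuzzyLookupSketch {

  /** Confidence level of a match, as proposed in this issue. */
  enum MatchType { EXACT, FUZZY }

  /** A matched concept together with how it was found. */
  static final class LookupResult {
    final String concept;
    final MatchType matchType;

    LookupResult(String concept, MatchType matchType) {
      this.concept = concept;
      this.matchType = matchType;
    }
  }

  // Normalized label -> concept name.
  private final Map<String, String> conceptsByLabel = new HashMap<>();

  void addLabel(String label, String concept) {
    conceptsByLabel.put(normalize(label), concept);
  }

  Optional<LookupResult> lookup(String value) {
    // 1. Try the value as given (assumption: trimming and lower-casing still count as exact).
    String concept = conceptsByLabel.get(normalize(value));
    if (concept != null) {
      return Optional.of(new LookupResult(concept, MatchType.EXACT));
    }
    // 2. Retry after stripping "?" (assumption: the real list of characters is undecided).
    concept = conceptsByLabel.get(normalize(value.replace("?", "")));
    if (concept != null) {
      return Optional.of(new LookupResult(concept, MatchType.FUZZY));
    }
    // 3. No match, even after stripping.
    return Optional.empty();
  }

  private static String normalize(String s) {
    return s.trim().toLowerCase(Locale.ROOT);
  }
}
```

With a single label "January" registered, a lookup for "January" would come back as EXACT and a lookup for "January?" as FUZZY, so callers can decide how much to trust the result.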

It is still to be determined which characters or words should be stripped (stripping things like ? or Perhaps was being discussed).

If possible, this should be something that can be applied to any vocabulary. If that is not possible, we can set pre-filters per vocabulary, and if a match is found only after applying a pre-filter then it is considered Fuzzy.
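The per-vocabulary fallback could be sketched roughly as follows. This is an illustration under assumptions, not an agreed design: PreFilterLookupSketch, Match, STRIP_QUESTION_MARKS and STRIP_PERHAPS are all hypothetical names, and the two filters simply mirror the examples mentioned above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a lookup configured with pre-filters for one vocabulary.
class PreFilterLookupSketch {

  enum MatchType { EXACT, FUZZY }

  static final class Match {
    final String label;
    final MatchType matchType;

    Match(String label, MatchType matchType) {
      this.label = label;
      this.matchType = matchType;
    }
  }

  // Example pre-filters (assumptions; the real set is still to be decided).
  static final UnaryOperator<String> STRIP_QUESTION_MARKS = s -> s.replace("?", "");
  static final UnaryOperator<String> STRIP_PERHAPS =
      s -> s.replaceAll("(?i)\\bperhaps\\b", "");

  private final Set<String> labels = new HashSet<>();
  private final List<UnaryOperator<String>> preFilters;

  PreFilterLookupSketch(Set<String> vocabularyLabels, List<UnaryOperator<String>> preFilters) {
    for (String label : vocabularyLabels) {
      labels.add(normalize(label));
    }
    this.preFilters = preFilters;
  }

  Optional<Match> lookup(String value) {
    // A match on the unfiltered value is Exact.
    String normalized = normalize(value);
    if (labels.contains(normalized)) {
      return Optional.of(new Match(normalized, MatchType.EXACT));
    }
    // Apply this vocabulary's pre-filters in order, then retry.
    String filtered = value;
    for (UnaryOperator<String> filter : preFilters) {
      filtered = filter.apply(filtered);
    }
    filtered = normalize(filtered);
    // A match found only after pre-filtering is Fuzzy.
    if (labels.contains(filtered)) {
      return Optional.of(new Match(filtered, MatchType.FUZZY));
    }
    return Optional.empty();
  }

  private static String normalize(String s) {
    return s.trim().toLowerCase(Locale.ROOT);
  }
}
```

A Month vocabulary could be constructed with both filters while another vocabulary gets an empty list, in which case only exact matches are returned for it. If a generic set of filters turns out to work for every vocabulary, the same list could simply be shared.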