clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Fuzzy search/clustering for VLO facet values #26

Open twagoo opened 8 years ago

twagoo commented 8 years ago

Proposal by Jan Odijk (at CAC2016 meeting, see notes):

For [e.g.] organisations, mapping alone is not enough; fuzzy search is also required because there are too many spelling variations. We would still need to update the organisation vocabulary, but a lot could also be achieved with fuzzy search (string normalisation, edit distance).
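
To make the two techniques named above concrete, here is a minimal sketch of string normalisation followed by an edit-distance comparison. It is not taken from the VLO code base: the function names `normalise`, `edit_distance` and `is_fuzzy_match`, and the threshold of 2 edits, are assumptions chosen for illustration.

```python
import unicodedata


def normalise(value: str) -> str:
    """Case-fold, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.casefold())
    return " ".join(cleaned.split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_fuzzy_match(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical matching rule: normalise first, then allow a few edits."""
    return edit_distance(normalise(a), normalise(b)) <= max_distance


print(normalise("Max-Planck-Institut für Psycholinguistik"))
print(is_fuzzy_match("Univeristy of Utrecht", "University of Utrecht"))
```

A fixed threshold is the simplest choice; a threshold relative to string length would probably behave better for very short organisation names.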

Reporting of new values (to curators) would be helpful, possibly with automatic (pre)processing on top. For certain fields (such as organisation), consider applying a fuzzy factor by default for free-text search, and possibly also for facets (in 'value filtering').
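
The reporting idea could look roughly like the sketch below: harvested values that are not in the known vocabulary are collected for curators, each with a few close vocabulary entries as candidate mappings. The function name `report_new_values`, the in-memory lists and the 0.8 similarity cutoff are all assumptions; the sketch uses `difflib.get_close_matches` from the Python standard library rather than anything VLO-specific.

```python
from difflib import get_close_matches


def report_new_values(harvested, vocabulary, cutoff=0.8):
    """Return {new value: [close known values]} for values not in the vocabulary."""
    # Compare case-insensitively, but report the vocabulary's own spelling.
    known = {entry.casefold(): entry for entry in vocabulary}
    report = {}
    for value in harvested:
        key = value.strip().casefold()
        if key in known:
            continue  # already covered by the vocabulary
        close = get_close_matches(key, list(known), n=3, cutoff=cutoff)
        report[value] = [known[match] for match in close]
    return report


vocabulary = ["Utrecht University", "Meertens Institute"]
harvested = ["Utrecht University", "Utrecht Univeristy", "Completely New Org"]
for value, candidates in report_new_values(harvested, vocabulary).items():
    print(value, "->", candidates or "no close match, needs a curator")
```

Values with an empty candidate list are the genuinely new ones; values with candidates are likely spelling variants that a curator only has to confirm.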

twagoo commented 7 years ago

Could be a fun student project: a bot regularly sends pull requests to (a fork of) the VLO-mapping project with suggested value normalisations.
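
A rough sketch of the part of such a bot that produces the suggestions, assuming it has facet values with their record counts available as a plain dictionary. Values that collapse to the same normalised key are grouped, and the most frequent spelling is proposed as the target. The name `suggest_normalisations` and the input shape are hypothetical, and turning the resulting pairs into the VLO-mapping file format and an actual pull request is left out.

```python
from collections import defaultdict


def suggest_normalisations(value_counts):
    """Return (variant, suggested canonical value) pairs from {value: count}."""
    groups = defaultdict(list)
    for value in value_counts:
        # Grouping key: case-folded, punctuation removed, whitespace collapsed.
        spaced = "".join(ch if ch.isalnum() else " " for ch in value.casefold())
        groups[" ".join(spaced.split())].append(value)

    suggestions = []
    for variants in groups.values():
        if len(variants) < 2:
            continue  # nothing to normalise
        # Assumption: the most frequent spelling is the best canonical form.
        canonical = max(variants, key=lambda v: (value_counts[v], v))
        suggestions.extend((v, canonical) for v in sorted(variants) if v != canonical)
    return suggestions


counts = {
    "Max Planck Institute": 40,
    "max-planck institute": 3,
    "MAX PLANCK INSTITUTE ": 1,
    "Meertens Institute": 12,
}
for variant, canonical in suggest_normalisations(counts):
    print(repr(variant), "->", repr(canonical))
```

Because the output would go through a pull request, a curator still reviews every suggestion before it lands, so the grouping can afford to be fairly aggressive.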