carpentries / glosario-r

glosario: create and retrieve multilingual glossaries.
https://carpentries.github.io/glosario-r

`define()` doesn't implement fuzzy matching #1

Closed: ian-flores closed this issue 4 years ago

ian-flores commented 4 years ago

We need to implement fuzzy matching (for example, a string-distance measure) so that, when there isn't an exact match, `define()` returns the entry whose slug is closest to the requested term.
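A minimal sketch of the string-distance variant in base R, assuming the glossary's keys are available as a plain character vector of slugs. The `slugs` vector and `nearest_slug_edit()` are hypothetical illustrations, not part of glosario. It uses `utils::adist()`, which computes the generalized Levenshtein (edit) distance; a distance of 0 is an exact match, so exact hits are still preferred:

```r
# Sketch only: `slugs` is a hypothetical stand-in for the glossary's keys,
# and nearest_slug_edit() is an illustrative helper, not a glosario function.
nearest_slug_edit <- function(query, slugs) {
  # Lower-case both sides so "Data Frame" and "data_frame" compare alike
  d <- utils::adist(tolower(query), tolower(slugs))  # 1 x length(slugs) matrix
  slugs[which.min(d)]  # first slug with the smallest edit distance
}

slugs <- c("data_frame", "data_type", "vector", "function")
nearest_slug_edit("data frame", slugs)  # "data_frame" (distance 1)
```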

Right now, if we run:

```r
g <- get_glossary()
define("data frame", glossary = g)
```

We get:

> Warning: Some key are not found: 'data frame'. They are being excluded.

This happens because `define()` expects the slug `data_frame`, not `data frame`. However, `define()` should be able to recognize that this is a very near match and return that definition. In the Python version I do this using cosine similarity.
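For comparison, a sketch of cosine similarity in R over character-bigram counts. Which features the Python version actually vectorizes is not stated in this issue, so the bigram representation and the helpers `char_bigrams()`, `cosine_sim()`, and `nearest_slug_cosine()` are assumptions for illustration only:

```r
# Hypothetical helpers; not glosario's API.
# Character bigrams of a string, e.g. "data" -> c("da", "at", "ta")
char_bigrams <- function(s) {
  s <- tolower(s)
  n <- nchar(s)
  if (n < 2) return(s)
  substring(s, 1:(n - 1), 2:n)
}

# Cosine similarity between the bigram count vectors of two strings
cosine_sim <- function(a, b) {
  ga <- char_bigrams(a)
  gb <- char_bigrams(b)
  vocab <- union(ga, gb)
  va <- as.numeric(table(factor(ga, levels = vocab)))
  vb <- as.numeric(table(factor(gb, levels = vocab)))
  sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

# Return the slug whose bigram profile is most similar to the query
nearest_slug_cosine <- function(query, slugs) {
  scores <- vapply(slugs, cosine_sim, numeric(1), a = query)
  slugs[which.max(scores)]
}

slugs <- c("data_frame", "data_type", "vector", "function")
nearest_slug_cosine("data frame", slugs)  # "data_frame"
```

Here `data frame` and `data_frame` share 7 of their 9 bigrams, giving a score of 7/9 (about 0.78), while `data_type` scores about 0.35, so the separator difference barely affects the ranking.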

fmichonneau commented 4 years ago

That sounds like a good idea, Ian! We should also have an extra argument that allows only exact matches, though, so that we can detect typos when `define()` is used together with the R Markdown YAML term list that Greg envisioned.
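One possible shape for that extra argument, sketched with a hypothetical `lookup_slug()` helper that takes an `exact` flag (neither is glosario's actual API). With `exact = TRUE`, a miss is reported instead of being silently corrected, which is what typo detection against a YAML term list would need. The `max_dist` cutoff is likewise an assumed parameter, and edit distance stands in for whichever metric is finally chosen:

```r
# Hypothetical helper, not part of glosario.
# exact = TRUE disables fuzzy matching so misspelled terms are flagged.
lookup_slug <- function(query, slugs, exact = FALSE, max_dist = 2) {
  if (query %in% slugs) return(query)  # exact hit: always accepted
  if (exact) {
    warning(sprintf("No exact match for '%s'; possible typo.", query),
            call. = FALSE)
    return(NA_character_)
  }
  d <- drop(utils::adist(tolower(query), tolower(slugs)))
  if (min(d) > max_dist) return(NA_character_)  # nothing close enough
  slugs[which.min(d)]
}

slugs <- c("data_frame", "data_type", "vector")
lookup_slug("data frame", slugs)                # fuzzy: "data_frame"
lookup_slug("data frame", slugs, exact = TRUE)  # warns, returns NA
```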

ian-flores commented 4 years ago

I'll go ahead, implement it, and send a PR. I used cosine similarity because it seemed to work best when the input had extra punctuation or unusual typos, but I'm open to changing the metric as well.