gvalchca closed this issue 2 years ago
Hi @gvalchca
This is currently a limitation. YAKE does not handle tokens that contain special characters such as - or digits well enough.
In this case I would recommend normalising all COVID-19 mentions to simply COVID and it will work just fine.
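A minimal sketch of that normalisation step, assuming a simple regular expression is enough for your corpus. The helper name normalise_covid is hypothetical and is not part of YAKE; it only rewrites the text before you hand it to the extractor.

```python
import re

# Hypothetical pre-processing helper (not part of YAKE).
# Matches "COVID-19", "COVID 19" and "COVID19", case-insensitively.
_COVID_PATTERN = re.compile(r"\bCOVID[-\s]?19\b", re.IGNORECASE)


def normalise_covid(text: str) -> str:
    """Collapse every COVID-19 variant to the plain token 'COVID'."""
    return _COVID_PATTERN.sub("COVID", text)


if __name__ == "__main__":
    sample = (
        "occupational stress and mental health among anesthetists "
        "during the COVID-19 pandemic."
    )
    # The normalised string is what you would then pass to the extractor.
    print(normalise_covid(sample))
```

The normalised string can then be passed to the keyword extractor in place of the raw text. This loses the "-19" suffix, so it only suits cases where the number carries no meaning of its own.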
Ideally we should improve the algorithm to manage this better. If you have ideas, please send us a PR :)
Related issue and explanation by @rncampos here
Hi @gvalchca
This is not exposed by the API, but you could play with DataCore's tagsToDiscard parameter. By default it ignores digits.
Further explanation can be found here
Hey, thanks for your answer, and sorry for duplicating the thread. However, that solution would not work for me, because in biology/medicine there are plenty of abbreviations with meaningful numbers (e.g. IL2, IL6).
Hi, why would YAKE not return COVID-19 in any of the keywords for the following example:
occupational stress and mental health among anesthetists during the COVID-19 pandemic.
With default parameters, the output looks like this: