Describe the bug
When a corpus contains documents with the string 'hci' (short for human-computer interaction), Bag of Words changes 'hci' to 'hcus' in its sparse-matrix output.
To Reproduce
See the attached workflow. The dataset is shared through a Google Drive link.
Expected behavior
'hci' should be kept as 'hci' in the sparse data. Could the lemmatizer be automatically converting Latin plurals ending in '-i' to singulars ending in '-us' (such as nuclei -> nucleus)?
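To make the suspicion concrete, here is a minimal sketch of the kind of suffix rule that would produce exactly this output. It is purely hypothetical: the function name `naive_latin_singular` and the rule itself are my assumptions for illustration, not code taken from the Text add-on or from any lemmatizer it uses.

```python
def naive_latin_singular(token: str) -> str:
    """Hypothetical rule: rewrite a trailing '-i' as '-us'.

    This is an assumed illustration of the suspected behaviour,
    not the actual lemmatizer implementation.
    """
    if len(token) > 2 and token.endswith("i"):
        return token[:-1] + "us"
    return token


if __name__ == "__main__":
    for word in ["nuclei", "hci", "corpus"]:
        print(word, "->", naive_latin_singular(word))
```

A rule like this is correct for 'nuclei' but also rewrites 'hci' to 'hcus', which matches what I see in the Bag of Words output.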
Orange version:
3.36.2
Text add-on version:
1.15.0
Operating system: macOS 14.3.1
Example workflow: hcus bug.ows.zip