Open sbadithe opened 2 years ago
As a user, I wish NLP Primitives had the ability to handle unicode text.
Currently, Unicode text is not correctly handled by regexes in nlp_primitives.
For example, Àbc is not recognized as a title word by TitleWordCount (Abc is).
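To illustrate the kind of failure described above, here is a minimal sketch. The ASCII-only pattern is an assumption made for illustration (the actual regex inside TitleWordCount may differ), and both helper functions are hypothetical names, not part of nlp_primitives.

```python
import re
import unicodedata

# Hypothetical ASCII-only pattern, assumed for illustration only;
# the real pattern used by TitleWordCount may differ.
ASCII_TITLE = re.compile(r"\b[A-Z][a-z]*\b")


def count_title_words_ascii(text):
    """Count title words with an ASCII-only character class."""
    return len(ASCII_TITLE.findall(text))


def count_title_words_unicode(text):
    """Count title words using Unicode case properties."""
    # NFC normalization folds "A" + combining grave into the single
    # code point U+00C0, so str.istitle() sees one cased letter.
    normalized = unicodedata.normalize("NFC", text)
    return sum(1 for word in normalized.split() if word.istitle())


sample = "\u00c0bc Abc"  # "Àbc Abc", written with an escape to pin the code point
print(count_title_words_ascii(sample))    # only "Abc" matches [A-Z]
print(count_title_words_unicode(sample))  # both words are title-cased
```

The second function relies on `str.istitle()`, which follows Unicode case rules rather than the `A-Z` range, so it is one possible direction for a fix, not necessarily the one the maintainers would choose.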
@sbadithe Is it possible to make a pytest fixture that the tests for all the NL primitives use? That way, if we add more NL primitives in the future, we can make sure they support Unicode.
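A rough sketch of what such a fixture could look like, assuming pytest's parametrized fixtures. The sample data, the fixture name, and the `count_title_words` stand-in are all hypothetical; a real test would call the primitive under test instead.

```python
import pytest

# Hypothetical sample data: (text, expected number of title-cased words).
UNICODE_TITLE_CASES = [
    ("\u00c0bc", 1),       # "Àbc"
    ("Abc", 1),
    ("\u00c0bc Abc", 2),   # "Àbc Abc"
    ("\u00e0bc abc", 0),   # "àbc abc": no title-cased words
]


def count_title_words(text):
    """Stand-in for a primitive under test; not the nlp_primitives code."""
    return sum(1 for word in text.split() if word.istitle())


@pytest.fixture(params=UNICODE_TITLE_CASES)
def unicode_title_case(request):
    # Any test that requests this fixture runs once per sample.
    return request.param


def test_counts_unicode_title_words(unicode_title_case):
    text, expected = unicode_title_case
    assert count_title_words(text) == expected
```

Placing the fixture in a `conftest.py` would make it available to every test module in that directory. Since expected values differ per primitive, a shared fixture might supply only the Unicode texts and leave the expected results to each primitive's own tests.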