fnielsen closed this 3 years ago.
This is fixed already, isn't it? Works for me at least.
No, I still get the problem: https://ordia.toolforge.org/text-to-lexemes?text-language=da&text=k%C3%B8
Aha. It only happens for that mode (lowercase first sentence letters), and it happens for all two-letter words, so it's not a Unicode issue. I looked at the code, and I think the culprit is the 2 in this line: https://github.com/fnielsen/ordia/blob/90b1b91344e42bf0e44d949fddd471d131e38df3/ordia/text.py#L45
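For illustration, here is a minimal sketch of the intended behaviour of that mode, assuming it should lowercase only the first letter of each sentence and leave the rest of each word intact. The names `lowercase_first_sentence_letters` and `extract_words` are hypothetical; this is not the actual code in Ordia's `text.py`.

```python
import re

# Hypothetical sketch, not Ordia's implementation.
# A sentence starts at the beginning of the text or after ., ! or ? plus whitespace.
SENTENCE_START = re.compile(r"(^|[.!?]\s+)(\w)")


def lowercase_first_sentence_letters(text: str) -> str:
    # Lowercase only the single matched letter, so words of any length,
    # including two-letter words such as "Kø", keep all their characters.
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).lower(), text)


def extract_words(text: str) -> list[str]:
    # In Python 3, \w matches Unicode letters such as "ø" by default.
    return re.findall(r"\w+", lowercase_first_sentence_letters(text))
```

With this approach, `extract_words("Kø. Du er her.")` gives `['kø', 'du', 'er', 'her']`: no length is hard-coded, so two-letter words are not truncated.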
The fix is now running at https://ordia.toolforge.org/text-to-lexemes?text-language=da&text=k%C3%B8. Thanks for the PR.
"kø" in text-to-lexemes extracts only "k"