fnielsen / ordia

Wikidata lexemes presentations
https://ordia.toolforge.org
Apache License 2.0

"kø" in text-to-lexemes extracts only "k" #56

Closed: fnielsen closed this issue 3 years ago

fnielsen commented 4 years ago

"kø" in text-to-lexemes extracts only "k"

jhsoby commented 3 years ago

This is fixed already, isn't it? Works for me at least.

fnielsen commented 3 years ago

No, I still get the problem here: https://ordia.toolforge.org/text-to-lexemes?text-language=da&text=k%C3%B8

jhsoby commented 3 years ago

Aha. It only happens in that mode (lowercasing the first letter of each sentence), and it happens for all two-letter words, so it's not a Unicode issue. I looked at the code, and I think the culprit is the 2 in this line: https://github.com/fnielsen/ordia/blob/90b1b91344e42bf0e44d949fddd471d131e38df3/ordia/text.py#L45
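For reference, here is a minimal sketch of the behaviour the mode is expected to have. It is not Ordia's code: the function names `lowercase_sentence_initials` and `tokenize` and the sentence-start regex are hypothetical, and it assumes sentences end in ".", "!" or "?" followed by whitespace. The point it illustrates is that lowercasing only the first character of a sentence needs no length constant, so a two-letter word such as "Kø" should come out whole.

```python
import re

# Hypothetical sentence-start pattern: start of the text, or ".", "!" or "?"
# followed by whitespace, then the first word character of the sentence.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)(\w)")


def lowercase_sentence_initials(text: str) -> str:
    """Lowercase only the first letter of each sentence; the rest is untouched."""
    return _SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).lower(), text)


def tokenize(text: str) -> list[str]:
    # For str patterns, \w is Unicode-aware in Python 3, so "ø" is a word character.
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    # No minimum word length is involved, so two-letter words survive intact.
    print(tokenize(lowercase_sentence_initials("Kø")))  # ['kø']
    print(tokenize(lowercase_sentence_initials("Der er en kø. Kø igen!")))
```

A regression test along these lines, checking that one- and two-letter sentence-initial words are returned unchanged apart from case, would catch a hard-coded bound like the one suspected above.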

fnielsen commented 3 years ago

The fix is now running: https://ordia.toolforge.org/text-to-lexemes?text-language=da&text=k%C3%B8 Thanks for the PR.