alexandrainst / danlp

DaNLP is a repository for Natural Language Processing resources for the Danish Language.
BSD 3-Clause "New" or "Revised" License

Wrong encoding in unimorph #158

Closed fnielsen closed 3 years ago

fnielsen commented 3 years ago

Describe the bug

The unimorph database contains incorrectly encoded text.

To Reproduce

>>> from danlp.datasets import DaUnimorph
>>> unimorph = DaUnimorph()
>>> database = unimorph.load_with_pandas()
>>> 'nytårsaften' in set(database.lemma)
False

I would have expected True here.
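
One way to check whether this is the usual kind of mojibake is sketched below. It assumes (this is a guess, not something confirmed for this dataset) that the text was stored as UTF-8 but decoded as Latin-1, which would turn 'nytårsaften' into 'nytÃ¥rsaften'. The helper name `repair_mojibake` is hypothetical and not part of danlp.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-Latin-1 mojibake; return text unchanged if it does not apply."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text is already correct, or was damaged in some other way.
        return text


# Simulate the assumed damage and repair it.
broken = "nytårsaften".encode("utf-8").decode("latin-1")
print(repair_mojibake(broken) == "nytårsaften")
```

If `database.lemma` is a pandas Series of strings, `'nytårsaften' in set(database.lemma.map(repair_mojibake))` returning True would support this guess about the encoding.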

Screenshots

Your Environment