enz / german-wordlist

German wordlist for Tanglet and other wordgames.
Creative Commons Zero v1.0 Universal

Corrected blacklist #8

Closed by db2222 2 years ago

db2222 commented 2 years ago

These words exist in the German language. Only the capitalization sometimes differs. The game Lexica (see https://github.com/lexica/lexica) uses the delta between words and blacklist. Therefore these words are incorrectly excluded.

Can this therefore please be merged to fix it? Thanks :-)

enz commented 2 years ago

The lists are case-sensitive because I did not want to make any assumptions about how a word game handles upper/lowercase, diacritics or the letter ß. Therefore it makes no sense to do a delta after applying transformations. The delta of the unmodified lists should be zero (which can be checked by the script verify.sh).

For example, Tanglet needs the original word forms because it presents Wiktionary links in its solution lists, and the transformed word ALLER could come from either the illegal toponym Aller or the legal pronoun aller. There are even cases where it needs to show multiple Wiktionary links, for example Maß and Mass for MASS.
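
To illustrate the order of operations, here is a minimal Python sketch. It is not how Tanglet or Lexica is actually implemented: the function names, the `normalize` transformation (uppercase, ß written as SS) and the tiny sample lists are all assumptions made up for this example.

```python
def playable_words(words, blacklist):
    """Case-sensitive difference: drop only exact matches from the blacklist."""
    blocked = set(blacklist)
    return [w for w in words if w not in blocked]


def normalize(word):
    """Hypothetical game-side transformation: uppercase, with ß written as SS."""
    return word.replace("ß", "ss").upper()


def sources_by_normalized_form(words):
    """Map each normalized form to the original spellings that produce it."""
    sources = {}
    for w in words:
        sources.setdefault(normalize(w), []).append(w)
    return sources


# Illustrative sample data only, not the contents of the real lists.
words = ["aller", "Maß", "Mass", "hingaben"]
blacklist = ["Aller", "Hingaben"]

# Compare the unmodified lists first, then transform what is left.
legal = playable_words(words, blacklist)
by_form = sources_by_normalized_form(legal)

# Transforming first and comparing afterwards loses legal words.
wrong = {normalize(w) for w in words} - {normalize(b) for b in blacklist}

print(by_form)
print(wrong)
```

With this sample data, comparing the unmodified lists keeps all four words, and MASS maps back to both Maß and Mass. Comparing after the transformation leaves only MASS, so aller and hingaben are dropped even though they are legal.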

db2222 commented 2 years ago

Thanks for your quick response! Seemingly the logic in Lexica needs fixing. I will therefore close this merge request.

db2222 commented 2 years ago

I just checked the list again. But shouldn't at least Hingaben be correct? :-)

enz commented 2 years ago

According to both Duden Online and German Wiktionary, Hingabe is a singular-only noun. The lowercase hingaben, an inflected form of hingeben, is already included in the words list.