De7vID / klingon-assistant-data

Klingon language data for {tlhIngan Hol boQwI'} and related apps.
https://play.google.com/store/apps/details?id=org.tlhInganHol.android.klingonassistant
Apache License 2.0

generate a list of all valid words #665

Open De7vID opened 2 years ago

De7vID commented 2 years ago

Somebody has requested a list of all valid words generated from the database. Basically, they want every word (which excludes things like sentences) with all possible affixes. The purpose of the list is validation (matching user-entered input against the list).

I've explained to them that the existing code already does validation but apparently their application requires the explicit list. So if anybody feels like writing a script that outputs this, I can put you in touch with them.

De7vID commented 2 years ago

After thinking about this some more, it may actually be infeasible. I did a rough calculation of the number of valid words in Klingon, and it's in the order of hundreds of billions. I made some simplifying assumptions but I don't think the number is too far off.
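
For a sense of how such an estimate grows, here is a minimal sketch of the arithmetic, assuming every suffix slot is optional and independent of the others. The function name `count_forms` and all of the figures are placeholders for illustration; they are not the actual counts or the simplifying assumptions behind the number above.

```python
from math import prod


def count_forms(num_stems, slot_sizes):
    """Count stem + suffix combinations when every slot is optional.

    Each slot contributes (size + 1) choices: one of its suffixes, or none.
    """
    return num_stems * prod(size + 1 for size in slot_sizes)


# Placeholder figures for illustration only (not taken from the database).
example_stems = 1000
example_slots = [3, 3, 5, 4, 2]  # five hypothetical suffix slots

print(count_forms(example_stems, example_slots))
```

Because the per-slot factors multiply, each additional slot scales the total by another factor, which is why an explicit list gets out of hand so quickly.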

CleverLemming1337 commented 1 month ago

Basically you could use something like this:


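```python
# Type-1 noun suffixes (augmentative, diminutive, endearment)
n1l = "'a' Hom oy".split()
# Type-2 noun suffixes (plurals)
n2l = "pu' Du' mey".split()

# Noun stems; a real script would read these from the database
nouns = "gho'lIv".split()

words = []

# Combine every stem with every type-1 and type-2 suffix pair
for n in nouns:
    for n1 in n1l:
        for n2 in n2l:
            words.append(n + n1 + n2)

print(*words, sep="\n")
```

De7vID commented 1 month ago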

Yes, but the issue is that in order for the list to be exhaustive, it must contain hundreds of billions of entries, which is an extremely inefficient way to do validation when the rules are so regular.

CleverLemming1337 commented 1 month ago

Maybe one could use a script that doesn't generate all words for validation but just checks the affixes.
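
A rough sketch of what that could look like, reusing the toy stem and suffix lists from the snippet earlier in this thread. It assumes each suffix slot is optional and must appear in a fixed order, and it ignores any restrictions on which suffixes go with which nouns. The names `build_noun_pattern` and `is_valid_noun` are made up for this example and are not part of the existing code.

```python
import re

# Toy data reused from the earlier snippet; a real checker would load
# stems and suffixes from the database.
NOUNS = ["gho'lIv"]
TYPE1 = ["'a'", "Hom", "oy"]
TYPE2 = ["pu'", "Du'", "mey"]


def build_noun_pattern(nouns, type1, type2):
    """Compile a regex for stem + optional type-1 + optional type-2 suffix."""

    def alt(items):
        # Escape each item and list longer alternatives first.
        ordered = sorted(items, key=len, reverse=True)
        return "|".join(re.escape(s) for s in ordered)

    return re.compile(f"(?:{alt(nouns)})(?:{alt(type1)})?(?:{alt(type2)})?")


NOUN_PATTERN = build_noun_pattern(NOUNS, TYPE1, TYPE2)


def is_valid_noun(word):
    """Return True if the whole word matches stem + allowed suffix sequence."""
    return NOUN_PATTERN.fullmatch(word) is not None


print(is_valid_noun("gho'lIvHommey"))  # stem + type 1 + type 2
print(is_valid_noun("gho'lIvmeyHom"))  # suffixes in the wrong order
```

The pattern grows with the number of stems and suffixes rather than with the number of possible words, so nothing has to be enumerated up front.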

De7vID commented 1 month ago

I've already explained that to them. They insist that their application can only do validation against a list. Presumably they have code which was designed around a language for which it is possible to validate against a dictionary (e.g., English), and they're just trying to get it to work with Klingon with no modification to the code if possible. ¯_(ツ)_/¯