Open MiaSelene opened 4 months ago
Sorry, this doesn't help you now, but it might one day, so I just want to share what I've learned from fine-tuning English.
After about 8 fine-tune runs and roughly two weeks of training data, the LLM started giving me the beginnings of multi-word correction suggestions, in the form of hyphenated words.
So if I set a=max, I'd be using only the LLM, which would address your concern.
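For anyone skimming later, here's a minimal sketch of how I understand that setting, assuming `a` is a blend weight between the classic letter-closeness score and the LLM's contextual score (the function name and the linear blend are my assumption, not the app's actual code):

```python
# Hypothetical sketch only: assumes "a" linearly blends the two ranking
# signals. a = 0.0 would mean letter-closeness only; a = 1.0 ("max")
# would mean LLM-only, which is the case described above.
def blended_score(llm_score: float, closeness_score: float, a: float) -> float:
    # Both inputs assumed normalized to [0, 1].
    return a * llm_score + (1.0 - a) * closeness_score
```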
Unfortunately, German doesn't have access to any of this yet, because the LLM settings only mention English. I wonder how models for the other languages get trained? What's the workflow?
Somewhat related to #380
Since the transformer lexicon is so big, exact matches are sometimes still nonsensical predictions in context, particularly for short words; sometimes this even happens when the words aren't exact matches. I'd recommend putting more weight on text prediction than on letter closeness.
Example (German-language keyboard): "Was ISF das für ein Gier?"
Both "ISF" and "Gier" are nonsense compared to the intended words "ist" and "Tier" (the intended sentence is "Was ist das für ein Tier?", "What kind of animal is that?").
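To illustrate the reweighting I'm suggesting, here's a rough sketch (Python; `lm_probability`, the candidate list, and the 0.7 weight are hypothetical stand-ins for the keyboard's real lexicon, model, and tuning, not its actual implementation):

```python
import difflib

def rank(typed: str, candidates: list[str], context: str,
         lm_probability, context_weight: float = 0.7) -> list[str]:
    """Rank candidates by a blend of contextual probability and
    letter closeness, with context weighted more heavily."""
    def score(word: str) -> float:
        closeness = difflib.SequenceMatcher(
            None, typed.lower(), word.lower()).ratio()
        return (context_weight * lm_probability(context, word)
                + (1 - context_weight) * closeness)
    return sorted(candidates, key=score, reverse=True)

# Toy model: a real LLM would assign "ist" a far higher probability
# than "ISF" after "Was".
toy_lm = lambda ctx, w: {"ist": 0.9, "Tier": 0.9}.get(w, 0.01)
print(rank("isf", ["ISF", "ist"], "Was _ das für ein Tier?", toy_lm))
# -> ['ist', 'ISF']: context outweighs the exact letter match on "ISF"
```

With a high enough context weight, the contextually sensible "ist" wins even though "ISF" is letter-for-letter closer to what was typed.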