four-d-tesseract / EtymologyMarker

This program looks up the etymologies of words in a text file and color-codes the words according to their origin. It allows a writer to view the register of her writing at a glance.
MIT License

False positives in Greek morphemes #6

Open ghost opened 8 years ago

ghost commented 8 years ago

The Greek morpheme search occasionally produces false positives when it cannot find a match for the word in the dictionary.

This may be a rare and excusable example: although its etymological history is Middle English, the modern spelling of 'ache' comes from it being mistaken for a word of Greek origin (akhos, 'pain').

Proposed solutions:

four-d-tesseract commented 8 years ago

In this case, the program is finding tach, the Greek root for 'fast', inside hear**tach**e. A simpler and more general solution would be to force the Greek roots list to be more specific. When tach is a real Greek root, it's always followed by a "y" or an "o." I can remove tach from the list of matches and add tachy and tacho.
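To illustrate the fix, here is a minimal sketch of substring-based root matching. The function `find_greek_roots` and the two root lists are hypothetical names made up for this example, not EtymologyMarker's actual code; the sketch only assumes that the program tests each root as a substring of the word.

```python
def find_greek_roots(word, roots):
    """Return every root that occurs as a substring of the word.

    Hypothetical helper: assumes roots are matched by plain substring search.
    """
    word = word.lower()
    return [root for root in roots if root in word]


# The current, looser entry versus the proposed, more specific entries.
loose_roots = ["tach"]
specific_roots = ["tachy", "tacho"]

print(find_greek_roots("heartache", loose_roots))       # ['tach'] (false positive)
print(find_greek_roots("heartache", specific_roots))    # []
print(find_greek_roots("tachometer", specific_roots))   # ['tacho']
print(find_greek_roots("tachycardia", specific_roots))  # ['tachy']
```

With the more specific entries, heartache no longer matches, while words that really do carry the root still do.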

> Removing morphemes from the Greek dictionary that are too small or too common as substrings of non-Greek words to be meaningful.

Already had that problem with oo, the Greek root for egg! I should review the Greek roots list for more.
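One possible way to run that review is to check each root against a list of words already known to be non-Greek and flag the ones that are very short or that match anyway. This is only a sketch: `flag_risky_roots`, its thresholds, and the sample word lists are all assumptions chosen for illustration, not part of the program.

```python
def count_false_hits(root, non_greek_words):
    """Count how many known non-Greek words contain the root as a substring."""
    return sum(1 for word in non_greek_words if root in word.lower())


def flag_risky_roots(roots, non_greek_words, min_length=3, max_hits=0):
    """Return (root, hits) pairs for roots that are too short or match too often.

    min_length and max_hits are illustrative thresholds, not tuned values.
    """
    risky = []
    for root in roots:
        hits = count_false_hits(root, non_greek_words)
        if len(root) < min_length or hits > max_hits:
            risky.append((root, hits))
    return risky


# Illustrative data only: a few roots and a few words of non-Greek origin.
roots = ["oo", "tach", "tachy", "chrono"]
non_greek_words = ["book", "moon", "door", "heartache"]

print(flag_risky_roots(roots, non_greek_words))
# [('oo', 3), ('tach', 1)]
```

Flagged roots could then be removed or replaced with longer, more specific forms by hand.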

This is something I'd like to handle myself, which I'll be doing in the next couple of weeks.

ghost commented 8 years ago

Ah yep - that's a much simpler solution!