This program looks up the etymologies of words in a text file and color-codes the words according to their origin. It allows a writer to view the register of her writing at a glance.
Some hyphenated words are treated as a single word in dictionaries,
e.g. co-operate, post-colonial, life-style.
But others are compound words unlikely to be found as a single entry in our dictionary,
e.g. phosphate-sugar, ill-defined.
In its current state the program doesn't handle the latter type well, as it strips the word of punctuation (i.e. turns ill-defined into illdefined) and then looks for a match in the dictionary.
I'm thinking the solution to this would be to:
keep a record of words with hyphens
first, match as the program currently does, on the word stripped of punctuation
second, match as it currently does, after removing affixes
third, match on the presence of Greek morphemes
fourth, if none of the three methods above finds a match, match on the hyphenated components separately
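
The steps above could be sketched roughly as follows. This is only an illustration, not the program's actual code: the function name `lookup_with_hyphen_fallback`, the `strip_affixes` and `greek_origin` hooks, and the dictionary shape (a plain mapping from word to origin) are all assumptions standing in for whatever the program really uses.

```python
import string


def lookup_with_hyphen_fallback(token, dictionary, strip_affixes=None, greek_origin=None):
    """Return a list of (matched_form, origin) pairs, or [] if nothing matches.

    `dictionary` is assumed to map a lowercase word to its origin.
    `strip_affixes` and `greek_origin` are hypothetical hooks for the
    program's existing affix and Greek-morpheme steps.
    """
    word = token.strip(string.punctuation).lower()
    had_hyphen = "-" in word  # record the hyphen before it is stripped

    # 1. Match on the word with punctuation removed (ill-defined -> illdefined).
    stripped = word.replace("-", "")
    if stripped in dictionary:
        return [(stripped, dictionary[stripped])]

    # 2. Match after removing affixes.
    if strip_affixes is not None:
        stem = strip_affixes(stripped)
        if stem in dictionary:
            return [(stem, dictionary[stem])]

    # 3. Match on the presence of Greek morphemes.
    if greek_origin is not None:
        origin = greek_origin(stripped)
        if origin is not None:
            return [(stripped, origin)]

    # 4. Only if all of the above fail, look up each hyphenated component.
    if had_hyphen:
        matches = []
        for part in word.split("-"):
            if part:
                matches.extend(
                    lookup_with_hyphen_fallback(part, dictionary, strip_affixes, greek_origin)
                )
        return matches

    return []


# Toy dictionary; the origin labels here are placeholders for illustration.
toy = {"cooperate": "Latin", "ill": "Old Norse", "defined": "Latin"}
print(lookup_with_hyphen_fallback("co-operate", toy))
print(lookup_with_hyphen_fallback("ill-defined", toy))
```

Returning a list means a compound such as ill-defined can carry one origin per component, which leaves it up to the colouring step to decide how to display a word with mixed origins.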
I think this is a great idea. I had an idea for something similar (if taking the hyphen out fails, try splitting the word instead), but wasn't sure how to keep track of the words with hyphens. Go for it!