ai14 / prosammgen

Generate answers with Markov Chains and a CFG to prosamm reflection assignments.

Calculate misspellings in previous reflection document and use that metric to purposefully misspell the new document equally often #17

Open carlthome opened 9 years ago

carlthome commented 9 years ago

i.e. a "Humanizer"
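The idea in the title, misspelling the new document about as often as the previous one, could be sketched roughly as follows. The class name `Humanizer`, the method `humanize`, and the pluggable `misspeller` function are all hypothetical and not taken from the repository; the sketch assumes the misspelling rate has already been measured as a fraction between 0 and 1.

```java
import java.util.Random;
import java.util.function.UnaryOperator;

// Hypothetical sketch: not the project's actual Humanizer.
final class Humanizer {
    /**
     * Rewrites each space-separated word with probability `rate`
     * using the supplied misspeller; other words pass through unchanged.
     */
    static String humanize(String text, double rate,
                           UnaryOperator<String> misspeller, Random random) {
        // limit -1 keeps trailing empty strings, so spacing is preserved
        String[] words = text.split(" ", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) out.append(' ');
            boolean corrupt = !words[i].isEmpty() && random.nextDouble() < rate;
            out.append(corrupt ? misspeller.apply(words[i]) : words[i]);
        }
        return out.toString();
    }
}
```

With a rate of 0.0 the text comes back unchanged, and with 1.0 every word is passed through the misspeller. Passing a seeded `Random` keeps runs reproducible for testing.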

carlthome commented 9 years ago

A good assumption could be that common keyboard misspellings apply (i.e. close keys). There's probably data online about common misspellings too, so a simple synonym-style lookup with misspelt words could be used.
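A lookup of this kind might look something like the sketch below: a table from each correct word to its known misspellings, with one picked at random. The class `MisspellingLookup` and its methods are hypothetical, and the thread does not say which data source or file format would be used, so the table is filled by hand here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;

// Hypothetical sketch: maps a correct word to known misspellings of it.
final class MisspellingLookup {
    private final Map<String, List<String>> table = new HashMap<>();

    /** Registers one or more misspellings for a correctly spelled word. */
    void add(String correct, String... misspellings) {
        table.put(correct, Arrays.asList(misspellings));
    }

    /** Returns a random known misspelling, or empty if the word is not in the table. */
    Optional<String> misspell(String word, Random random) {
        List<String> options = table.get(word);
        if (options == null || options.isEmpty()) return Optional.empty();
        return Optional.of(options.get(random.nextInt(options.size())));
    }
}
```

Returning an `Optional` makes the "word not found" case explicit, so a caller can fall back to another strategy when the table has no entry.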

nandezer commented 9 years ago

I'm aiming to calculate the probability of misspelled words per sentence in a text, but I can't manage to understand how the WordNet class works. Any explanation would be very much appreciated. (I'll leave that part in a comment for now, in the branch interface-WritingStyleAnalyzer-code, class WAnalyzerS.java.)
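For illustration, a per-sentence misspelling rate can be computed without WordNet by checking each word against a set of known-correct words. This is only a sketch under assumptions: the class `MisspellingRate` is hypothetical (it is not the `WAnalyzerS` class from the branch), the dictionary is supplied by the caller, and sentences are split naively on `.`, `!` and `?`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: a word counts as misspelled if it is not in the dictionary.
final class MisspellingRate {
    /** Fraction of words in the text that are not in the dictionary (0.0 for empty text). */
    static double rate(String text, Set<String> dictionary) {
        int total = 0;
        int misspelled = 0;
        // Split on anything that is not a letter or an apostrophe
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (token.isEmpty()) continue;
            total++;
            if (!dictionary.contains(token)) misspelled++;
        }
        return total == 0 ? 0.0 : (double) misspelled / total;
    }

    /** One rate per sentence, using a naive split on sentence-ending punctuation. */
    static List<Double> ratePerSentence(String text, Set<String> dictionary) {
        List<Double> rates = new ArrayList<>();
        for (String sentence : text.split("[.!?]+")) {
            if (!sentence.trim().isEmpty()) rates.add(rate(sentence, dictionary));
        }
        return rates;
    }
}
```

The dictionary is expected to hold lowercase words, since tokens are lowercased before the lookup. Proper nouns and words missing from the dictionary would be counted as misspellings, which inflates the rate.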

carlthome commented 9 years ago

The WordNet class has nothing to do with this issue.

carlthome commented 9 years ago

Some classification of how bad of a misspelling it is might be a good idea. Test!
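One possible way to classify how bad a misspelling is would be to bucket it by edit distance from the intended word. The sketch below uses plain Levenshtein distance; the class `MisspellingSeverity`, the `Level` names, and the thresholds are assumptions chosen for illustration, not something the thread specifies.

```java
// Hypothetical sketch: severity buckets based on Levenshtein edit distance.
final class MisspellingSeverity {
    enum Level { NONE, MINOR, MAJOR }

    /** Minimum number of single-character insertions, deletions and substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed thresholds: distance 0 is NONE, 1 is MINOR, anything larger is MAJOR. */
    static Level classify(String intended, String written) {
        int distance = levenshtein(intended, written);
        if (distance == 0) return Level.NONE;
        return distance == 1 ? Level.MINOR : Level.MAJOR;
    }
}
```

Plain Levenshtein counts a swap of two adjacent letters ("the" written as "teh") as two edits, so it lands in the MAJOR bucket here. A variant that treats transpositions as one edit may match human typing errors better.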

nandezer commented 9 years ago

The text is now humanized. (The algorithm for choosing which words to change is a bit poor, but it replaces several words with the closest one according to the Jaro–Winkler distance algorithm.)
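For reference, here is a minimal sketch of Jaro–Winkler and of picking the closest candidate with it. It is not the code from the branch: the class `JaroWinkler` and the helper `closest` are hypothetical, and the sketch uses the commonly cited defaults (prefix scale 0.1, prefix capped at 4 characters). It computes a similarity where 1.0 means identical, so "closest" means the highest score.

```java
import java.util.List;

// Hypothetical sketch of Jaro-Winkler similarity (1.0 = identical strings).
final class JaroWinkler {
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) return 1.0;
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 || len2 == 0) return 0.0;
        // Characters match only if they are within this many positions of each other
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] m1 = new boolean[len1], m2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window), hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!m2[j] && s1.charAt(i) == s2.charAt(j)) {
                    m1[i] = true;
                    m2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order
        int k = 0, outOfOrder = 0;
        for (int i = 0; i < len1; i++) {
            if (!m1[i]) continue;
            while (!m2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / len1 + m / len2 + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    static double similarity(String s1, String s2) {
        double j = jaro(s1, s2);
        int prefix = 0;
        int max = Math.min(4, Math.min(s1.length(), s2.length()));
        while (prefix < max && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j); // boost for a shared prefix
    }

    /** Most similar candidate other than the word itself; the word if none qualifies. */
    static String closest(String word, List<String> candidates) {
        String best = word;
        double bestScore = -1.0;
        for (String candidate : candidates) {
            if (candidate.equals(word)) continue;
            double score = similarity(word, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

Because of the prefix boost, candidates that share their first letters with the word are favoured, which suits misspellings that go wrong in the middle or at the end of a word.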

nandezer commented 9 years ago

Now, if I cannot find a word in the long file of correct + misspelled words, it switches a random character of the word for a nearby character on the keyboard.
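The fallback described here could be sketched like this. The class `KeyboardTypo` and its methods are hypothetical, and the sketch makes two simplifying assumptions: a QWERTY layout, and "nearby" meaning only the keys directly left and right on the same row.

```java
import java.util.Random;

// Hypothetical sketch: replace one random character with an adjacent QWERTY key.
final class KeyboardTypo {
    private static final String[] ROWS = { "qwertyuiop", "asdfghjkl", "zxcvbnm" };

    /** Keys directly left and right of c on its row; empty if c is not a lowercase letter key. */
    static String neighbors(char c) {
        for (String row : ROWS) {
            int i = row.indexOf(c);
            if (i < 0) continue;
            StringBuilder near = new StringBuilder();
            if (i > 0) near.append(row.charAt(i - 1));
            if (i < row.length() - 1) near.append(row.charAt(i + 1));
            return near.toString();
        }
        return "";
    }

    /** Swaps one randomly chosen character for a neighbouring key, if it has one. */
    static String swapOne(String word, Random random) {
        if (word.isEmpty()) return word;
        int pos = random.nextInt(word.length());
        String near = neighbors(word.charAt(pos));
        if (near.isEmpty()) return word; // uppercase, digit or punctuation: leave as is
        char replacement = near.charAt(random.nextInt(near.length()));
        return word.substring(0, pos) + replacement + word.substring(pos + 1);
    }
}
```

A fuller version would also include keys on the rows above and below, and would handle uppercase letters by lowercasing before the lookup and restoring the case afterwards.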