A web app (wordsworth.us) to identify anachronistic words & phrases in historical fiction by comparing it to fiction written during that era. Hackbright Fellowship final project.
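The comparison described above can be sketched as a set difference over word pairs. Collect every bigram from fiction written in the target era, then flag each bigram in the historical novel that never shows up there. This is a minimal Python sketch under that assumption. The function names are hypothetical, and the real app may weigh frequencies rather than use plain membership.

```python
import re

# Hypothetical sketch, not the app's actual code:
# flag any bigram in the novel that never occurs in the era corpus.

# A run of letters (Unicode-aware), optionally joined by internal apostrophes.
TOKEN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def bigrams(text: str) -> set[tuple[str, str]]:
    """Set of adjacent lowercase word pairs in text."""
    tokens = TOKEN.findall(text.lower())
    return set(zip(tokens, tokens[1:]))


def flag_anachronisms(novel: str, era_texts: list[str]) -> list[tuple[str, str]]:
    """Bigrams in `novel` that appear in none of `era_texts`, sorted."""
    era = set()
    for text in era_texts:
        era |= bigrams(text)
    return sorted(bigrams(novel) - era)
```

For example, `flag_anachronisms("She was quite cool with it.", ["He was quite cross with her."])` returns `[('cool', 'with'), ('quite', 'cool'), ('she', 'was'), ('with', 'it')]`. ("she", "was") is flagged only because the one-sentence era sample is tiny, which is why the era corpus has to be large for plain set membership to give useful results.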
Some things I've realized about the bigram dicts:
Before the bigram dicts are built, the text needs a regex to strip [] brackets, because one of the books contains words like "Cr[oe]sus" and "Ph[oe]be".
The text also needs a regex to convert curly apostrophes (’) to straight apostrophes (').
Even if that were solved, NLTK is doing something odd with contractions: the phrase "they didn't" appears in North & South but not in the 1850s bigram dict.
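A minimal sketch of these cleanup steps, in Python using only the standard library. The helper names (`normalize`, `tokenize`, `bigram_counts`) are hypothetical. One likely cause of the contraction problem: NLTK's default `word_tokenize` follows Penn Treebank conventions and splits "didn't" into "did" and "n't", so a bigram like ("they", "didn't") is never counted. The sketch sidesteps that by swapping in a plain regex tokenizer that keeps apostrophes inside words. This is an assumed alternative, not what the project currently uses.

```python
import re
from collections import Counter

# Hypothetical helpers illustrating the cleanup; not the project's actual code.

# Square brackets only; the letters inside are kept ("Cr[oe]sus" -> "Croesus").
BRACKETS = re.compile(r"[\[\]]")

# Curly single quotes U+2018 and U+2019 (U+2019 is the usual curly apostrophe).
CURLY_QUOTES = re.compile("[\u2018\u2019]")

# A run of letters (Unicode-aware), optionally joined by internal apostrophes,
# so "didn't" stays a single token. Digits and underscores are excluded.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def normalize(text: str) -> str:
    """Strip [] brackets and turn curly apostrophes into straight ones."""
    text = BRACKETS.sub("", text)
    return CURLY_QUOTES.sub("'", text)


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with contractions left intact."""
    return WORD.findall(normalize(text).lower())


def bigram_counts(text: str) -> Counter:
    """Count adjacent token pairs, e.g. ("they", "didn't")."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))
```

For example, `tokenize("They didn’t see Cr[oe]sus.")` returns `['they', "didn't", 'see', 'croesus']`. Whatever normalization and tokenizer are chosen, the same ones have to be applied to both the era corpus and the novel being checked, or the bigram lookups will not line up.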