MarissaSkud / Wordsworth

A web app (wordsworth.us) to identify anachronistic words & phrases in historical fiction by comparing it to fiction written during that era. Hackbright Fellowship final project.
MIT License

Add a regex to deal with ellipses of three periods #8

MarissaSkud closed 5 years ago

MarissaSkud commented 5 years ago

The regex currently does not account for ellipses written as three periods (...) rather than the single ellipsis character (…). It treats them as three individual periods and removes them, so if there was no space before or after the ellipsis, the two neighboring words get jammed together. Hence the existence of words like "sympathyquickness" in the 1900s word set (because the phrase "...full of innate sympathy...quickness to perceive good" is in A Room with a View). Need to rewrite the regex, test, and re-pickle.
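A minimal sketch of the failure described above and one possible fix. The project's actual regex is not shown in this issue, so the patterns and function names here are hypothetical stand-ins: the idea is to replace any run of three or more periods (or the ellipsis character) with a space before stripping the remaining punctuation.

```python
import re

# Hypothetical patterns; the real ones used by the project are not shown here.
PUNCTUATION = re.compile(r"[^\w\s']")   # anything that is not a word char, space, or apostrophe
ELLIPSIS = re.compile(r"\.{3,}|…")      # three or more periods, or the ellipsis character


def strip_punctuation_naive(text):
    """Delete punctuation outright, which reproduces the jammed-words bug."""
    return PUNCTUATION.sub("", text)


def strip_punctuation_fixed(text):
    """Turn ellipses into spaces first, then delete the remaining punctuation."""
    text = ELLIPSIS.sub(" ", text)
    text = PUNCTUATION.sub("", text)
    return " ".join(text.split())  # collapse runs of whitespace


sample = "...full of innate sympathy...quickness to perceive good"
print(strip_punctuation_naive(sample).split())
print(strip_punctuation_fixed(sample).split())
```

With the naive version the word list contains "sympathyquickness"; with the ellipsis handled first, "sympathy" and "quickness" stay separate.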

MarissaSkud commented 5 years ago

Completed the evening of 6/22. Because remove_irrelevant_characters() is also run on user input, the fix also covers the case where the user's text includes a three-period ellipsis.
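To illustrate why one shared cleaning function helps on both sides, here is a rough sketch. It assumes the era word set is a plain Python set and that lookups are simple membership checks; normalize() is a hypothetical stand-in for remove_irrelevant_characters(), and unknown_words() is an invented helper, not the app's actual code.

```python
import re

# Hypothetical patterns, standing in for whatever the project actually uses.
ELLIPSIS = re.compile(r"\.{3,}|…")
PUNCTUATION = re.compile(r"[^\w\s']")


def normalize(text):
    """Stand-in for remove_irrelevant_characters(): clean text, return lowercase words."""
    text = ELLIPSIS.sub(" ", text)
    text = PUNCTUATION.sub("", text)
    return text.lower().split()


def unknown_words(user_text, era_words):
    """Return the user's words that do not appear in the era word set."""
    return [word for word in normalize(user_text) if word not in era_words]


# Build a tiny era word set with the same function used on user input.
era_words = set(normalize("...full of innate sympathy...quickness to perceive good"))

print(unknown_words("Such sympathy...such quickness!", era_words))
```

Because the same function cleans both the corpus and the user's passage, "sympathy" and "quickness" are matched as separate words on each side, and only "such" is reported as missing from this toy word set.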