This PR fixes #108, where documents containing smart quotes were not parsed correctly because NLTK's tokenizer doesn't support them. The fix replaces all smart quotes with their "normal" ASCII variants so that the text plays nicely with the rest of the package.
Additionally, it appears that a lot of our tests were based on the excluded smart quotes; those tests have now been fixed so that they use the correct values.
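For reviewers, here is a minimal sketch of the kind of replacement described above. It is not the code in this PR: the names `SMART_QUOTE_MAP` and `normalize_quotes` are hypothetical, and the set of characters covered (the four common curly quotes) is an assumption about what "smart quotes" means here.

```python
# Illustrative sketch only; names and the exact character set are assumptions,
# not the implementation in this PR.
SMART_QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (also used as an apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(text: str) -> str:
    """Return text with curly quotes replaced by their ASCII equivalents."""
    return text.translate(SMART_QUOTE_MAP)


if __name__ == "__main__":
    sample = "\u201cIt\u2019s fine,\u201d she said."
    print(normalize_quotes(sample))  # prints: "It's fine," she said.
```

Running the normalization before the text reaches the tokenizer keeps the change in one place, and `str.translate` does every replacement in a single pass over the string.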
lgtm -- @samimak37 could you just update from master and push again? Let's see if the CI changes from #110 + my config changes to the repo are working.