dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

Added Smart Quote Cleaner #117

Closed by samimak37 4 years ago

samimak37 commented 4 years ago

This PR fixes #108, an issue where documents containing smart quotes were not parsed correctly because NLTK's tokenizer doesn't support them. The fix replaces all smart quotes with their plain ASCII equivalents so that the text plays nicely with the rest of the package.
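For readers skimming the thread, here is a minimal sketch of the kind of replacement described above. It is not the code from this PR: the function name `clean_smart_quotes` and the exact set of characters mapped are assumptions made for illustration, using only Python's built-in `str.maketrans` and `str.translate`.

```python
# Illustrative sketch only. The helper name and the character set are
# assumptions, not the implementation merged in this PR.

# Map each typographic ("smart") quote to its plain ASCII counterpart.
SMART_QUOTE_TABLE = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (also used as an apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def clean_smart_quotes(text: str) -> str:
    """Return a copy of text with smart quotes replaced by ASCII quotes."""
    return text.translate(SMART_QUOTE_TABLE)


if __name__ == "__main__":
    sample = "\u201cDon\u2019t go,\u201d she said."
    print(clean_smart_quotes(sample))  # prints: "Don't go," she said.
```

Running the cleaner once, when a document's text is first loaded, would mean every later step (tokenizing, counting, tests) sees the same ASCII quotes. Whether the PR applies it at that point is not stated in this thread.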

Additionally, it appears that many of our tests had expected values that depended on the smart quotes being excluded. Those tests have now been fixed to use the correct values.

ryaanahmed commented 4 years ago

lgtm -- @samimak37 could you just update from master and push again? Let's see if the CI changes from #110 + my config changes to the repo are working.

samimak37 commented 4 years ago

Wooo! Looks like it works!

ryaanahmed commented 4 years ago

awesome. all good! merging.