dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

Subject/object frequencies deletes text of last document in corpus #98

Closed: samimak37 closed this issue 4 years ago

samimak37 commented 4 years ago

The subject_pronouns_gender_comparison and subject_vs_object_pronoun_freqs functions in gender_frequency.py appear to delete the text of the last document in the corpus (along with its word counters) and then do nothing further with that document.

We should look into this and see if there's any particular reason, or if this is an artifact that just never got removed.

https://github.com/dhmit/gender_analysis/blob/ea1a155c0d0bc79c2d1edb6a65035f97d1d6fc3b/gender_analysis/analysis/gender_frequency.py#L298-L299

https://github.com/dhmit/gender_analysis/blob/ea1a155c0d0bc79c2d1edb6a65035f97d1d6fc3b/gender_analysis/analysis/gender_frequency.py#L356-L357
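
For illustration only, here is a minimal sketch of how this kind of side effect can arise and how to check for it. It does not reproduce the linked lines: the Document class, its _word_counts cache, and both function names are hypothetical stand-ins, and the cleanup shown is an assumption about what "eliminating the text" might look like. The point is that a Python loop variable still refers to the last item after the loop ends, so cleanup placed after the loop touches only the last document.

```python
from collections import Counter


class Document:
    """Hypothetical stand-in for a corpus document (not the gender_analysis class)."""

    def __init__(self, text):
        self.text = text
        self._word_counts = None  # filled in lazily on first use

    def get_word_counts(self):
        if self._word_counts is None:
            self._word_counts = Counter(self.text.lower().split())
        return self._word_counts


def subject_counts_with_leftover_cleanup(corpus):
    """Count 'he' and 'she' per document, then run a leftover cleanup step."""
    results = []
    for document in corpus:
        counts = document.get_word_counts()
        results.append(counts["he"] + counts["she"])
    # The loop variable outlives the loop and still names the last document,
    # so these two lines wipe that document only.
    document.text = None
    document._word_counts = None
    return results


def subject_counts(corpus):
    """Same analysis with the cleanup removed; the corpus is left untouched."""
    results = []
    for document in corpus:
        counts = document.get_word_counts()
        results.append(counts["he"] + counts["she"])
    return results


def build_corpus():
    # Tiny made-up corpus, used only to exercise the two functions.
    return [Document("she said he left"), Document("he saw her")]


corpus = build_corpus()
print(subject_counts_with_leftover_cleanup(corpus))
print([doc.text for doc in corpus])  # last entry has been wiped

corpus = build_corpus()
print(subject_counts(corpus))
print([doc.text for doc in corpus])  # all texts intact
```

A check along these lines (compare each document's text before and after calling the analysis function) could serve as a regression test once the stray lines are removed.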

kenalba commented 4 years ago

After discussion, we agreed that this was probably a vestigial couple of lines, so I've cut them! I submitted a pull request that fixes this here. I guess we close the issue after the request gets accepted?