dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

Add a generic metadata result function #146

Closed samimak37 closed 3 years ago

samimak37 commented 3 years ago

Right now, many of the analysis modules have separate "author_gender", "location", and "year" functions for grouping result data (see these three in gender_frequency.py):

https://github.com/dhmit/gender_analysis/blob/e790107035c536f02e9810fa3b05c1131b9f4de2/gender_analysis/analysis/gender_frequency.py#L429

https://github.com/dhmit/gender_analysis/blob/e790107035c536f02e9810fa3b05c1131b9f4de2/gender_analysis/analysis/gender_frequency.py#L383

https://github.com/dhmit/gender_analysis/blob/e790107035c536f02e9810fa3b05c1131b9f4de2/gender_analysis/analysis/gender_frequency.py#L476

These three functions have similar implementations in every module, and the choice of metadata fields they support feels somewhat arbitrary. We should create a general version of these functions that lets a user run the same analyses on any metadata field they define.
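To make the proposal concrete, here is a minimal sketch of what a generic grouping function could look like. It is not the library's actual API: `Document`, `group_results_by_metadata`, the `binner` parameter, and the metadata keys in the example are all hypothetical names used for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Hashable, Iterable, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a corpus document (not the library's class)."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def group_results_by_metadata(
    documents: Iterable[Document],
    analyze: Callable[[Document], Any],
    metadata_field: str,
    binner: Optional[Callable[[Any], Hashable]] = None,
) -> Dict[Hashable, List[Any]]:
    """Run `analyze` on each document and group results by one metadata field.

    Documents that lack the field are skipped. `binner` optionally maps a raw
    metadata value to a group key (for example, a year to its decade).
    """
    grouped: Dict[Hashable, List[Any]] = defaultdict(list)
    for doc in documents:
        if metadata_field not in doc.metadata:
            continue  # no value to group on
        key = doc.metadata[metadata_field]
        if binner is not None:
            key = binner(key)
        grouped[key].append(analyze(doc))
    return dict(grouped)


# Toy data; the field names are assumptions, not the library's schema.
docs = [
    Document("she said she would", {"author_gender": "female", "date": 1815}),
    Document("he said she left", {"author_gender": "male", "date": 1818}),
    Document("she wrote", {"author_gender": "female", "date": 1823}),
]


def count_she(doc: Document) -> int:
    """Toy analysis: count occurrences of the token 'she'."""
    return doc.text.lower().split().count("she")


by_gender = group_results_by_metadata(docs, count_she, "author_gender")
by_decade = group_results_by_metadata(
    docs, count_she, "date", binner=lambda year: year // 10 * 10
)
print(by_gender)  # {'female': [2, 1], 'male': [1]}
print(by_decade)  # {1810: [2, 1], 1820: [1]}
```

With something along these lines, the per-field "author_gender", "location", and "year" functions could become thin wrappers around one call, and any user-defined field would work the same way.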

MBJean commented 3 years ago

This is resolved for our word-window module (proximity), but not yet for gender_frequency.

MBJean commented 3 years ago

Thanks for identifying this opportunity! PR #169 resolves this for gender_frequency, and PR #168 will resolve it for instance_distance. These will both be in place by next Monday, so I'm closing this issue.