dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License
11 stars 5 forks source link

Prettify some counter and dictionary outputs. #104

Closed kenalba closed 3 years ago

kenalba commented 3 years ago

Some functions, like find_gender_adj, return an unsorted dictionary with no lower bounds and no stop-word detection. Adding parameters to the function which make it return something more user-friendly (a sorted list of (word, count) tuples, maybe?) that cuts off at a given lower-end parameter would increase usability.

kenalba commented 3 years ago

We could also consider standardizing whether these functions return counters or dictionaries - and having a display function that returns something more parsable for the end user.

shingkid commented 3 years ago

Had a look at this issue and needed some clarification:

  1. Why are 3 and 5 specifically used? Perhaps they should be extracted out as constants with names that are more explanatory? https://github.com/dhmit/gender_analysis/blob/ee1d41f1201202b9f608de8030c0059f0047d980/gender_analysis/analysis/gender_adjective.py#L41-L42

  2. I don't think it's good practice to return a different type. Currently, the expected output is a dictionary but here a string "error message" is returned. Maybe raise an exception instead and return empty/null? Tiny thing - might want to simplify if not lower_window_bound >= 5: to if lower_window_bound < 5:. https://github.com/dhmit/gender_analysis/blob/ee1d41f1201202b9f608de8030c0059f0047d980/gender_analysis/analysis/gender_adjective.py#L46-L47

  3. I might not have understood this part correctly, but it appears that in fact, it is always taken that lower_window_bound=5. Is that the desired behaviour? https://github.com/dhmit/gender_analysis/blob/ee1d41f1201202b9f608de8030c0059f0047d980/gender_analysis/analysis/gender_adjective.py#L48-L51

shingkid commented 3 years ago

Oh, just saw that @samimak37 created a related issue #119 that overlaps with some of the points I've raised above!

MBJean commented 3 years ago

Leaving a note on this issue: in the new proximity module (PR #159), I've standardized on using Counters as the return.

MBJean commented 3 years ago

Closing in favor of #170.