Closed kenalba closed 3 years ago
We could also consider standardizing whether these functions return counters or dictionaries - and having a display function that returns something more parsable for the end user.
Had a look at this issue and needed some clarification:
Why are 3
and 5
specifically used? Perhaps they should be extracted out as constants with names that are more explanatory?
https://github.com/dhmit/gender_analysis/blob/ee1d41f1201202b9f608de8030c0059f0047d980/gender_analysis/analysis/gender_adjective.py#L41-L42
I don't think it's good practice to return a different type. Currently, the expected output is a dictionary but here a string "error message" is returned. Maybe raise an exception instead and return empty/null? Tiny thing - might want to simplify if not lower_window_bound >= 5:
to if lower_window_bound < 5:
.
https://github.com/dhmit/gender_analysis/blob/ee1d41f1201202b9f608de8030c0059f0047d980/gender_analysis/analysis/gender_adjective.py#L46-L47
I might not have understood this part correctly, but it appears that in fact, it is always taken that lower_window_bound=5
. Is that the desired behaviour?
https://github.com/dhmit/gender_analysis/blob/ee1d41f1201202b9f608de8030c0059f0047d980/gender_analysis/analysis/gender_adjective.py#L48-L51
Oh, just saw that @samimak37 created a related issue #119 that overlaps with some of the points I've raised above!
Leaving a note on this issue: in the new proximity
module (PR #159), I've standardized on using Counter
s as the return.
Closing in favor of #170.
Some functions, like
find_gender_adj
, return an unsorted dictionary with no lower bounds and no stop-word detection. Adding parameters to the function which make it return something more user-friendly (a sorted list of (word, count) tuples, maybe?) that cuts off at a given lower-end parameter would increase usability.