Closed kenalba closed 3 years ago
A lot of these functions ultimately point to common.MASC_WORDS and common.FEM_WORDS. Maybe we can add a third category - or, better, an interface that allows the user to define other gender categories and input pronouns and names for those other categories?
I like the idea of allowing user-specified categories!
I think the common.*_WORDS lists are fairly limited in scope (each has only 3-4 words). Maybe we can use them as a fallback in case the user doesn't provide their own pronouns/categories? In either case, we should definitely build them out some more and give more options in general.
I think going forward with this is a good first step. If functions like find_gender_adj take as a parameter a valid key from a global dict of dicts like this rather than a Boolean:
gender_list = {
    'female': {'FEM_WORDS': ['herself', 'hers', 'she', 'her']},
    'male': {'MASC_WORDS': ['himself', 'his', 'he', 'him']},
    'neutral': {'NEUT_WORDS': ['their', 'theirs', 'them', 'themself']},
}
(we can maybe omit the nested dicts if we just replace 'FEM_WORDS' with 'female', e.g.)
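To make the flat version concrete, here is a rough sketch of what a function taking a key instead of a Boolean might look like. GENDER_WORDS and count_gender_words are hypothetical names made up for illustration, not anything that exists in the codebase, and the tokenization is deliberately naive.

```python
from collections import Counter

# Hypothetical flat mapping: one key per category, no nested dicts.
GENDER_WORDS = {
    'female': ['herself', 'hers', 'she', 'her'],
    'male': ['himself', 'his', 'he', 'him'],
    'neutral': ['their', 'theirs', 'them', 'themself'],
}


def count_gender_words(tokens, gender_key, gender_words=GENDER_WORDS):
    """Count the tokens that belong to the category named by gender_key.

    gender_key must be a valid key of gender_words; this replaces a
    Boolean flag such as `female=True`.
    """
    if gender_key not in gender_words:
        raise ValueError(f"Unknown gender key: {gender_key!r}")
    targets = set(gender_words[gender_key])
    return Counter(t.lower() for t in tokens if t.lower() in targets)


# Naive whitespace tokenization, just for the example.
tokens = "She gave him her book and they thanked them".split()
female_counts = count_gender_words(tokens, 'female')
neutral_counts = count_gender_words(tokens, 'neutral')
```

Passing the mapping as a parameter (with the global as its default) would let a caller swap in their own categories without touching the defaults.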
... and we provide some functions for making user-defined gender_lists, ESPECIALLY if we can do so with names rather than just pronouns, that might be a functional way forward from here that won't take any huge reorganization.
This general approach seems good to me: have a global data structure with some defaults, and then provide an easy interface for users to edit it. As you've both already said, this provides flexibility to add other words in addition to pronouns, like names, and allows the end user to change the categories.
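As a sketch of what "defaults plus an easy interface to edit them" could mean in practice, something like the following might work. Everything here (DEFAULT_CATEGORIES, build_categories, the 'pronouns'/'names' layout, and the example category and names) is an assumption for discussion, not existing code.

```python
import copy

# Hypothetical defaults; each category holds pronouns and (optionally) names.
DEFAULT_CATEGORIES = {
    'female': {'pronouns': {'herself', 'hers', 'she', 'her'}, 'names': set()},
    'male': {'pronouns': {'himself', 'his', 'he', 'him'}, 'names': set()},
}


def build_categories(user_categories=None):
    """Merge user-supplied categories into a copy of the defaults.

    With no user input the defaults are returned unchanged (the fallback
    case). The module-level defaults are never mutated.
    """
    categories = copy.deepcopy(DEFAULT_CATEGORIES)
    for label, words in (user_categories or {}).items():
        # Create the category if it is new, otherwise extend the existing one.
        entry = categories.setdefault(label, {'pronouns': set(), 'names': set()})
        entry['pronouns'] |= {w.lower() for w in words.get('pronouns', ())}
        entry['names'] |= set(words.get('names', ()))
    return categories


# Example: add a new category and attach names to an existing one.
custom = build_categories({
    'nonbinary': {'pronouns': ['ze', 'zir', 'zirs', 'zirself'], 'names': ['Alex']},
    'female': {'names': ['Elizabeth', 'Jane']},
})
```

Copying the defaults before merging keeps one user's edits from leaking into later calls, which seems safer than mutating a shared global in place.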
The codebase currently uses Boolean operators (or manually checking for 'male' or 'female' as a dict value) to determine which gender to search for in functions like find_gender_adj. Many of the other functions (e.g. find_female_adj) have binary gender literally encoded. We're moving instead over to using the Gender object - and more specifically, Gender.identifier - to grab the things we're looking for.
Impacted functions:
@kenalba is working on:
[x] gender_analysis.analysis.gender_adjective.find_gender_adj
[x] gender_analysis.analysis.gender_adjective.find_female_adj
[x] gender_analysis.analysis.gender_adjective.find_male_adj
[x] gender_analysis.analysis.gender_adjective.results_by_location
[x] gender_analysis.analysis.gender_adjective.get_top_adj
[x] gender_analysis.analysis.gender_frequency.display_gender_freq
[x] gender_analysis.analysis.gender_frequency.run_gender_freq
[x] gender_analysis.analysis.gender_frequency.document_pronoun_freq
[x] gender_analysis.analysis.gender_frequency.subject_vs_object_pronoun_freqs
[x] gender_analysis.analysis.gender_frequency.subject_pronouns_gender_comparison
[x] gender_analysis.analysis.gender_frequency.freq_by_author_gender
[x] gender_analysis.analysis.gender_frequency.bar_subj_obj_freq
[x] gender_analysis.analysis.gender_frequency.box_gender_pronoun_freq
[x] gender_analysis.analysis.gender_adjective.results_by_author_gender (pushed to post-MVP)
@samimak37 is working on:
gender_analysis.analysis.instance_distance
gender_analysis.analysis.dependency_parsing (n.b. this one requires modifying the PronounSeries in our gender object.)
@wilke0818 is working on:
gender_analysis.analysis.metadata_visualizations.plot_gender_breakdown
We've removed:
gender_analysis.corpus.guess_author_genders
Let's discuss potential solutions in this Issue thread.
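For the move from Boolean flags to the Gender object, here is a minimal sketch of the calling pattern. The Gender class below is a toy stand-in written for this thread: its fields, and the assumption that identifier is the set of words identifying that gender, may not match the real object. count_identifiers is likewise a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gender:
    """Toy stand-in for the project's Gender object (assumed shape)."""
    label: str
    identifier: frozenset  # assumed: the words that identify this gender


def count_identifiers(tokens, genders):
    """Return {gender.label: number of tokens found in gender.identifier}.

    The caller passes any list of Gender objects, so nothing in the
    function body is hard-coded to 'male' or 'female'.
    """
    lowered = [t.lower() for t in tokens]
    return {
        g.label: sum(1 for t in lowered if t in g.identifier)
        for g in genders
    }


FEMALE = Gender('Female', frozenset({'herself', 'hers', 'she', 'her'}))
MALE = Gender('Male', frozenset({'himself', 'his', 'he', 'him'}))

# Naive whitespace tokenization, just for the example.
tokens = "He said she saw him near her house".split()
counts = count_identifiers(tokens, [FEMALE, MALE])
```

With this shape, a pair like find_female_adj / find_male_adj could collapse into one function that takes a list of Gender objects, and adding a new category would not require a new function.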