dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

Update gendered parameters from Boolean to something more flexible #102

Closed kenalba closed 3 years ago

kenalba commented 4 years ago

The codebase currently uses Boolean parameters (or manual checks for 'male' or 'female' as a dict value) to determine which gender to search for in functions like find_gender_adj. Many of the other functions (e.g. find_female_adj) have binary gender literally encoded in their names.

We're moving over to using the Gender object instead - and more specifically, Gender.identifier - to grab the things we're looking for.

Impacted functions:

@kenalba is working on:

@samimak37 is working on:

@wilke0818 is working on:

We've removed:

Let's discuss potential solutions in this Issue thread.

kenalba commented 4 years ago

A lot of these functions ultimately point to common.MASC_WORDS and common.FEM_WORDS. Maybe we can add a third category - or, better, an interface that allows the user to define other gender categories and input pronouns and names for those other categories?

samimak37 commented 4 years ago

I like the idea of allowing user-specified categories!

I think the common.*_WORDS lists are fairly limited in scope (each has only 3-4 words). Maybe we can use them as a fallback in case the user doesn't provide their own pronouns/categories? Either way, we should definitely build them out some more and give more options in general.
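
A minimal sketch of that fallback behaviour, in case it helps the discussion. Everything here is hypothetical: `DEFAULT_WORDS` stands in for the existing common.*_WORDS lists (whose real contents may differ), and `resolve_words` is an illustrative name, not a function in the codebase.

```python
# Hypothetical stand-in for common.FEM_WORDS / common.MASC_WORDS;
# the real lists in the codebase may contain different words.
DEFAULT_WORDS = {
    "female": ["she", "her", "hers", "herself"],
    "male": ["he", "him", "his", "himself"],
}


def resolve_words(category, user_words=None):
    """Return the word list for a category, preferring user-supplied words.

    Falls back to DEFAULT_WORDS when the user gave nothing for that category.
    """
    if user_words and category in user_words:
        return list(user_words[category])
    if category in DEFAULT_WORDS:
        return list(DEFAULT_WORDS[category])
    raise KeyError(f"No word list for category {category!r}")
```

Calling `resolve_words("female")` with no user input returns the defaults, while `resolve_words("female", {"female": ["she"]})` returns only what the user supplied.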

kenalba commented 4 years ago

I think going forward with this is a good first step. If functions like find_gender_adj take as a parameter a valid key from a global dict of dicts like this rather than a Boolean:


gender_list = {
    'female': {'FEM_WORDS': ['herself', 'hers', 'she', 'her']},
    'male': {'MASC_WORDS': ['himself', 'his', 'he', 'him']},
    'neutral': {'NEUT_WORDS': ['their', 'theirs', 'them', 'themself']},
}

(We could probably drop the nested dicts and key each word list directly by 'female', 'male', and so on.)

... and we provide some functions for making user-defined gender_lists, ESPECIALLY if we can do so with names rather than just pronouns, that might be a functional way forward from here that won't take any huge reorganization.
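
As a rough illustration of passing a key instead of a Boolean, here is a sketch that uses the flattened form of the dict (word lists keyed directly by category). `count_gender_words` is a hypothetical helper, not find_gender_adj itself, and the simple regex tokenizer is an assumption made only to keep the example self-contained.

```python
import re
from collections import Counter

# Flattened variant of the proposed structure: category key -> word list.
GENDER_LIST = {
    "female": ["herself", "hers", "she", "her"],
    "male": ["himself", "his", "he", "him"],
    "neutral": ["their", "theirs", "them", "themself"],
}


def count_gender_words(text, gender, gender_list=GENDER_LIST):
    """Count occurrences of the words registered under the given gender key.

    `gender` must be a key of `gender_list`; a Boolean is rejected.
    """
    if gender not in gender_list:
        raise ValueError(
            f"Unknown gender key {gender!r}; expected one of {sorted(gender_list)}"
        )
    targets = set(gender_list[gender])
    # Naive tokenizer (assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token in targets)
```

Because the category is looked up by key, a caller can pass their own `gender_list` with extra categories and the function body does not need to change.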

ryaanahmed commented 4 years ago

This general approach seems good to me: have a global data structure with some defaults, and then provide an easy interface for users to edit it. As you've both already said, this provides flexibility to add other words in addition to pronouns, like names, and allows the end user to change the categories.
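
One possible shape for that editing interface, sketched under assumptions: `DEFAULT_GENDER_LIST` and `add_category` are hypothetical names, and returning a modified copy rather than mutating the global is a design choice made here for illustration, not something decided in this thread.

```python
import copy

# Hypothetical global defaults, following the structure proposed above.
DEFAULT_GENDER_LIST = {
    "female": ["herself", "hers", "she", "her"],
    "male": ["himself", "his", "he", "him"],
    "neutral": ["their", "theirs", "them", "themself"],
}


def add_category(gender_list, name, pronouns=(), names=()):
    """Return a copy of gender_list with words added under `name`.

    Creates the category if it does not exist. Words are lowercased and
    de-duplicated; the input dict is left untouched.
    """
    updated = copy.deepcopy(gender_list)
    words = updated.setdefault(name, [])
    for word in list(pronouns) + list(names):
        word = word.lower()
        if word not in words:
            words.append(word)
    return updated


# Usage: extend an existing category with names, then add a new category.
custom = add_category(DEFAULT_GENDER_LIST, "female", names=["Elizabeth", "Jane"])
custom = add_category(custom, "nonbinary", pronouns=["xe", "xem", "xyr"])
```

Working on a copy keeps the shipped defaults intact, so one analysis can use custom categories without affecting another.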