dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

Do something with names instead of pronouns in texts #56

Closed by ryaanahmed 5 years ago

ryaanahmed commented 5 years ago

Right now all of our functions that analyze gender within the text (as opposed to with the metadata) do so with pronouns only.

Lots of interesting texts don't use many pronouns (e.g., the congressional hearings that @meesuekim was working with), but they do name everyone.

This might not be possible before initial release, but we should work on it eventually.

ryaanahmed commented 5 years ago

I think we can quickly do a really basic version of this that might be useful by our initial release.

Instead of trying to guess the gender of names in the document texts, we can expose FEMININE_WORDS and MASCULINE_WORDS as globals or as arguments to the analysis functions that look for gendered words, with the defaults set as ['she', 'her', 'hers'] etc. That way users can modify these parameters if they know in advance the names that they're looking for.
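A minimal sketch of what that could look like, not the library's actual code. The function name `count_gendered_words`, the return shape, and the masculine defaults are assumptions made for illustration. Only the feminine defaults come from the comment above, and the "etc." there suggests the real list may be longer.

```python
import re
from collections import Counter

# Hypothetical module-level defaults. The feminine list is taken from the
# comment above; the masculine list is an assumed counterpart.
FEMININE_WORDS = ("she", "her", "hers")
MASCULINE_WORDS = ("he", "him", "his")


def count_gendered_words(text,
                         feminine_words=FEMININE_WORDS,
                         masculine_words=MASCULINE_WORDS):
    """Count case-insensitive whole-word occurrences of each word list.

    Callers can pass names instead of pronouns when they know in advance
    who appears in the text.
    """
    # Lowercase the text and split it into simple word tokens.
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    return {
        "feminine": sum(tokens[word.lower()] for word in feminine_words),
        "masculine": sum(tokens[word.lower()] for word in masculine_words),
    }


# Default behaviour: pronouns only.
pronoun_counts = count_gendered_words("She gave her book to him.")

# Overriding the defaults with (made-up) names known in advance.
hearing = "Ms. Rivera asked a question. Mr. Okafor replied. Rivera followed up."
name_counts = count_gendered_words(
    hearing, feminine_words=["Rivera"], masculine_words=["Okafor"]
)
```

Because this sketch matches single tokens, a multi-word name such as a full first-and-last name would need to be split or matched differently.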

We can do something more sophisticated for a later release.

ryaanahmed commented 5 years ago

@sophiazhi added MASC_WORDS and FEM_WORDS, which are user-settable. We should now propagate this change into the rest of the codebase, particularly...

ryaanahmed commented 5 years ago

done for now via https://github.com/dhmit/gender_analysis/pull/91