datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

[new feature] estimate gender participation #249

Closed (nllz closed 6 years ago)

nllz commented 8 years ago

I imagine this could be a very relevant addition to the analyze sender notebook, as well as to other notebooks such as centrality and community, word analysis, and threads.

Perhaps this code could be useful: https://github.com/malev/gender-detector

nllz commented 8 years ago

Relevant post about gender estimation and current practices: https://civic.mit.edu/blog/natematias/best-practices-for-ethical-gender-research-at-very-large-scales

npdoty commented 8 years ago

+1, this would be great. I tend to be skeptical about the results of automated detectors, but if they're based on statistics and the limitations are made clear when the data is used in aggregate, I think it would still be useful.

sbenthall commented 8 years ago

Technically a duplicate ticket, as this is one of the earliest proposed BigBang features. See #13

As this ticket has more discussion on it and has a more recent tech reference, I'll close the earlier ticket.

hargup commented 8 years ago

If some of the participants are famous enough to have Wikipedia pages, we might even use Wikidata to get their accurate gender.

npdoty commented 8 years ago

http://schedule.bid-seminar.com/speakers/75

I would love to know the methods behind this research on GitHub (from Google/NC State), which showed that pull requests from women were accepted more often, but only when they weren't identifiable as women.

npdoty commented 7 years ago

I've started on this in this branch: https://github.com/npdoty/bigbang/tree/name-and-gender

As expected, there are issues with automatically extracting the first name, and there are many names for which the library can't confidently estimate gender. Both problems are going to be more difficult with international audiences, such as those of international standard-setting bodies, where naming conventions vary.
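
To make those two failure points concrete, here is a minimal sketch. It is not the code in the branch above. The helper names (`extract_first_name`, `estimate_gender`), the toy `NAME_P_FEMALE` table, and the 0.9 threshold are all assumptions made for illustration; a real detector would use name statistics from a library or census data. The sketch returns "unknown" whenever the first name can't be extracted or the estimate isn't confident.

```python
from email.utils import parseaddr
from typing import Optional

# Hypothetical toy table: first name -> assumed probability the name is used by women.
# These numbers are invented for illustration only.
NAME_P_FEMALE = {"maria": 0.99, "john": 0.01, "robin": 0.55}


def extract_first_name(from_header: str) -> Optional[str]:
    """Best-effort first name from an email From header, or None."""
    display_name, _address = parseaddr(from_header)
    display_name = display_name.strip()
    if not display_name:
        # Only a bare address was given; don't guess from the local part.
        return None
    if "," in display_name:
        # Treat "Doe, Jane" as "Jane Doe" (an assumption about the convention).
        last, _, rest = display_name.partition(",")
        display_name = f"{rest.strip()} {last.strip()}"
    parts = display_name.split()
    if not parts:
        return None
    first = parts[0]
    # Reject initials ("J.") and tokens with digits or punctuation.
    if len(first) < 2 or not first.isalpha():
        return None
    return first


def estimate_gender(first_name: Optional[str], threshold: float = 0.9) -> str:
    """Return 'female', 'male', or 'unknown' when the estimate is not confident."""
    if first_name is None:
        return "unknown"
    p_female = NAME_P_FEMALE.get(first_name.lower())
    if p_female is None:
        return "unknown"
    if p_female >= threshold:
        return "female"
    if p_female <= 1 - threshold:
        return "male"
    return "unknown"


if __name__ == "__main__":
    for header in [
        "Maria Garcia <maria@example.org>",
        '"Doe, Jane" <jane@example.org>',
        "J. Smith <js@example.org>",
        "robin@example.org",
    ]:
        name = extract_first_name(header)
        print(header, "->", name, "->", estimate_gender(name))
```

Keeping "unknown" as an explicit outcome means any aggregate count can report how many senders were left unclassified, rather than silently dropping them. The first-token heuristic will still misread names from conventions where the family name comes first.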

sbenthall commented 7 years ago

@npdoty assigning this ticket to you since you've started to work on it.

I think that even a partial solution would be an awesome feature for 0.2, if you think you could prepare it in time.

Do you think you could have a pull request ready for the 0.2 release? If not, no big deal, we can leave this ticket to a later milestone.

npdoty commented 6 years ago

I believe this feature is complete and we can close this issue. There will still be further development in this area, on documentation, refining gender detection, using other means to infer gender like accessing Google+ pages or social media accounts, etc., but I believe this issue just tracks the initial capability, which is demonstrated in #281.