Closed: nllz closed this issue 6 years ago.
Relevant post about gender estimation and current practices: https://civic.mit.edu/blog/natematias/best-practices-for-ethical-gender-research-at-very-large-scales
+1, this would be great. I tend to be skeptical about the results of automated detectors, but if they're based on statistics and the limitations are made clear when the data is used in aggregate, I think it would still be useful.
Technically this is a duplicate ticket, since this was one of the earliest proposed BigBang features. See #13.
As this ticket has more discussion on it and has a more recent tech reference, I'll close the earlier ticket.
If some of the participants are famous enough to have Wikipedia pages, we might even use Wikidata to get accurate gender data for them.
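For participants who do have Wikidata entries, a lookup might be sketched like this. It is a minimal, hypothetical example, not part of BigBang: it assumes the participant's full name exactly matches an English Wikidata label, and it uses the public Wikidata Query Service with property P21 ("sex or gender") and class Q5 ("human"). The function names and the User-Agent string are made up for illustration. Exact-label matching is ambiguous for common names, so the sketch returns a set of labels rather than a single answer.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_gender_query(full_name: str) -> str:
    """Hypothetical helper: SPARQL query for humans (wd:Q5) whose English
    label is exactly full_name, returning the label of their P21 value."""
    # json.dumps yields a quoted, escaped string literal that SPARQL also accepts.
    literal = json.dumps(full_name, ensure_ascii=False)
    return f"""
SELECT ?person ?genderLabel WHERE {{
  ?person rdfs:label {literal}@en ;
          wdt:P31 wd:Q5 ;
          wdt:P21 ?gender .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 5
"""


def genders_from_results(results: dict) -> set[str]:
    """Collect distinct gender labels from a SPARQL JSON results document."""
    bindings = results.get("results", {}).get("bindings", [])
    return {row["genderLabel"]["value"] for row in bindings if "genderLabel" in row}


def lookup_gender(full_name: str) -> set[str]:
    """Run the query against Wikidata (requires network access)."""
    url = WIKIDATA_SPARQL + "?" + urllib.parse.urlencode(
        {"query": build_gender_query(full_name)}
    )
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks clients to send a descriptive User-Agent.
            "User-Agent": "gender-lookup-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return genders_from_results(json.load(response))
```

An empty set means no exact label match was found; more than one label may mean that several people share the name, so the result would still need manual review before being used.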
http://schedule.bid-seminar.com/speakers/75
I would love to know the methods behind this research on GitHub (from Google/NC State), which showed that pull requests from women were accepted more often, but only when they weren't identifiable as women.
I've started on this in this branch: https://github.com/npdoty/bigbang/tree/name-and-gender
As expected, there are problems with automatically extracting the first name, and there are many names for which the library can't confidently estimate a gender. Both problems will be harder with international audiences, such as those of international standard-setting bodies, where naming conventions vary.
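To make those two failure points concrete, here is a minimal sketch of first-name extraction from an email `From` header, followed by a lookup that falls back to "unknown". It is not the code in the branch above: `extract_first_name`, `estimate_gender` and `TOY_NAME_TABLE` are hypothetical, and the two-entry table stands in for whatever name-statistics source a real implementation would use. It only handles the Western "Given Family" order and the "Family, Given" form, which is exactly the assumption that breaks down with international participants.

```python
from email.utils import parseaddr
from typing import Optional

# Hypothetical stand-in for a real name-statistics dataset or library.
TOY_NAME_TABLE = {
    "maria": "female",
    "john": "male",
}


def extract_first_name(from_header: str) -> Optional[str]:
    """Best-effort first name from a From header.

    Assumes "Given Family" order, or "Family, Given" when the display name
    contains a comma. Returns None when there is no display name at all.
    """
    display_name, _address = parseaddr(from_header)
    display_name = display_name.strip()
    if not display_name:
        return None
    if "," in display_name:
        # "Doty, Nick" -> "Nick"
        display_name = display_name.split(",", 1)[1].strip()
    tokens = display_name.split()
    return tokens[0] if tokens else None


def estimate_gender(from_header: str) -> str:
    """Return a label from the toy table, or "unknown" when no confident match exists."""
    name = extract_first_name(from_header)
    if name is None:
        return "unknown"
    return TOY_NAME_TABLE.get(name.lower(), "unknown")
```

Returning an explicit "unknown" rather than guessing keeps both failure modes (no extractable name, and an ambiguous or unseen name) visible in any later analysis.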
@npdoty assigning this ticket to you since you've started to work on it.
I think that even a partial solution would be an awesome feature for 0.2, if you think you could prepare it in time.
Do you think you could have a pull request ready for the 0.2 release? If not, no big deal, we can leave this ticket to a later milestone.
I believe this feature is complete and we can close this issue. There will still be further development in this area, on documentation, refining gender detection, using other means to infer gender like accessing Google+ pages or social media accounts, etc., but I believe this issue just tracks the initial capability, which is demonstrated in #281.
I imagine this could be a very relevant addition to the analyze sender notebook, as well as to other notebooks such as centrality and community, word analysis, and threads.
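For sender-level analysis, one design question is whether to count each distinct sender once or to weight by message volume. The following is a hypothetical sketch (the function name and input shape are assumptions, not an existing BigBang API) that takes `(sender, estimated_label)` pairs, one per message, and reports the share of each label either way. "unknown" is kept as its own label so the aggregate also shows how much of the data the estimate actually covers.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def gender_shares(
    messages: Iterable[Tuple[str, str]], per_sender: bool = True
) -> Dict[str, float]:
    """Share of each estimated label among senders or messages.

    messages: (sender_address, estimated_label) pairs, one per message.
    per_sender=True counts each distinct sender once; False weights by
    message volume. "unknown" stays a separate label so coverage is visible.
    """
    pairs = list(messages)
    if per_sender:
        # Keep the first label seen for each sender (assumes one label per sender).
        first_label: Dict[str, str] = {}
        for sender, label in pairs:
            first_label.setdefault(sender, label)
        counts = Counter(first_label.values())
    else:
        counts = Counter(label for _sender, label in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

The two modes can differ sharply on mailing lists where a few people post most of the messages, so it may be worth reporting both.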
Perhaps this code could be useful: https://github.com/malev/gender-detector