cailyoung closed this issue 5 years ago
I am certainly not claiming absolute accuracy. =) I think it would be useful in the results page to display, in small text, "12% declared pronouns" to indicate what percent of each category of accounts have declared pronouns. The genders of the rest are guessed. Would you like to make a PR?
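A rough sketch of how that percentage could be computed, assuming each analyzed account is reduced to a `(gender, source)` pair where `source` is either `"declared"` or `"guessed"`. The function name and data shape are hypothetical and not taken from `analyze.py`.

```python
from collections import Counter


def declared_percentages(accounts):
    """Return {gender: percent of that category whose pronouns were declared}.

    `accounts` is an iterable of (gender, source) pairs; this shape is an
    assumption for illustration, not the tool's actual data model.
    """
    totals = Counter()
    declared = Counter()
    for gender, source in accounts:
        totals[gender] += 1
        if source == "declared":
            declared[gender] += 1
    # Every key in totals has a count of at least 1, so no division by zero.
    return {g: round(100 * declared[g] / totals[g]) for g in totals}


sample = [
    ("female", "declared"),
    ("female", "guessed"),
    ("male", "guessed"),
    ("male", "guessed"),
]
print(declared_percentages(sample))
```

The results page could then render each value as small text next to its column, for example "50% declared pronouns".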
I'm very rusty in Python, but I could attempt a PR. Might take a while 😄
I think fundamentally we might disagree on whether it's appropriate to be grouping results at all, though. I don't believe any automated tool can state with any confidence that an individual's gender is any particular value unless that individual has explicitly stated it. So the counts based on pronouns are reasonable, but including gender inferred from a name is inappropriate, in my opinion.
Any PR I make would alter the output table to make this distinction clear. Would that be OK?
Inferring gender based on name is the main function of this tool.
I'm curious what percentage of the people you follow on Twitter, or who follow you, declare their pronouns; it's my experience that very few Twitter users declare their pronouns in their profiles. Since the purpose of the tool is to help men like me follow more women and nonbinary people, I need a way to estimate the gender proportion of the maximum number of my Twitter contacts. I prefer an estimate of a large number of my contacts instead of a definitive count of a small number. Do you agree with that goal?
Given my goal, and how few people on Twitter announce their pronouns, I think the display should emphasize the estimate, rather than emphasize the number of people who announce their pronouns.
Here's what my results look like now:
https://github.com/ajdavis/twitter-gender-distribution/blob/221175cdd42189d091e3373d4b30437bcd753ef4/analyze.py#L107
Hi there. The above section (and others that return values from the lookup table rather than taking a user's expressed gender/pronoun into consideration) should probably be categorised differently in the results table, perhaps as 'probably male based on name' rather than being included in the 'male' column. Otherwise you are saying that the name-to-gender mapping you have used is absolutely accurate for all people who share a given name in the list.
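Something along these lines is what I have in mind, as a minimal sketch only. It assumes each account has already been classified as a `(gender, source)` pair, with `source` set to `"declared"` when the user stated it and `"name"` when it came from the lookup table. The function and labels are hypothetical and are not the existing code in `analyze.py`.

```python
from collections import Counter


def tally_by_source(accounts):
    """Count accounts per results-table column, keeping name-based guesses separate.

    `accounts` is an iterable of (gender, source) pairs (an assumed shape).
    """
    counts = Counter()
    for gender, source in accounts:
        if source == "declared":
            label = gender
        else:
            # Name-based guesses get their own column instead of being
            # merged into the declared one.
            label = f"probably {gender} based on name"
        counts[label] += 1
    return counts


example = [("male", "declared"), ("male", "name"), ("female", "name")]
print(tally_by_source(example))
```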