Distinguish between overt pronoun / gender identification by a Twitter user and inferred gender in results

ajdavis / proporti.onl

Compare number of women, men, and nonbinary people among my friends and followers.

https://www.proporti.onl

Apache License 2.0

239 stars, 28 forks

Distinguish between overt pronoun / gender identification by a Twitter user and inferred gender in results #22

Closed: cailyoung closed this issue 5 years ago

cailyoung commented 6 years ago

https://github.com/ajdavis/twitter-gender-distribution/blob/221175cdd42189d091e3373d4b30437bcd753ef4/analyze.py#L107

                g = detector.get_gender(name, country)
                if g != 'andy':
                    # Not androgynous.
                    break

Hi there. The above section (and others that return values from the lookup table rather than taking a user's expressed gender/pronoun into consideration) should probably be categorised differently in the results table, perhaps as 'probably male based on name' rather than being included in the 'male' column. Otherwise you are saying that the name-to-gender mapping you have used is absolutely accurate for all people who share a given name in the list.
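To illustrate the distinction, here is a minimal sketch of a classifier that records where each label came from. It is not the project's code: `classify`, `NAME_TABLE`, and `PRONOUNS` are hypothetical names, and the small dictionary stands in for the `detector.get_gender(name, country)` lookup quoted above.

```python
import re

# Hypothetical stand-in for the name lookup table; the quoted code calls
# detector.get_gender(name, country) instead.
NAME_TABLE = {"alice": "female", "bob": "male"}

# Hypothetical pronoun patterns; a real profile parser would need more cases.
PRONOUNS = {
    "female": re.compile(r"\bshe\s*/\s*her\b", re.I),
    "male": re.compile(r"\bhe\s*/\s*him\b", re.I),
    "nonbinary": re.compile(r"\bthey\s*/\s*them\b", re.I),
}


def classify(name, bio):
    """Return (gender, source); source is 'declared', 'inferred', or 'unknown'."""
    # A pronoun stated in the profile always wins over a name-based guess.
    for gender, pattern in PRONOUNS.items():
        if pattern.search(bio):
            return gender, "declared"
    # Fall back to the first name, and mark the result as a guess.
    parts = name.split()
    guess = NAME_TABLE.get(parts[0].lower()) if parts else None
    if guess:
        return guess, "inferred"
    return "unknown", "unknown"
```

With the source kept alongside the label, the results table could show a 'male (declared)' column separately from a 'probably male based on name' column.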

ajdavis commented 6 years ago

I am certainly not claiming absolute accuracy. =) I think it would be useful in the results page to display, in small text, "12% declared pronouns" to indicate what percent of each category of accounts have declared pronouns. The genders of the rest are guessed. Would you like to make a PR?
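As a rough sketch of that small-text figure, assuming each analyzed account has already been reduced to a `(gender, source)` pair where `source` is `'declared'` or `'inferred'` (the function names and the data shape are hypothetical, not taken from analyze.py):

```python
from collections import Counter


def declared_share(results):
    """Map each gender to (total accounts, percent with declared pronouns).

    `results` is an iterable of (gender, source) pairs; this shape is an
    assumption made for the sketch.
    """
    totals = Counter()
    declared = Counter()
    for gender, source in results:
        totals[gender] += 1
        if source == "declared":
            declared[gender] += 1
    return {g: (n, round(100 * declared[g] / n)) for g, n in totals.items()}


def format_row(gender, total, pct):
    """Render one results row with the declared-pronoun percentage."""
    return f"{gender}: {total} ({pct}% declared pronouns)"
```

Calling `format_row(g, *declared_share(results)[g])` for each category would give the per-column note.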

cailyoung commented 6 years ago

I'm very rusty in Python, but I could attempt a PR. Might take a while 😄

I think fundamentally we might disagree on whether it's appropriate to be grouping results at all, though. I don't believe any automated tool can state with any confidence that an individual's gender is any particular value unless that individual has explicitly stated it. So the counts based on pronouns are reasonable, but including inferred gender based on name is inappropriate in my opinion.

Any PR I make would alter the output table to make this distinction clear. Would that be OK?

ajdavis commented 6 years ago

Inferring gender based on name is the main function of this tool.

I'm curious what percentage of the people you follow on Twitter, or who follow you, declare their pronouns; it's my experience that very few Twitter users declare their pronouns in their profiles. Since the purpose of the tool is to help men like me follow more women and nonbinary people, I need a way to estimate the gender proportion of the maximum number of my Twitter contacts. I prefer an estimate of a large number of my contacts instead of a definitive count of a small number. Do you agree with that goal?

Given my goal, and how few people on Twitter announce their pronouns, I think the display should emphasize the estimate, rather than emphasize the number of people who announce their pronouns.

ajdavis commented 5 years ago

Here's what my results look like now: