backspace / slack-statsbot

A Slack bot to track statistics about who is talking the most

Implicit denominator in summary messages is misleading #11

Closed: andrewwatterson closed this issue 8 years ago

andrewwatterson commented 8 years ago

Current wording: "Since 1 hour ago, self-identified not-men sent 30% of messages <histogram incl. men, not-men, complicated, and unknown>"

I think it's a safe (and on-mission) assumption that people using this product are much more interested in gender (or racial) identity than in willingness to self-report, so I interpreted that message as saying that men sent the remaining 70% of the messages. It's not technically misleading, since it mentions the "self-identified" bit, but I think it would be more honest to report the gendered-ness of the messages only as a percentage of the messages that you have data on. Put another way, the current statement implies that not only men, but men AND people who won't self-identify, are at risk of dominating conversations (rather than being dominated).

I'd propose something like:

"Of people who self-identified, not-men sent 30% of messages in the last hour."

Another suggestion during a discussion about this was:

"Of messages from people who reported gender, X% came from not-men. Y% did not report. [graph]."
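To make the difference between the two denominators concrete, here is a minimal sketch of the arithmetic. The message counts are hypothetical, chosen only so that the current wording yields 30%. The `share` helper and the variable names are illustrative assumptions and are not taken from the bot's code.

```python
def share(part, whole):
    """Return part as a percentage of whole (0.0 when whole is zero)."""
    return 100 * part / whole if whole else 0.0


# Hypothetical message counts for the last hour (not real data).
counts = {"men": 40, "not-men": 30, "complicated": 10, "unknown": 20}

total = sum(counts.values())          # every message, including unknowns
reported = total - counts["unknown"]  # only messages from people who reported

share_of_all = share(counts["not-men"], total)          # current wording: 30.0
share_of_reported = share(counts["not-men"], reported)  # proposed wording: 37.5
share_unreported = share(counts["unknown"], total)      # "Y% did not report": 20.0

print(f"Of all messages, not-men sent {share_of_all:.1f}%")
print(f"Of messages from people who reported, not-men sent {share_of_reported:.1f}%")
print(f"{share_unreported:.1f}% of messages came from people who did not report")
```

With these counts the same channel activity reads as 30% under the current wording and 37.5% under the proposed one, which is the gap the issue is about.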

backspace commented 8 years ago

As we discussed in the channel at the time, the problem with removing unknowns from the calculation is that it causes the reported percentage for the marginalised group to go up. It does seem to me that people who won’t self-identify are more likely to dominate, because those people are more likely to think self-identification is unnecessary or that the whole exercise is foolish. I’ve recently added the ability to return to unknown status on characteristics, and I don’t think it’s a coïncidence that this feature has only been requested by people who would not self-identify as marginalised on those attributes. With your suggestion, anyone could inflate the apparent numbers just by choosing not to identify or by erasing their identification.
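The inflation effect described here can be sketched with a small calculation. The counts are hypothetical, and `not_men_shares` is an illustrative helper, not a function from the bot. The sketch assumes one participant who had identified as a man and sent 20 messages returns to unknown status.

```python
def not_men_shares(counts):
    """Return (share of all messages, share of reported messages) for not-men."""
    total = sum(counts.values())
    reported = total - counts["unknown"]
    of_all = 100 * counts["not-men"] / total if total else 0.0
    of_reported = 100 * counts["not-men"] / reported if reported else 0.0
    return of_all, of_reported


# Hypothetical counts before anyone erases their identification.
before = {"men": 50, "not-men": 30, "unknown": 20}

# The same messages after a man who sent 20 of them returns to unknown.
after = {"men": 30, "not-men": 30, "unknown": 40}

before_all, before_reported = not_men_shares(before)
after_all, after_reported = not_men_shares(after)

print(f"All messages:  {before_all:.1f}% -> {after_all:.1f}%")
print(f"Reported only: {before_reported:.1f}% -> {after_reported:.1f}%")
```

Nobody sent a different number of messages, yet the reported-only figure rises from 37.5% to 50% while the all-messages figure stays at 30%.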

The full numbers are always available in the verbose report. It didn’t seem to me that there was any consensus in the discussion that this is a desirable change, though it is raised every so often by people who seem offended at what they perceive as statistical imprecision.