fedora-infra / noggin

Self-service user portal for open-source communities to use over FreeIPA.
MIT License

As a registered user, I want to record the pronoun I use so people use the right one when discussing with me #220

Closed mscherer closed 3 years ago

mscherer commented 4 years ago

Close to #105, but being able to show what pronouns people use can be a big deal for inclusivity.

Also https://github.com/fedora-infra/noggin-old/issues/9

From a technical point of view, there isn't anything complex (just store the information), but the real problem is to decide on limits and format to avoid abuse.

There are a few pitfalls to avoid. Off the top of my head:

Based on my research, there are two approaches.

The first one is a free-form field. This is what we have inside RH, and while that's compact, it requires some moderation. On top of that, since there is no mandated format, it is not trivial to do stats, nor easy to tell when people use more than one pronoun. For example, folks did use "she/her", "she, her", "she her", but also "she, they". On the plus side, it didn't limit what can be used, which was better for inclusivity.

On the other end of the spectrum, there is what Google deployed, which is a set of checkboxes, so people can choose one or more from a set of 10 different choices. I wasn't able to get more information on that. Based on my stats inside RH, I would say that he/she/they covered almost everybody, except 2 or 3 people. This does not require moderation, but is not compact.

One middle ground would be checkboxes for the common ones (he/she/they) plus a free-form field.

I also spent some time looking for guidance on the topic, and found a few resources, if that helps: https://queerlyrepresent.me/resources/articles/gender-accessibility https://uxdesign.cc/designing-forms-for-gender-diversity-and-inclusion-d8194cf1f51 https://www.brynmawr.edu/sites/default/files/asking-for-name-and-pronouns.pdf

Finally, there is also the question of whether this information should be used as a proxy for grammatical gender in the interface, which raises the question of how it should be stored (and so used).

While that's tempting, I would suggest "no", given the relative complexity of grammatical gender around the world, and the fact that some languages are more gendered than English, which would bring all kinds of interesting problems.

tiran commented 4 years ago

Thanks a lot for bringing this up, Michael. I'm +1 on the proposal. I've contacted a trusted expert on the matter, too.

My gut feeling tells me that we should start with an optional field that is limited to he/she/they. This has multiple benefits:

abompard commented 4 years ago

Great discussion. What about having a list of options but allowing multiple selections? Maybe not with the browser's default widget for <select multiple> because it's known to be horrible usability-wise, but you get the idea. With this a user could select she and they, while keeping a controlled amount of options. We would serialize and deserialize the field in IPA's text attribute (with slashes for example).
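A minimal sketch of that serialize/deserialize idea, assuming a controlled list of he/she/they and a slash separator. The names `ALLOWED`, `serialize_pronouns` and `deserialize_pronouns` are hypothetical and not taken from noggin or freeipa-fas:

```python
ALLOWED = ("he", "she", "they")  # hypothetical controlled list of options
SEPARATOR = "/"                  # assumption: slashes, as suggested above


def serialize_pronouns(selected):
    """Join the selected options into one string for a single text attribute."""
    unknown = [p for p in selected if p not in ALLOWED]
    if unknown:
        raise ValueError(f"unsupported options: {unknown}")
    # dict.fromkeys drops duplicates while keeping the user's selection order.
    return SEPARATOR.join(dict.fromkeys(selected))


def deserialize_pronouns(stored):
    """Split the stored string back into a list of options."""
    if not stored:
        return []
    return [p for p in stored.split(SEPARATOR) if p]
```

With this, selecting "she" and "they" would be stored as the single string "she/they", and reading it back gives the two options again.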

mscherer commented 4 years ago

I do agree with @tiran's gut feeling.

There is also the question of the order in case of multiple selection. I do not know if people have a preferred order, but I would be surprised if everybody was perfectly indifferent on everything. Maybe that's just a theoretical problem, as I do not remember anyone raising the issue besides me, and mostly out of anticipation of edge cases.

Should we maybe try to gather more input from folks who would be impacted, for example through an article in Fedora Magazine, or do we feel that bringing too much attention might be too draining/detrimental?

tiran commented 4 years ago

It's trivial to extend from single-value to multi-value attribute and to add more choices. It's much harder to go back to single-value attribute or to remove choices. From a technical PoV it makes sense to start with the simplest and most extensible solution that covers the majority of use cases.

LDAP does not retain the order of attribute values. Ordering of multi-valued entries would require additional logic.
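One possible shape for that additional logic, as a sketch only: prefix each value with its position before storing, and sort on that prefix when reading. The `index:value` encoding and both function names are assumptions for illustration, not something the PR is known to do:

```python
def encode_ordered(values):
    """Tag each value with its position, e.g. ["they", "she"] -> {"0:they", "1:she"}."""
    return {f"{index}:{value}" for index, value in enumerate(values)}


def decode_ordered(stored):
    """Rebuild the original order from an unordered collection of tagged values."""
    pairs = []
    for item in stored:
        index, value = item.split(":", 1)
        pairs.append((int(index), value))
    return [value for _, value in sorted(pairs)]
```

The set returned by `encode_ordered` stands in for a multi-valued attribute whose values may come back in any order.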

tiran commented 4 years ago

Draft PR https://github.com/fedora-infra/freeipa-fas/pull/97

I'll wait until I get some feedback.

tiran commented 4 years ago

I got feedback. In all responses the person prefers or even strongly recommends free form field + moderation.

LDAP is not designed to perform queries like SELECT DISTINCT pronoun FROM users. There is no good and efficient way to build a moderation tool with LDAP as backend. If you get most users to select a value from a list of preferred values, then it is possible to find outliers efficiently.
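As a rough sketch of the outlier idea, assuming the values have already been fetched from the directory into a plain Python list (the fetch itself is left out), and with a hypothetical `PREFERRED` set standing in for the list in the PR:

```python
from collections import Counter

PREFERRED = {"he", "she", "they"}  # hypothetical; the PR's actual list may differ


def find_outliers(values):
    """Count every non-empty value that is not in the preferred list."""
    counts = Counter(v.strip().lower() for v in values if v and v.strip())
    return {value: n for value, n in counts.items() if value not in PREFERRED}
```

If most users pick a preferred value, the returned mapping stays small enough for a human to review.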

I have updated my PR with a list of pronouns. The UI uses an editable combobox that supports custom values. There is also some rudimentary blocking implemented.

mscherer commented 4 years ago

So, I did some stats inside RH this weekend, where we do have a free form field (no validation AFAIK, besides size) stored in a DB, and displayed

Adoption of the feature is around 20% of the company, with she/he usage mapping pretty much to the public diversity stats we published (see the end of the page at https://www.redhat.com/en/jobs/life/diversity). Outside of those, the most used pronoun is they, then around 1 or 2 folks each for less common neo-pronouns (xe, ze, per).

If we remove the top 4 (she, he, they, and the empty string), we are left with 149 distinct strings.

For example, we have 20 distinct variations of "he/him/his" in the system. Variations include:

Even after heavy normalization, I still find new cases each time I look at it. And the same goes for "she/her/hers".
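To give an idea of what a first normalization pass could look like, here is a small sketch that only handles the separator and case variations mentioned earlier in this thread ("she/her", "she, her", "she her"). The function name is hypothetical, and the real corpus clearly needs much more than this:

```python
import re


def normalize(raw):
    """Lowercase the value and collapse spaces, commas and slashes into single slashes."""
    parts = re.split(r"[\s,/]+", raw.strip().lower())
    return "/".join(p for p in parts if p)
```

With this, "she/her", "she, her" and "she her" all become "she/her", while "she, they" becomes "she/they".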

Then there is all the rest, in random order:

All with or without adding "/" (or it would be easy to filter)

I can show the corpus to another RH employee interested in looking at the type of stuff we have, but I suspect I can't publish it.

But what I get from that is that errors (in form, in understanding of the field) and non-legitimate uses of the field outweigh legitimate uses by 1 or 2 orders of magnitude.

So based on that, I do think it is quite clear that we should optimize to limit errors, and so:

Since we decided that we use English as a lingua franca for the project, we can filter out a whole class of problems (unicode, js) by restricting to ASCII. For example, we wouldn't be stuck deciding what to do when we see هي and زب (one is Arabic for "she", the other is a vulgar word).

Also, the comment on LDAP makes me curious.

@tiran, what kind of moderation do you have in mind?

While LDAP wouldn't be suitable for that (I guess for performance reasons), I do think that at a minimum, we can fetch the content of the directory once per day and do sorting/filtering outside of LDAP without too much trouble, but that's likely not what you had in mind.

Also, while people prefer to have it moderated and free form, that also assumes that people volunteer for that somehow, and if my data are to be believed, even after normalization and filtering of common problems, there are still a lot more people not using it for pronouns than people using it for pronouns (outside of the big 3). So I think that, like spam moderation in Mailman, it gets tedious after a while.
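A sketch of what such an ASCII restriction could look like, assuming slash-separated alphabetic words. The pattern, the length limit and the function name are assumptions for illustration:

```python
import re

# Assumption: one or more ASCII words separated by single slashes, e.g. "she/they".
ASCII_PRONOUNS = re.compile(r"[A-Za-z]+(/[A-Za-z]+)*")
MAX_LENGTH = 32  # hypothetical limit


def is_acceptable(value):
    """Accept only short, ASCII-letter, slash-separated values."""
    return len(value) <= MAX_LENGTH and ASCII_PRONOUNS.fullmatch(value) is not None
```

This rejects non-ASCII input and markup such as "<script>", but it does nothing about misuse written in plain English letters.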

So if we aim for "free form + moderation", we need to answer some questions:

I do understand the desire for free form due to reluctance to limit people (because it likely reminds people too much of adherence to a strict binary), but at the same time, the whole idea of moderation is to limit folks, so we would be going in opposite directions.

abompard commented 4 years ago

Oh really some people use "Apache Attack Helicopter" as their pronouns? That is not an accident, it's a well-known transphobic meme.

ryanlerch commented 3 years ago

The burden of moderating this field is pretty high (be it at the software maintenance level, if we implement a filter, or at the volunteer level, if we moderate every pronoun change).

What do you all think of moving this forward as a free-form field without any form of pre-moderation or filtering? Anyone misusing the field could be reported for a code of conduct violation, correct?

StephenCoady commented 3 years ago

I think this is a sensible approach. No point in blocking the feature in a first pass; we can maybe add protection/detection later.