As a registered user, I want to record the pronoun I use so people use the right one when discussing with me

mscherer commented 4 years ago

Close to #105, but being able to show what pronouns people use can be a big deal for inclusivity.

Also https://github.com/fedora-infra/noggin-old/issues/9

From a technical point of view, there isn't anything complex (just store the information), but the real problem is deciding on limits and format to avoid abuse.

There are a few pitfalls to avoid. Off the top of my head:

some people misunderstand the purpose, most likely due to mistranslation and/or lack of awareness. For example, I have heard several times the story of Austrian/German people using that kind of field to add their title "Dr", because that's quite a big deal there. In the same way, if we ever get a lord/lady from the UK in the community, maybe they would use that. I would recommend having a title entry to avoid that, but that would be a separate bug.
there is a tension between "being very inclusive" (using a free form field, since people use new pronouns all the time), and "avoiding folks putting jokes there". For example, at RH (where we deployed that on our internal directory 2 or 3 years ago), I did some stats and we had 500 to 1000 people using the system at first, and around 20 who didn't really use it in good faith.
some people might not want to use that, and some people might want to use more than 1 pronoun. So being able to opt out, and being able to select more than 1, both seem important.

Based on my research, there are 2 ways to do this.

The first one is a free form field. This is what we have inside RH, and while that's compact, it requires some moderation. On top of that, since there is no mandated format, it is not trivial to do stats, nor easy to signal that people use more than one pronoun. For example, folks did use "she/her", "she, her", "she her", but also "she, they". On the plus side, it didn't limit what can be used, which was better for inclusivity.

On the other end of the spectrum, there is what Google deployed, which is a set of checkboxes, so people can choose one or more from a set of 10 different choices. I wasn't able to get more information on that. Based on my stats inside RH, I would say that he/she/they covers almost everybody, except 2 or 3 people. This does not require moderation, but is not compact.

One middle ground would be checkboxes for the common ones (he/she/they) plus a free form field.

I also spent some time looking for guidance on the topic, and found a few resources, if that helps: https://queerlyrepresent.me/resources/articles/gender-accessibility https://uxdesign.cc/designing-forms-for-gender-diversity-and-inclusion-d8194cf1f51 https://www.brynmawr.edu/sites/default/files/asking-for-name-and-pronouns.pdf

Finally, there is also the question of whether this information should be used as a proxy for grammatical gender in the interface, which raises the question of how it should be stored (and so used).

While that's tempting, I would suggest "no", given the relative complexity of grammatical gender around the world, and the fact that some languages are more gendered than English, which would bring all kinds of interesting problems.

tiran commented 4 years ago

Thanks a lot for bringing this up, Michael. I'm +1 on the proposal. I've contacted a trusted expert on the matter, too.

My gut feeling tells me that we should start with an optional field that is limited to he/she/they. This has multiple benefits:

an optional field makes migration from the old user database possible w/o guessing gender from name (horrible idea).
an optional field lets people who don't want to disclose personal information opt out.
he/she/they seems to cover most common cases.
list of allowed cases can be extended with a simple update. It's technically more complicated to normalize free form data later.
a limited selection prohibits abuse or mistakes (e.g. the Dr. case)
a limited selection with normalized values also makes i18n easier.

abompard commented 4 years ago

Great discussion. What about having a list of options but allowing multiple selections? Maybe not with the browser's default widget for <select multiple> because it's known to be horrible usability-wise, but you get the idea. With this a user could select she and they, while keeping a controlled amount of options. We would serialize and deserialize the field in IPA's text attribute (with slashes for example).
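
A minimal sketch of that serialize/deserialize round trip, assuming a slash separator and a hypothetical allowed list of he/she/they (the names `ALLOWED`, `serialize` and `deserialize` are illustrative, not taken from noggin or freeipa-fas):

```python
# Hypothetical allowed set; the thread only names he/she/they as the common cases.
ALLOWED = ("he", "she", "they")
SEPARATOR = "/"


def serialize(selected):
    """Join the selected pronouns into one string, in a fixed canonical order."""
    unknown = set(selected) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unsupported values: {sorted(unknown)}")
    # Iterating over ALLOWED (not over `selected`) gives a stable order
    # and drops duplicates.
    return SEPARATOR.join(p for p in ALLOWED if p in selected)


def deserialize(stored):
    """Split a stored string back into a list; an empty string means 'not set'."""
    return [p for p in stored.split(SEPARATOR) if p]
```

Note that this sketch stores a fixed order rather than the order the user picked, so it would not capture any preference between the selected pronouns.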

mscherer commented 4 years ago

I do agree with @tiran's gut feeling.

There is also the question of the order in case of multiple selection. I do not know if people have a preferred order, but I would be surprised if everybody was perfectly ambivalent on everything. Maybe that's just a theoretical problem, as I do not remember anyone bringing up the issue besides me, and that was mostly me anticipating edge cases.

Should we maybe try to gather more input from folks that would be impacted, for example with an article in Fedora Magazine, or do we feel that bringing too much attention might be too draining/detrimental?

tiran commented 4 years ago

It's trivial to extend from single-value to multi-value attribute and to add more choices. It's much harder to go back to single-value attribute or to remove choices. From a technical PoV it makes sense to start with the simplest and most extensible solution that covers the majority of use cases.

LDAP does not retain order of attributes. Ordering of multi-valued entries would require additional logic.

tiran commented 4 years ago

Draft PR https://github.com/fedora-infra/freeipa-fas/pull/97

I'll wait until I get some feedback.

tiran commented 4 years ago

I got feedback. In all responses the person prefers or even strongly recommends free form field + moderation.

LDAP is not designed to perform queries like SELECT DISTINCT pronoun FROM users. There is no good and efficient way to build a moderation tool with LDAP as backend. If you get most users to select a value from a list of preferred values, then it is possible to find outliers efficiently.
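
To illustrate the outlier idea, here is a rough sketch that works on values already exported from the directory as a plain list of strings. It does not query LDAP itself, and `PREFERRED` and `find_outliers` are hypothetical names, not part of the PR:

```python
from collections import Counter

# Hypothetical list of preferred values offered in the UI.
PREFERRED = {"he", "she", "they"}


def find_outliers(values):
    """Return (value, count) pairs for non-empty values outside the
    preferred list, rarest first."""
    counts = Counter(v for v in values if v and v not in PREFERRED)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

If most users pick from the list, the result stays short enough for a human to review.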

I have updated my PR with a list of pronouns. The UI uses an editable combobox that supports custom values. There is also some rudimentary blocking implemented.

mscherer commented 4 years ago

So, I did some stats inside RH this weekend, where we do have a free form field (no validation AFAIK, besides size) stored in a DB, and displayed

Adoption of the feature is around 20% of the company, with she/he usage mapping pretty much to the public diversity stats we published (See https://www.redhat.com/en/jobs/life/diversity , end of the page). Outside of those, the most used pronoun is they, then around 1 or 2 folks for less common neo-pronouns (xe, ze, per).

If we remove the top 4 (she, he, they, and the empty string), we are left with 149 distinct strings.

For example, we have 20 distinct variations of "he/him/his" in the system. Variations include:

adding spaces or not ("she/her/hers" vs "she / her / hers")
different order ("he/him/his" vs "he/his/him")
different separators (using "\" instead of "/", or using ",", "-", ";")
different cases ("He/Him/His" and "He/him/his")
and a mix of all of this.

Even after heavy normalization, I still find new cases every time I look at it. And the same goes for "she/her/hers".
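
As an illustration of what such normalization might look like (a sketch only, not the script I used; `normalize` is a hypothetical helper), this collapses the case, spacing and separator variations listed above:

```python
import re

# Separators seen in the corpus: "/", "\", ",", "-", ";" plus stray whitespace.
_SEPARATORS = re.compile(r"[\s/\\,;-]+")


def normalize(raw):
    """Collapse case, spacing and separator differences into an 'a/b/c' form."""
    parts = [p for p in _SEPARATORS.split(raw.strip().lower()) if p]
    return "/".join(parts)
```

It deliberately does not reorder the parts, so "he/his/him" and "he/him/his" still count as two distinct strings after this step.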

Then there is all the rest, in random order:

emojis
various Unicode strings
javascript (nice try, but nope)
folks using multiple pronouns, with or without words to explain which to use ("or", "either"), even complete sentences like "any would work"
folks using that for their nickname, official title (Dr), non official title (a few lords, majesty, king, dude)
some stuff like "Attack helicopter"
folks adding pronouns and something else ("He / Him / His / Apache Helicopter")
folks adding complete sentences, related or not related to the idea of pronouns
folks using several pronouns "she/they"
folks adding their first name

All of these with or without a "/" (otherwise it would be easy to filter).

I can show the corpus to another RH employee interested to look at the type of stuff we have, but I suspect I can't publish it.

But what I get from that is that errors (in form, in understanding of the field) and non-legitimate use of the field outweigh legitimate use by 1 or 2 orders of magnitude.

So based on that, I do think it is quite clear that we should optimize to limit errors, and so:

limit the free form field to 1 word (the less variation, the better IMHO)
make sure we do lowercase and normalise as much as possible
show the free form as an option, but really guide people to the most commonly used (so he/she/they)
filter to keep only ascii chars.

Since we decided to use English as the lingua franca for the project, we can filter out a whole class of problems (Unicode, JS) by restricting to ASCII. For example, we wouldn't be stuck deciding what to do when we see هي and زب (one is Arabic for "she", the other is a vulgar word).
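
A minimal sketch of those rules (one word, ASCII only, lowercased), assuming the value arrives as a plain string; `clean_free_form` is a hypothetical helper and not something that exists in noggin:

```python
def clean_free_form(raw):
    """Return a lowercase, single-word, ASCII-letters-only value,
    or None if the input is rejected."""
    value = raw.strip()
    # isascii() rejects non-ASCII scripts and emoji; isalpha() rejects spaces,
    # digits and punctuation, which also rules out markup and multi-word
    # entries. Both checks run before lowercasing so that non-ASCII characters
    # cannot be folded into ASCII ones.
    if value.isascii() and value.isalpha():
        return value.lower()
    return None
```

This only limits the form of the value. An ASCII word like "dude" would still pass, so it reduces the moderation problem rather than removing it.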

Also, the comment on LDAP makes me curious.

@tiran , what kind of moderation do you have in mind ?

While LDAP wouldn't be suitable for that (I guess for performance reasons), I do think that at a minimum, we can fetch the content of the directory once per day and do sorting/filtering outside of LDAP without too much trouble, but that's likely not what you had in mind.

Also, while people prefer to have it moderated and free form, that also assumes that people volunteer for the moderation somehow, and if my data are to be believed, even after normalization and filtering of common problems, there are still a lot more people not using it for pronouns than people using it for pronouns (outside of the big 3). So I think that, like spam moderation in Mailman, it gets tedious after a while.

So if we aim for "free form + moderation", we need to answer some questions:

what happens if no one looks at it (not enough volunteers, vacation)?
do we want moderation before it gets published (in which case, how do we ensure that it gets done in a timely manner)?
or after it gets published?
who decides what is valid and what is not?

I do understand the desire for free form due to reluctance to limit people (because it likely reminds people too much of the adherence to a strict binary), but at the same time, the whole idea of moderation is to limit folks, so we would be going in opposite directions.

abompard commented 4 years ago

Oh really some people use "Apache Attack Helicopter" as their pronouns? That is not an accident, it's a well-known transphobic meme.

ryanlerch commented 3 years ago

The burden of moderating this field is pretty high (be it on a software maintenance level, if we implement a filter, or on a volunteer level, if we moderate every pronoun change).

What do you all think of moving this forward as a freeform field without any form of pre-moderation or filtering? Anyone misusing the field could be reported for a code of conduct violation, correct?

StephenCoady commented 3 years ago

I think this is a sensible approach. No point in blocking the feature from landing in a first pass; we can maybe add protection/detection later.

fedora-infra / noggin

As a registered user, I want to record the pronoun I use so people use the right one when discussing with me #220