SFDO-Community-Sprints / faker_person_diverse

Diverse person names for Faker and Snowfakery
BSD 3-Clause "New" or "Revised" License

Discussion: Options for good naming schemes #2

Open acrosman opened 2 years ago

acrosman commented 2 years ago

To create a more diverse naming provider, we need to think carefully about how to find, curate, and use lists of names. There are several known pitfalls to avoid, including (there are certainly more):

There will be no perfect solution and any solution will reflect the biases of the creators. However, we should be able to make progress and understand the biases our data set reflects and why. That will hopefully make it easier to improve further in the future.

prescod commented 2 years ago

Great thoughts!

Perhaps one way, maybe the only way, to deal with the intrinsic bias is to be persona/scenario based.

For example: "June" is an SFDO Partner in Chicago demoing an NPSP extension package to a potential customer, "Luíza."

"June" wants her database to consist of names that would be familiar to New York City residents.

By stipulating all of this, we are introducing several biases, but at least they are explicit and give us a North Star to aim for. A problematic alternative is that everyone has a different persona in mind (probably themselves) and the whole thing is entirely subjective: "I recognize that name, I don't recognize THAT name," etc.

I propose that we aren't trying to be fair in the sense that everyone has an equal chance of seeing their name in the dataset. We are trying to be "realistic" in a subjective sense, so that as few people as possible are distracted by the biases in the dataset.
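To make the persona idea a bit more concrete, here is a rough sketch of how a persona could carry its own explicitly weighted name list. Everything in it is an assumption for illustration: the `Persona` class, the `sample_first_names` method, and the weights are hypothetical, the name list is a placeholder rather than curated data, and it uses only the Python standard library instead of the actual Faker or Snowfakery provider APIs.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical container that makes a scenario's biases explicit."""

    name: str
    audience: str
    # (first name, relative weight) pairs. Placeholder values only,
    # not real frequency data.
    weighted_first_names: tuple

    def sample_first_names(self, count, seed=None):
        """Draw `count` first names, favoring higher-weighted entries."""
        rng = random.Random(seed)  # a seed keeps demo data reproducible
        names = [n for n, _ in self.weighted_first_names]
        weights = [w for _, w in self.weighted_first_names]
        return rng.choices(names, weights=weights, k=count)


# The persona from the example above; list contents are illustrative only.
JUNE = Persona(
    name="June",
    audience="New York City residents",
    weighted_first_names=(
        ("June", 3),
        ("Luíza", 2),
        ("Placeholder Name", 1),
    ),
)

sample = JUNE.sample_first_names(5, seed=42)
print(sample)
```

The point of the design is that the bias lives in one visible place (the persona's weighted list), so it can be reviewed and argued about rather than being implicit in whoever assembled the data.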

acrosman commented 2 years ago

Notes from May '22 Sprint:

Goals:

What fields are we trying to keep diverse?

Ideas for new name lists:

Processes to ensure anonymity from those sources: