Devographics / surveys

YAML config files for the Devographics surveys

Lack of Diversity in State of JS 2024 Video Creators List #254

Open SachaG opened 15 hours ago

SachaG commented 15 hours ago

Note: this post centers women in the discussion because this is what the feedback I've received was about, but the same issue applies to any minoritized segment of the developer community, whether BIPOC, LGBTQ, people experiencing disabilities, or others.

The Issue

The "Which video creators do you follow?" question in the State of JS 2024 survey originally featured a list that included no women creators:

[screenshot: the 2024 predefined list of video creator options]

The Root Cause

There are two basic ways to collect data in a survey: through a freeform text field, or through a list of predefined options (which in the State of JS case also supports an optional "other answers…" freeform field). Each has its pros and cons:

Predefined Options

Pros:

- Much higher participation, since picking from a list is easier than recalling and typing names.
- Answers come back pre-normalized, with no spelling variants to clean up.

Cons:

- The list itself shapes the answers: respondents will mostly pick from what's already on it.
- Reused year after year, the list can end up reinforcing its own initial bias.

Freeform Text Field

Pros:

- No predefined list means no list-induced bias: any name can be entered.
- Switching back to freeform input can "reset" whatever bias a previous list encoded.

Cons:

- Lower participation, since recalling and typing names takes more effort.
- Answers skew toward the few most memorable names.
- Raw text answers need normalization before they can be aggregated.
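For concreteness, here is a minimal sketch of how these two question formats might be modeled in a survey config. The shape below is hypothetical, not the actual Devographics schema:

```typescript
// Hypothetical question shapes; not the actual Devographics schema.
type PredefinedQuestion = {
  kind: "predefined";
  id: string;
  options: string[];    // curated list shown to every respondent
  allowOther?: boolean; // optional "other answers…" freeform field
};

type FreeformQuestion = {
  kind: "freeform";     // respondent types answers into a text field
  id: string;
};

type Question = PredefinedQuestion | FreeformQuestion;

// The 2023 edition used freeform input; the 2024 edition used a
// predefined list (seeded from the 2023 answers, as described below).
const videoCreators2023: Question = { kind: "freeform", id: "video_creators" };
const videoCreators2024: Question = {
  kind: "predefined",
  id: "video_creators",
  options: [],          // placeholder; the real list is omitted here
  allowOther: true,
};
```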

Old Solution

The solution I originally settled on was to alternate between both formats each year. In 2023, this question used a freeform text field:

[screenshot: the 2023 freeform text field version of the question]

I then used the resulting data to populate the set of predefined options for the 2024 edition. This strategy aimed to balance the pros and cons of both formats, by ensuring maximum question participation with predefined options one year, while hopefully "resetting" any bias by switching back to freeform input the following year.

In other words, because respondents did not mention women video creators in 2023, they were then absent from the list I compiled for the 2024 edition.
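As an illustration of that pipeline, here is a rough sketch of how a predefined list could be derived from the previous year's freeform answers. The normalization rules and top-N cutoff are assumptions on my part, not the documented process:

```typescript
// Hypothetical sketch of seeding one year's predefined options from the
// previous year's freeform answers. Normalization and cutoff are assumptions.

// Collapse trivial spelling variants so mentions aggregate correctly.
function normalize(answer: string): string {
  return answer.trim().toLowerCase().replace(/\s+/g, " ");
}

// Keep the topN most-mentioned names as next year's predefined options.
function deriveOptions(freeformAnswers: string[], topN = 20): string[] {
  const counts = new Map<string, number>();
  for (const raw of freeformAnswers) {
    const name = normalize(raw);
    if (name.length === 0) continue;
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most-mentioned first
    .slice(0, topN)
    .map(([name]) => name);
}
```

The key property of any pipeline like this one, and the source of the encoded-bias concern raised below, is that nothing outside last year's answers can ever enter the list.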

New Solution

The downside of this approach is that without this background knowledge, it can seem like this year's list of predefined options is the result of an arbitrary process that doesn't include any consideration for diversity and inclusion.

For that reason, I've decided to permanently keep the "Video Creators" and "People" questions as freeform inputs, to avoid giving off this impression and thus alienating the very audience the survey lacks the most.

Survey Goals

I do feel the need to add one more thing: especially lately, being "anti-DEI" in the tech community is becoming worryingly trendy. So I want to be absolutely clear that this is not what's going on here. I firmly believe that inclusivity and representation efforts are important for both moral and practical reasons.

Which is exactly why I believe the survey's goal should be to measure the success of those efforts, and not to try and put a thumb on the scale one way or another. So even though artificially drafting a more diverse list would quiet a lot of the criticisms around this issue, I feel like it would not be conducive to actually measuring what the survey's audience thinks.

(The fact that the survey audience itself might be biased or unrepresentative is a related – and quite valid – concern. But again, masking this issue by tweaking the list would not actually be fixing it either.)

What You Can Do

If you are concerned about this issue, the two best things you can do are:

  1. Follow more women developers and video creators (see this Bluesky list for example)
  2. Enter their name in the survey

And if you happen to be a woman video creator yourself, I would suggest encouraging your audience to take the survey. I realize this comes across as self-serving coming from the person who runs the survey, but it's the surest way to make a difference in the stats.

What Do You Think?

I'm well aware that this approach comes at a cost for women/BIPOC/LGBTQ/disabled/etc. content creators, since they end up drowned out in the stats.

For that reason, I am not saying this is a perfect approach. I've gone back and forth over this issue many times in the past, and I probably will again in the future.

For example, maybe you think that using the surveys' platform to promote women creators should take priority over its other goals, and that's certainly understandable as well.

So if you have a suggestion on how to tackle this issue, I would love to hear more!

Recommended Reading

jlengstorf commented 14 hours ago

Keeping it fill-in only is fine, but I think the primary issue here is that it feels like you get this pushback every year, and every year you talk through ideas to improve, and then we still don't see any representation in the survey — and that's really a bummer to see.

> So even though artificially drafting a more diverse list would quiet a lot of the criticisms around this issue, I feel like it would not be conducive to actually measuring what the survey's audience thinks.

This feels problematic to me. The survey responses reinforce themselves year over year. If the first year only lists dudes, and the next year uses the previous year's responses to "fairly" determine who goes on the list, that's not a data-driven decision — it's encoded bias.

Fixing that encoded bias requires intentionally curating these lists. For example, some of the folks currently listed don't actually talk about JS that much — perhaps they could be dropped off the State of JS survey to make room for others who do.

SachaG commented 14 hours ago

> The survey responses reinforce themselves year over year. If the first year only lists dudes, and the next year uses the previous year's responses to "fairly" determine who goes on the list, that's not a data-driven decision — it's encoded bias.

Yes, which is why, as I explained above, last year's survey used a freeform text field to "reset" any encoded bias.

> Fixing that encoded bias requires intentionally curating these lists.

Maybe I'm totally wrong on this but I've considered it, and I just don't think it's a good solution. No matter who "intentionally curates" the list you'll always be introducing bias of some kind. Shouldn't the goal be to remove bias, not add more of it?

jlengstorf commented 14 hours ago

you can't remove bias. humans are biased, full stop. so if you want to improve representation, do that. every creator on the list now has been the beneficiary of bias, so to imply that it would be somehow unfair to put someone else on the list is pulling up the ladder behind you

asking an actually representative panel to weigh in on creators to feature on the list is less biased than you making a solo call. and having people on the list doesn't force anyone to select them, but it might encourage people to finish the survey who currently walk away because they think it's just for a specific bro bubble (which would increase the sample size and make the results less biased)

SachaG commented 14 hours ago

I agree that me making a solo call would be biased, which is why I tried to remove myself from the equation altogether by first basing the list on last year's data with no editorializing; and then removing the list altogether in favor of a freeform input.

I think you've done a great job of outlining the upsides of a more "editorialized" approach. It's just not the one I picked this time.

AdditionAddict commented 13 hours ago

My issue with pure type-in is that it'll bias toward the top three or so content creators, since remembering names is hard. How about auto-complete with custom type-in, backed by a fuller, more diverse list people can scroll? Probably too late for this year's survey, but perhaps for next year. FYI: right now Theo is streaming with 257 viewers and Zeu (zeu_dev) with 27. That's 10%, but an important 10%: double last year, and she wasn't streaming a year and a half ago.

SachaG commented 12 hours ago

Auto-complete is a good idea.
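To make that concrete, here is a minimal sketch of the matching logic an auto-complete field could use: suggestions come from a broad, curated reference list, while any typed value is still accepted as an answer. The list contents below are placeholders:

```typescript
// Hypothetical auto-complete matcher: suggest from a reference list as the
// respondent types, but still accept whatever they submit.
function suggest(input: string, referenceList: string[], limit = 5): string[] {
  const query = input.trim().toLowerCase();
  if (query.length === 0) return [];
  return referenceList
    .filter((name) => name.toLowerCase().includes(query))
    .slice(0, limit);
}

// Usage sketch: a scrollable reference list can be much fuller (and more
// diverse) than a top-N list, without forcing any particular selection.
const referenceList = ["Creator A", "Creator B", "Creator C"]; // placeholders
console.log(suggest("crea", referenceList)); // ["Creator A", "Creator B", "Creator C"]
```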

erinmikailstaples commented 12 hours ago

Hey @SachaG - I was hesitant to add to the conversation, not wanting to dogpile on. However, this survey and its lack of diversity have come up in multiple feeds of mine today, and it felt important to share my perspective. 🤷‍♀️

Rather than just echoing @jlengstorf's sentiment (which I fully co-sign), I want to share a personal story to highlight the real-world impact that surveys lacking diversity (like this one) have on gender minorities like me.

I am a developer.

I've previously been hesitant to claim this not because I lacked the skills, but because environments like these made me feel like I needed to prove my worth constantly and justify my credentials.

Seeing survey results, all-male panels, and tech conferences dominated by dude bros created this persistent, internal narrative: "You don’t belong here. Your voice isn’t valued."

it adds up.

A few years ago, a male colleague helped me advocate for my worth firsthand by sharing his employment contract; he was making $25K more than me.

He highlighted the fact that the differences in our resumes were twofold:

He helped me advocate for not only a pay increase, but also changed how I advocate for myself in this space.

I've had people assume I'm nontechnical, had folks explain my own graduate work to me, and been asked at conferences if I was an assistant or a "social media girl" rather than a speaker.

When you release a survey with zero representation from women — and this isn't the first time this has been done — it tells women in tech like me that "there are no women worth including on this list." And when you hear that message enough, it starts to impact how you see yourself and your place in the industry.

We know that:

To quote you: "If we can make sure this new generation of developers isn't turned away by entrenched biases, this could prove to be a great opportunity to make the industry more diverse."

Great idea, @SachaG - it starts with acknowledging your own biases in the first place.

Inclusion should be a standard part of the community, not something that needs to be constantly fought for or justified.

From a purely practical standpoint, diversity improves outcomes. It's not just my opinion; it's backed by academic study after academic study after academic study.

I can't make you change the survey, nor can I force you to rethink how you approach inclusion.

However, I can tell you that you've (intentionally or not) perpetuated harmful biases in an industry that already has enough of them.

SachaG commented 11 hours ago

@erinmikailstaples first of all, thank you for taking the time to share your experience in such a personal way. It means a lot to me that you'd go out of your way and make an effort like that here.

I just want to highlight something that I think goes to the root of the issue:

> Seeing survey results, all-male panels, and tech conferences dominated by dude bros created this persistent, internal narrative: "You don’t belong here. Your voice isn’t valued."

My argument from the start is that these things belong to distinct categories:

- panels and conference lineups are curated, so they reflect their organizers' choices;
- survey results are measurements, so they reflect the community as it currently is, biases included.

If the industry is full of harmful biases as you say (which I agree with), and the survey is, as a result, full of harmful biases – then isn't that what you would expect from a well-run survey?

As far as I can tell, criticisms of the survey fall into three main camps:

1. online communities are not biased (or not to that degree) and the survey introduces its own extra bias.

All the research I've done so far points to this being false: the bias appears to be inherent to every online community.

I'm willing to change my mind on this if I see data to the contrary, and it's true for example that YouTube's gender stats are opaque and potentially unreliable.

2. even though online communities are biased, the resulting survey should actively counter that bias in order to promote diversity.

This is what I would consider "editorializing" or "putting my thumb on the scale", and I don't think it's the role of the survey.

I would rather use my platform in other ways, such as asking women for their "pick of the year", inviting them to write the survey conclusion, or involving them in survey design (and you can easily find examples of all three).

3. online communities are biased, but the survey should go beyond them to reach out to under-represented populations.

This for me is the position that holds the most water. Yes, if the source of the data is poisoned, it stands to reason that you would look for a fresher source.

The problem here is a practical one: the internet is vastly better at reaching people than any other medium, since that's what it was designed for.

I've tried reaching out to organizations that support women in tech many times in the past, with almost non-existent success. I can only suppose these organizations are already flooded with inquiries for their time and resources.

Even if these efforts were successful, a single YouTuber with an audience in the millions (itself predominantly male) can make a video about a survey and be the source for literally 10% of its audience – second only to our own mailing list. This means even successful outreach efforts run the risk of not meaningfully impacting the resulting dataset.

Add to that the limited resources I have at my disposal (I run the surveys mostly by myself) and you get the current situation.

If you've read this far then I thank you for taking the time to understand the context of my decisions. I'm not saying they're always the right ones – but I can say that they are not motivated by a lack of caring.

SachaG commented 11 hours ago

Should I maybe add a general disclaimer to all surveys along the lines of:

> This survey was distributed primarily through online means, and as such is representative of the demographics of the online communities that shared it. These platforms each carry their own inherent biases, and may not be representative of the developer community at large.

?

Edit: this is now live on https://2024.stateofhtml.com/en-US

abbeyperini commented 2 hours ago

> A survey which aims towards some kind of objective measure free of bias.

This is an impossible goal. Every survey is biased by its creator. By changing the way your question is phrased, you are merely changing the type of bias you need to combat.

If you want to move the needle towards having more diverse responses with write-in answers, you'll now have to tackle selection bias. To get a representative sample, you'll need to make sure your respondents are diverse. It has also come to my attention that the survey has not been doing this.

And let me be clear, this will still not make your survey unbiased. The science of surveys involves acknowledging the way your survey affects your data. Some of it is bias you're fine with - by being a survey requiring a computer or mobile device, you're limiting your audience to those who have a computer or mobile device. But there are always areas to improve - by having basic accessibility issues, your survey may frustrate screen reader users so they won't be well represented in the data.

But this is not the crux of the issue. By your own admission, the survey has been heavily biased in the past. You say the current list of options has come from respondents, but I've seen lists from respondents with a variety of names that aren't there. Is there a system of ranking responses so that only those with a lot of clicks got put in the survey? Do you still have the Bluesky starter pack up with 1 woman and no Black people? It's those kinds of choices we're asking you to rectify through further action. This is not solvable by changing the phrasing of the question. It is your responsibility to use your platform to promote a diverse list of people because you have spent so many years doing the opposite.

erinmikailstaples commented 1 hour ago

Thank you for the response, @SachaG — and yes, I did read it all :)

Regarding the disclaimer: I don't have an answer here, just a thought to ponder.

What value is a "JS developer survey" if the survey "may not be representative of the developer community at large"?

Why pursue the creation of a survey if the data is invalid and seemingly serves mainly to provide you credibility and an audience?

If you're running so many surveys that you can't sustainably validate the data, you should reconsider why you ran the surveys in the first place.

This is much like me when I poll all of my office coworkers, "Should we get pizza for lunch?" and get a resounding "yes." However, the total number of people who work from my home office is but one: myself.

And yes, all of my officemates (me, myself, and I) have been happy with my lunch surveys, but the data's overall value is questionable and largely self-serving.