SachaG opened this issue 1 week ago
Keeping it fill-in only is fine, but I think the primary issue here is that it feels like you get this pushback every year, and every year you talk through ideas to improve, and then we still don't see any representation in the survey — and that's really a bummer to see.
So even though artificially drafting a more diverse list would quiet a lot of the criticisms around this issue, I feel like it would not be conducive to actually measuring what the survey's audience thinks.
This feels problematic to me. The survey responses reinforce themselves year over year. If the first year only lists dudes, and the next year uses the previous year's responses to "fairly" determine who goes on the list, that's not a data-driven decision — it's encoded bias.
Fixing that encoded bias requires intentionally curating these lists. For example, some of the folks currently listed don't actually talk about JS that much — perhaps they could be dropped off the State of JS survey to make room for others who do.
The survey responses reinforce themselves year over year. If the first year only lists dudes, and the next year uses the previous year's responses to "fairly" determine who goes on the list, that's not a data-driven decision — it's encoded bias.
Yes, which is why, as I explained, last year's survey used a freeform textfield to "reset" any encoded bias.
Fixing that encoded bias requires intentionally curating these lists.
Maybe I'm totally wrong on this but I've considered it, and I just don't think it's a good solution. No matter who "intentionally curates" the list you'll always be introducing bias of some kind. Shouldn't the goal be to remove bias, not add more of it?
you can't remove bias. humans are biased, full stop. so if you want to improve representation, do that — every creator on the list now has been the beneficiary of bias, so to imply that it would be somehow unfair to put someone else on the list is pulling up the ladder behind you
asking an actually representative panel to weigh in on creators to feature on the list is less biased than you making a solo call. and having people on the list doesn't force anyone to select them, but it might encourage people to finish the survey who currently walk away from this survey because they think it's just for a specific bro bubble (which would increase the sample size and make the results less biased)
I agree that me making a solo call would be biased, which is why I tried to remove myself from the equation altogether by first basing the list on last year's data with no editorializing; and then removing the list altogether in favor of a freeform input.
I think you've done a great job of outlining the upsides of a more "editorialized" approach. It's just not the one I picked this time.
My issue with pure type-in is that it'll bias toward the top 3 or so content creators; remembering names is hard. How about auto-complete with custom type-in, plus a fuller, more diverse list people can scroll? Probably too late for this year's survey, but perhaps for next year. FYI: right now Theo is streaming with 257 viewers, Zeu (zeu_dev) with 27. That's 10%, but an important 10%: double last year, and she wasn't streaming a year and a half ago.
Auto-complete is a good idea.
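For what it's worth, here's a minimal sketch of what that hybrid question could look like (the suggestion list and the `matchCreators` helper are illustrative, not the actual survey code):

```typescript
// A hybrid question: autocomplete against a suggestion list while
// always allowing a custom write-in. The list here is illustrative.
const suggestions = ["Theo", "Zeu", "Cassidy Williams", "Sarah Drasner"];

function matchCreators(input: string, max = 5): string[] {
  const query = input.trim().toLowerCase();
  if (query === "") return [];
  // Substring match, capped so the dropdown stays scannable
  const matches = suggestions
    .filter((name) => name.toLowerCase().includes(query))
    .slice(0, max);
  // Always offer the raw input as a custom write-in option
  return [...matches, `Add "${input.trim()}"…`];
}

console.log(matchCreators("sa")); // => [ 'Sarah Drasner', 'Add "sa"…' ]
```

This keeps the participation benefits of a predefined list while still letting any name be entered.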
Hey @SachaG - I was hesitant to add to the conversation, not wanting to dogpile on. However, this survey and its lack of diversity have come up in multiple feeds of mine today, and it felt important to share my perspective. 🤷♀️
Rather than just echoing @jlengstorf's sentiment (which I fully co-sign), I want to share a personal story to highlight the real-world impact that surveys lacking diversity (like this one) have on gender minorities like me.
I am a developer.
I've previously been hesitant to claim this not because I lacked the skills, but because environments like these made me feel like I needed to prove my worth constantly and justify my credentials.
Seeing survey results, all-male panels, and tech conferences dominated by dude bros created this persistent, internal narrative: "You don’t belong here. Your voice isn’t valued."
it adds up.
A few years ago, a male colleague helped me advocate for my worth firsthand by sharing his employment contract, which showed him making $25K more than me.
He highlighted the fact that the differences in our resumes were twofold:
He not only helped me advocate for a pay increase, but also changed how I advocate for myself in this space.
I've had people assume I'm nontechnical, had folks explain my own graduate work to me, and been asked at conferences if I was an assistant or "social media girl" rather than a speaker.
When you release a survey with zero representation from women — and this isn't the first time this has been done — it tells women in tech like me that "there are no women worth including on this list." And when you hear that message enough, it starts to impact how you see yourself and your place in the industry.
We know that:
To quote you: "If we can make sure this new generation of developers isn't turned away by entrenched biases, this could prove to be a great opportunity to make the industry more diverse."
Great idea, @SachaG - it starts with acknowledging your own biases in the first place.
Inclusion should be a standard part of the community, not something that needs to be constantly fought for or justified.
From a purely practical standpoint, diversity improves outcomes. It's not just my opinion; it's backed by academic study after academic study after academic study.
I can't make you change the survey, nor can I force you to rethink how you approach inclusion.
However, I can tell you that you've (intentionally or not) perpetuated harmful biases in an industry that already has enough of them.
@erinmikailstaples first of all, thank you for taking the time to share your experience in such a personal way. It means a lot to me that you'd go out of your way and make an effort like that here.
I just want to highlight something that I think goes to the root of the issue:
Seeing survey results, all-male panels, and tech conferences dominated by dude bros created this persistent, internal narrative: "You don’t belong here. Your voice isn’t valued."
My argument from the start is that these things belong to distinct categories:
If the industry is full of harmful biases as you say (which I agree with), and the survey is as a result full of harmful biases – then isn't that what you would expect from a well-run survey?
As far as I can tell, criticisms of the survey fall into three main camps:
All the research I've done so far points to this being false, and to this bias existing inherently in every online community.
I'm willing to change my mind on this if I see data to the contrary, and it's true for example that YouTube's gender stats are opaque and potentially unreliable.
This is what I would consider "editorializing" or "putting my thumb on the scale", and I don't think it's the role of the survey.
I would rather use my platform in other ways, such as asking women for their "pick of the year", to write the survey conclusion, or to be involved in survey design (and you can easily find examples of all three).
This for me is the position that holds the most water. Yes, if the source of the data is poisoned, it stands to reason that you would look for a fresher source.
The problem here is a practical one: the internet is vastly better at reaching people than any other medium, since that's what it was designed for.
I've tried reaching out to organizations that support women in tech many times in the past, with almost non-existent success. I can only suppose these organizations are already flooded with inquiries for their time and resources.
Even if these efforts were successful, a single YouTuber with an audience in the millions (itself predominantly male) can make a video about a survey and be the source of literally 10% of its audience – second only to our own mailing list. This means even successful outreach efforts run the risk of not meaningfully impacting the resulting dataset.
Add to that the limited resources I have at my disposal (I run the surveys mostly by myself) and you get the current situation.
If you've read this far then I thank you for taking the time to understand the context of my decisions. I'm not saying they're always the right ones – but I can say that they are not motivated by a lack of caring.
Should I maybe add a general disclaimer to all surveys along the lines of:
This survey was distributed primarily through online means, and as such is representative of the demographics of the online communities that shared it. These platforms each carry their own inherent biases, and may not be representative of the developer community at large.
?
Edit: this is now live on https://2024.stateofhtml.com/en-US
A survey which aims towards some kind of objective measure free of bias.
This is an impossible goal. Every survey is biased by its creator. By changing the way your question is phrased, you are merely changing the type of bias you need to combat.
If you want to move the needle towards more diverse responses with write-in answers, you'll now have to tackle selection bias. To get a representative sample, you'll need to make sure your respondents are diverse. From what I've seen, this is another thing the survey has not been doing.
And let me be clear, this will still not make your survey unbiased. The science of surveys involves acknowledging the way your survey affects your data. Some of it is bias you're fine with - by being a survey requiring a computer or mobile device, you're limiting your audience to those who have a computer or mobile device. But there are always areas to improve - by having basic accessibility issues, your survey may frustrate screen reader users so they won't be well represented in the data.
But this is not the crux of the issue. By your own admission, the survey has been heavily biased in the past. You say the current list of options has come from respondents, but I've seen lists from respondents with a variety of names that aren't there. Is there a system of ranking responses so that only those with a lot of clicks got put in the survey? Do you still have the Bluesky starter pack up with 1 woman and no Black people? It's those kinds of choices we're asking you to rectify through further action. This is not solvable by changing the phrasing of the question. It is your responsibility to use your platform to promote a diverse list of people because you have spent so many years doing the opposite.
Thank you for the response, @SachaG — and yes, I did read it all :)
Regarding the disclaimer — I don't have an answer here, more a thought to ponder.
What value is a "JS developer survey" if the survey "may not be representative of the developer community at large"?
Why pursue the creation of a survey if the data is invalid and seemingly self-serving to provide you credibility and an audience?
If you're running so many surveys that you can't sustainably validate the data, you should reconsider why you ran the surveys in the first place.
This is much like me when I poll all of my office coworkers, "Should we get pizza for lunch?" and get a resounding "yes." However, the total number of people who work from my home office is but one: myself.
And while, yes, all of my officemates (me, myself, and I) have been happy with my lunch surveys, the data's overall value is questionable and largely self-serving.
Two points I want to make before I respond:
With that out of the way:
@abbeyperini
This is an impossible goal. Every survey is biased by its creator. By changing the way your question is phrased, you are merely changing the type of bias you need to combat.
I definitely agree with that. The part where I usually disagree with people is what to do about it. The Bluesky starter packs are a great example.
The State of JS starter pack only has one woman. But the State of CSS one has 10. Why? Because compared to JS, the CSS community has done a better job of welcoming women, which shows in the survey data, which the starter pack is based on.
So the same exact data collecting methodology resulted in 10x more women in one case, and I would argue that this is because the underlying reality that is being surveyed is different.
Now for a starter pack itself this is thankfully pretty straightforward. There is literally no downside to adding more women to the JS one, and I thank you for prompting me to do it.
But when it comes to the survey, all I ask is that people first understand this reality before criticizing it. Because this is what makes this a very hard problem to solve. Breaking out of that bubble means breaking out of X, Reddit, YouTube – which is something that I've tried doing many times, and failed – and that no other large-scale open survey (to my knowledge) has successfully done.
If anybody reading this thinks the problem is all merely due to my personal bias, underlying survey bias, or any other easily fixable factor and want to launch their own survey to do a better job and demonstrate my mistakes, then by all means please go ahead. Every "State Of" survey for the past couple years already links to as many other surveys as I could find and I would be thrilled to add one more if you want my help.
@erinmikailstaples
What value is a "JS developer survey" if the survey "may not be representative of the developer community at large"?
This is such a great question.
Would a survey of 805 women developers have value? Because this is how many women took the last State of JS survey. I know this is tiny compared to men, but it's not nothing. Why would the fact that these women are under-represented in the dataset make their input any less valuable?
If people are truly interested in hearing what women have to say, we provide tools to filter any chart by gender directly in the survey results.
Why pursue the creation of a survey if the data is invalid and seemingly self-serving to provide you credibility and an audience?
I make my living from these surveys so it's true it's self-serving. But if credibility and money were my only motivators, I would have stopped collecting gender and race stats a long time ago, just like other major developer surveys have done.
The reason why I keep collecting these stats is because I (try to) hold myself accountable to the community. And it's also the reason why I feel confident arguing about this topic instead of just letting it slide. I've thought about this a lot and I do care about it.
Finally, I feel like the end goal of diversity can often get lost in the shuffle. Why do we want a more representative survey?
The way I see it, the reason is inclusion. A chart that says "97% male" sends a strong signal that women are not welcome here – while one that is less skewed says the opposite.
But let's not forget that there are other ways beyond raw numbers for surveys to platform women and make the community more welcoming. Ways such as:
I do not mention these women here to somehow imply that they agree with me or would defend me – in fact many of them have been critical of me and of the surveys. Yet at the end of the day, I think what matters is that the net result of our collaboration has been more women developers being platformed online.
I agree with you that getting a representative respondent pool is a difficult task. There is a reason you see a lot of bribes for research participants.
I am arguing that convenience sampling is leading to a biased sample of data. Furthermore, especially since you do this for a living, I am arguing that it is your responsibility to do at least snowball sampling. I am hoping you can do stratified sampling.
This does not mean asking women's organizations to do it for you for free. It doesn't have to mean moving away from the platforms you mentioned. Minority communities exist in all of them.
If you can resort to bribes, fantastic. If not, you will have to build relationships, which will require a public commitment to diversity. Adding Black women to the CSS starter pack and Black men to both would be a great start. Also, it would be worth the time to go back through the other questions looking at them through this lens. How diverse is the podcast question list? I didn't check.
Please expand on the last bit - how does four women doing work for you platform them? Or what do you mean by platforming?
I am arguing that convenience sampling is leading to a biased sample of data. Furthermore, especially since you do this for a living, I am arguing that it is your responsibility to do at least snowball sampling. I am hoping you can do stratified sampling.
Would you recommend implementing snowball or stratified sampling in addition to convenience sampling, or instead of? Besides the practical issues of time and money I don't see any downsides to implementing them in addition to it. But I also already know that this will not quiet the criticism the survey has received, since the scales these types of sampling operate on are vastly different.
If you suggest doing it instead of convenience sampling, in other words making this a closed (or semi-closed) survey, then while I agree this would improve things from a diversity standpoint, I think it would also have some very real downsides due to the much smaller resulting dataset (I can expand on that if you'd like me to).
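To make the sampling vocabulary concrete: one cheap complement to better recruitment is post-stratification weighting, i.e. reweighting an already-collected convenience sample toward assumed population shares after the fact. A minimal sketch (the respondent shape and target proportions here are made-up assumptions, not real data):

```typescript
// Post-stratification weighting: reweight an already-collected
// convenience sample toward assumed population proportions.
type Respondent = { gender: "woman" | "man" | "other" };

function strataWeights(
  sample: Respondent[],
  targets: Record<string, number> // assumed population shares, summing to 1
): Map<string, number> {
  // Count respondents per stratum
  const counts = new Map<string, number>();
  for (const r of sample) {
    counts.set(r.gender, (counts.get(r.gender) ?? 0) + 1);
  }
  // weight = target share / observed share
  const weights = new Map<string, number>();
  for (const [stratum, target] of Object.entries(targets)) {
    const observed = (counts.get(stratum) ?? 0) / sample.length;
    if (observed > 0) weights.set(stratum, target / observed);
  }
  return weights;
}

// e.g. if women are 5% of the sample but assumed to be 20% of the
// population, each woman's answers get a weight of 0.20 / 0.05 = 4.
```

Of course, this only makes the skew explicit and correctable in the aggregates; it does nothing to change who gets reached in the first place.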
Please expand on the last bit - how does four women doing work for you platform them? Or what do you mean by platforming?
I find this question confusing. Isn't the goal to feature more women developers? Doesn't asking a woman to give the survey's closing argument accomplish that? It's saying that not only are there women in the JavaScript community, but that the single most qualified and authoritative person to sum up the entirety of the year in JavaScript is a woman. I would hope this sends a strong message?
In addition to. Ideally, once the relationships are there, it will start trending towards convenience sampling again. Are you sending the link to former participants every year? Apologies if you are - I genuinely can't remember.
Ahhhhh ok here's the disconnect - I'm suggesting you use your platform to publicly promote a diverse group of individuals as a gesture. It's expected that you include women in the process and work with them. You want to combat the perspective that you have created with the list of only men? Do a weekly newsletter or podcast or post on all platforms highlighting diverse content creators. Doesn't have to be people who have been written into the survey before. In fact, promoting smaller content creators would be cool. You could even offer that kind of thing in addition to a bribe to people you're asking to help you get respondents.
Since this survey is getting posted in multiple Slack and Discord channels, and specifically a call to action was prompted about this topic in the OpenJS slack:
Also, I thought I'd ask since this is an issue that comes up every year: I'm always trying to broaden the surveys' audience to include more women, but so far I haven't found any reliable and scalable method.
if anybody has suggestions (or maybe can recommend the right person or working group to ask about this) I'm all ears!
Just curious, what prevented you from being able to add any woman to the list this year, even after all the feedback from previous years? Was there a significant hurdle that made it harder than adding a bunch of men to the list? It's not clear to me what makes the process so much harder for women than it is for men that even after years of feedback, even some of the most obvious choices like Cassidy Williams, Sarah Drasner, Salma, or basically anybody from this list couldn't get a place on the list?
Even as you admit it's not a first-time occurrence, and it sounds like you've had a chance to gather feedback over the years, it just seems a bit off that it would take a Working Group to give even the simplest consideration to getting one of them on the list. Like, just pick a name out of a hat at minimum? 🤷♂️
Curious, did you post this call to action before this year's survey at all?
@abbeyperini
In addition to. Ideally, once the relationships are there, it will start trending towards convenience sampling again. Are you sending the link to former participants every year? Apologies if you are - I genuinely can't remember.
In this case we're on the same page. The remaining obstacle here is just a practical one – maybe this is where I betray my lack of formal data science background, but I have not yet found a way to reliably recruit more women.
So I guess the obvious conclusion is that what I need to do is hire someone more qualified than I am on this issue, to see what approach they would pick (this is something that @jlengstorf has suggested before, but I probably wasn't as receptive to the idea as I should've been). If that's you then feel free to DM me on Bluesky :)
Ahhhhh ok here's the disconnect - I'm suggesting you use your platform to publicly promote a diverse group of individuals as a gesture. It's expected that you include women in the process and work with them. You want to combat the perspective that you have created with the list of only men? Do a weekly newsletter or podcast or post on all platforms highlighting diverse content creators. Doesn't have to be people who have been written into the survey before. In fact, promoting smaller content creators would be cool. You could even offer that kind of thing in addition to a bribe to people you're asking to help you get respondents.
You're right, if a woman writes the conclusion then I am also getting something in return, so in that sense there's still a transactional aspect to it. But as you point out I could also use my platform in other, less transactional ways, and I'd be happy to do it. In fact that was precisely the spirit behind the "Pick of the Year" feature – have a way to put people forward outside of the confines of the survey format.
I also just offered on Bluesky to go on a women-led podcast and help publicize it. I guess this is still transactional in a way, but I feel like if I'm going to email our list about something it should have some relation to the survey.
@thescientist13 as I explained in previous posts in this thread, I was trying to find a methodology that removed my own agency from the list-making process, and the one I settled on was to base the list on last year's data. You can disagree with that but in any case that was the reason.
Curious, did you post this call to action before this year's survey at all?
The survey is ongoing right now, which is why outreach is also happening now.
Follow up: I'm testing out a Reddit ad specifically geared at driving more women to take the survey.
I will also test more neutral wording ("Help us take the pulse of the JavaScript community"). The ads will run in r/womenEngineers, r/womenintech, and r/LadiesofScience.
I'm also looking into hiring someone: https://github.com/Devographics/surveys/issues/257
Since I've given this a couple days of thought, I think I can recap where I landed on all this, and what I'm doing about it.
But first, a recap of the issue if you're just tuning in now:
Two days ago this year's State of JS survey went out. Respondents called it out for the "Which of these video creators do you follow?" question, which featured a list of predefined options that did not contain any women streamers or YouTubers.
This happened because the list was built by taking last year's top 15 answers to a freeform input question about the same topic.
And that data didn't contain any women because of a combination of A) many top tech YouTubers being male and B) the survey's own audience being 94% male (the idea being that male viewers are more likely to follow male creators).
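Mechanically, that list-building step amounts to something like the following (a sketch of the general shape, not the actual Devographics pipeline code):

```typescript
// Derive a predefined list from last year's freeform answers:
// normalize, count, keep the top N.
function topCreators(freeformAnswers: string[], n = 15): string[] {
  const counts = new Map<string, number>();
  for (const raw of freeformAnswers) {
    // Naive normalization; real cleaning would also merge aliases and typos
    const name = raw.trim().toLowerCase();
    if (name === "") continue;
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([name]) => name);
}
```

Which is exactly how any skew in one year's audience gets baked into the next year's predefined options.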
A discussion ensued on Bluesky, with many posters stating that the survey had lost their trust due to its lack of diversity.
With that out of the way, let's move on to my conclusions.
Even though this is what originally triggered the issue, I don't think we need to spend that much time on this anymore. The very first thing I did at the start of all this was to replace the list with a freeform text input, so the issue of built-in bias in the question phrasing specifically has actually been solved from the start (obviously the issue of built-in bias in the survey itself is much larger).
Now there is a high chance that the resulting data will still contain all or mostly men (after all the controversial list was taken straight from last year's data). Which brings me to my second point.
As @abbeyperini has helpfully pointed out, the survey has always collected data through convenience sampling, but there are other sampling methods available.
I don't know if those other sampling methods will fix people's issues with the surveys, but I realize I've probably been a bit too quick to think I have all the answers (which is an easy trap to fall into when you run surveys ;). So I will now be looking for a qualified quantitative researcher to consult on the project.
I've never paid a streamer, YouTuber, newsletter writer, or influencer to mention or advertise the survey, both because this started out as a personal, non-commercial project with limited resources; and because I always thought if I did things right people should be willing to share the survey based on its own merit, without requiring financial incentives.
But if I'm being honest, I have to admit that this passive approach is very susceptible to reinforcing entrenched biases. And after all, the survey itself also benefits from sponsorship and advertising, so it would seem only natural for me to sponsor others as well.
I don't expect any of this to change anybody's mind, but I'm just trying to be as transparent as possible in an effort to eventually regain (or at least, avoid losing even more) the community's trust.
Again, thanks to all who participated in this discussion, and despite the aggravation caused by discussing such a sensitive topic I'm hoping it will end up having a productive outcome!
Another realization I want to write about (sorry if I'm using this thread as a personal journal of sorts…).
A common dynamic that has taken place has been people pointing out their issues with the survey, and me feeling justified in defending my work in response.
What I failed to realize is that the perception of an issue is in itself a problem, whether I agree with the underlying criticism or not. If I feature a list of all-men creators, it will disenfranchise women survey-takers – no matter how well I can justify the process that led to the list, to myself and others.
At that point the right move is to remove the list altogether (which at least I did immediately) – and I could also see why some would argue the question itself, or even the whole survey, shouldn't exist at all (although I won't go that far myself).
This all seems obvious to me now, but it wasn't at the time. There's something comforting about trusting in your own work and forging on, without worrying what others will think – in fact, it's often the only way to move forward at all for solo creators, since you don't have the luxury of hiding behind a company when things get dicey.
But I think the past couple days show that I've reached the limits of this approach, and I'm now thinking about what should come next.
Another datapoint, based on 2023 data for the "Who do you follow?" question:
(I am using my own judgement to determine if someone is a woman or not here, which I realize can also be considered problematic but let's put that aside for now)
Even though this is already a very interesting result, the problem is that the women-only dataset is currently too small. For example, while the top overall person got 357 votes, the top person among women only got 14. So this is another good reason to recruit more women.
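In code terms, that comparison looks something like this (hypothetical data shapes; the minimum-sample guard is just one possible way to handle segments this small):

```typescript
// Tally the same question overall vs. among one segment only,
// refusing to report segments below a minimum sample size.
type Vote = { respondentGender: string; creator: string };

function topAnswer(votes: Vote[]): { creator: string; count: number } | null {
  const counts = new Map<string, number>();
  for (const v of votes) {
    counts.set(v.creator, (counts.get(v.creator) ?? 0) + 1);
  }
  let best: { creator: string; count: number } | null = null;
  for (const [creator, count] of counts) {
    if (!best || count > best.count) best = { creator, count };
  }
  return best;
}

const MIN_SAMPLE = 50; // arbitrary cutoff, for illustration only

function topAnswerForGender(votes: Vote[], gender: string) {
  const segment = votes.filter((v) => v.respondentGender === gender);
  // e.g. a top overall count of 357 vs. a women-only top count of 14
  // signals the segment is too small to report confidently
  return segment.length < MIN_SAMPLE ? null : topAnswer(segment);
}
```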
Wrote something to wrap this up for now: https://dev.to/sachagreif/a-story-of-developers-data-and-diversity-29el
Note: this post centers women in the discussion because this is what the feedback I've received was about, but the same issue applies to any minoritized segment of the developer community, whether BIPOC, LGBTQ, people experiencing disabilities, or others.
The Issue
The "Which video creators do you follow?" question in the State of JS 2024 survey originally featured a list that included no women creators:
The Root Cause
There are two basic ways to collect data in a survey: through a freeform text field, or through a list of predefined options (which in the State of JS case also supports an optional "other answers…" freeform field). Each of them has its pros and cons:
Predefined Options
Pros:
Cons:
Freeform Textfield
Pros:
Cons:
Old Solution
The solution I originally settled on was to alternate between both formats each year. In 2023, this question used a freeform text field.
I then used the resulting data to populate the set of predefined options for the 2024 edition. This strategy aimed to balance the pros and cons of both formats, by ensuring maximum question participation with predefined options one year, while hopefully "resetting" any bias by switching back to freeform input the following year.
In other words, because respondents did not mention women video creators in 2023, they were then absent from the list I compiled for the 2024 edition.
New Solution
The downside is that without this background knowledge, it can seem like this year's list of predefined options is the result of an arbitrary process that doesn't include any consideration for diversity and inclusion.
For that reason I've decided to just permanently keep the "Video Creators" and "People" questions as freeform inputs, to avoid giving off this impression and thus alienating the very audience the survey lacks the most.
Survey Goals
I do feel the need to add one more thing: especially lately, being "anti-DEI" in the tech community is becoming worryingly trendy. So I want to be absolutely clear that this is not what's going on here. I firmly believe that inclusivity and representation efforts are important for both moral and practical reasons.
Which is exactly why I believe the survey's goal should be to measure the success of those efforts, and not to try and put a thumb on the scale one way or another. So even though artificially drafting a more diverse list would quiet a lot of the criticisms around this issue, I feel like it would not be conducive to actually measuring what the survey's audience thinks.
(The fact that the survey audience itself might be biased or unrepresentative is a related – and quite valid – concern. But again, masking this issue by tweaking the list would not actually be fixing it either.)
What You Can Do
If you are concerned about this issue, the two best things you can do are:
And if you happen to be a woman video creator yourself, I would suggest encouraging your audience to take the survey. I realize this comes across as self-serving coming from the person who runs the survey, but it's the surest way to make a difference in the stats.
What Do You Think?
I'm well aware that this comes at a cost for women/BIPOC/LGBTQ/disabled/etc. content creators, since they end up drowned out by the stats.
For that reason, I am not saying this is a perfect approach. I've gone back and forth over this issue many times in the past, and I probably will again in the future.
For example, maybe you think that using the surveys' platform to promote women creators should take priority over its other goals, and that's certainly understandable as well.
So if you have a suggestion on how to tackle this issue, I would love to hear more!
Recommended Reading