SachaG opened 1 year ago
Things to look into:

- How can there be responses with 100% completion but very low duration?
- Is `duration` missing from the dataset?

I found 131 responses with 100% completion but <5 min duration, among which 9 had a 1 min duration. So I do think there is a problem with the way completion is calculated, but it doesn't seem to be too widespread. I will try to recalculate completion based on the entries.
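As a rough illustration (not the actual pipeline), the check above might look like this in pandas, assuming hypothetical file and column names (`completion` as 0-100, `duration` in minutes):

```python
import pandas as pd

# Hypothetical file and column names: `completion` is 0-100, `duration` in minutes.
df = pd.read_csv("css2023_responses.csv")

suspicious = df[(df["completion"] == 100) & (df["duration"] < 5)]
print(len(suspicious))                      # the 131 responses mentioned above
print((suspicious["duration"] <= 1).sum())  # the 9 with a ~1 min duration
```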
@LeaVerou, may I know which question specifically we would like to address when checking the duration data?
By the way, this chart is wrong: the percentages don't add up to 100%. I fixed it locally and will redeploy a new version soon.
Actually, in the new version it's even more imbalanced, with >70% of respondents in the 90-100% bracket. That made me wonder whether we should use a different data visualization, like not grouping the items and showing all 100 bars? But then the axis labels would get very messy…
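One way to double-check the chart's math is to normalize bracket counts directly, so the shares sum to 100% by construction; a minimal sketch with hypothetical file and column names:

```python
import pandas as pd

df = pd.read_csv("css2023_responses.csv")  # hypothetical file name

# Bucket completion into ten 10%-wide brackets and compute each bracket's share.
brackets = pd.cut(df["completion"], bins=list(range(0, 101, 10)), include_lowest=True)
shares = brackets.value_counts(normalize=True).sort_index() * 100
print(shares)  # sums to 100 by construction (NA completions excluded)
```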
@SachaG There must be a problem with the completion variable for CSS 2023. I filtered the data down to responses with a completion value of 100, then looked at the number of NA's for all columns starting with `features`. Below is what I got. This needs to be checked. (I can try to recalculate it for the info being requested by Lea.)
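A sketch of that check, with hypothetical file/column names (only the `features*` columns and `completion` are assumed here):

```python
import pandas as pd

df = pd.read_csv("css2023_responses.csv")  # hypothetical file name

complete = df[df["completion"] == 100]
feature_cols = [c for c in df.columns if c.startswith("features")]

# If completion were computed correctly, every count here should be 0.
na_counts = complete[feature_cols].isna().sum()
print(na_counts[na_counts > 0])
```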
This is the reason for one of your listed questions:

> How can there be responses with 100% completion but very low duration?

> Is `duration` missing from the dataset?

It is - across CSS 2022, CSS 2023, and JS 2022. But as you mentioned, it can simply be calculated using (updatedAt - createdAt).
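A sketch of that fallback, assuming `createdAt`/`updatedAt` parse as timestamps (the file name is hypothetical):

```python
import pandas as pd

df = pd.read_csv("css2022_responses.csv")  # hypothetical file name

# Reconstruct duration (in minutes) from the two timestamps.
created = pd.to_datetime(df["createdAt"])
updated = pd.to_datetime(df["updatedAt"])
df["duration"] = (updated - created).dt.total_seconds() / 60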
@LeaVerou I recalculated completion by taking the percentage of NA's per response over the answer columns (i.e., excluding metadata such as `user_info`, id's, surveySlug, etc.), then subtracting from 100. Here, we can see that JS seems to take less time than both CSS surveys. Using the number of columns as a rough proxy for survey length, JS is the longest among them (351 columns vs 270 & 260 for CSS 2022 and 2023, respectively). @SachaG Let me know if I missed something.
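A sketch of that recalculation; the exact metadata prefixes to exclude are an assumption here:

```python
import pandas as pd

df = pd.read_csv("css2023_responses.csv")  # hypothetical file name

# Keep only answer columns; the metadata prefixes listed are an assumption.
meta_prefixes = ("user_info", "_id", "surveySlug", "createdAt", "updatedAt")
answer_cols = [c for c in df.columns if not c.startswith(meta_prefixes)]

# Completion = 100 minus the percentage of unanswered (NA) answer columns.
df["completion_recalc"] = 100 - df[answer_cols].isna().mean(axis=1) * 100
```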
Thanks @ShaineRosewel, these are very comprehensive for glancing at, but could I please also have medians, means, and stdev as I want to calculate something?
survey | mean (min) | median (min) | sd (min)
---|---|---|---
cs22 | 19.8 | 14.5 | 19.0
cs23 | 23.7 | 16.4 | 25.0
js22 | 19.2 | 15.7 | 13.3
The values are spread out because respondents have varying completion percentages. Let me know if you are after records with a high completion percentage.
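For reproducibility, summaries like these could come from a simple groupby, assuming each survey export has a `duration` column in minutes (file names hypothetical):

```python
import pandas as pd

surveys = {"cs22": "css2022.csv", "cs23": "css2023.csv", "js22": "js2022.csv"}
combined = pd.concat(
    [pd.read_csv(path).assign(survey=name) for name, path in surveys.items()],
    ignore_index=True,
)

stats = combined.groupby("survey")["duration"].agg(["mean", "median", "std"]).round(1)
print(stats)
```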
Thanks for the fast response! Yes, I think if we narrow it down to respondents with a high completion percentage it may be better. Would that be too much hassle to calculate?
Still large - this includes responses with at least 80% completion. A large sd is expected since we are dealing with response times. If this isn't suitable for your purpose, we can use a log scale: once the transformed data resembles a normal curve, sd becomes a better measure of dispersion.
survey | mean (min) | median (min) | sd (min)
---|---|---|---
cs22 | 22.3 | 16.2 | 18.5
cs23 | 26.9 | 18.3 | 24.6
js22 | 20.7 | 16.7 | 12.8
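A sketch of the log-scale idea mentioned above, restricted to ≥80% completion (file and column names hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("css2023.csv")  # hypothetical file name
high = df[df["completion"] >= 80]

# Durations are right-skewed; log-transforming pulls in the long tail so that
# the sd of log(duration) is a more honest dispersion measure.
log_dur = np.log(high["duration"].clip(lower=0.5))  # clip guards against log(0)
print(log_dur.mean(), log_dur.median(), log_dur.std())
```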
Let me know @LeaVerou if I can help with what you are trying to compute!
I was trying to compute a very rough measure of the additional time needed to answer the 5-answer questions compared to the 3-answer feature questions, but the data is too noisy for that, and there are way too many confounds and factors that differ even across different years of the same survey.
If we assume that the time to fill in the rest of the questions was roughly the same across both years of State of CSS, that would imply that the 3 additional feature questions cost 2-4 extra minutes (depending on whether we use medians or means), which makes no sense at all, so the rest of the survey must have been substantially different. And it's impossible to compare with State of JS, since the rest of that survey is so different. If you can think of any way to calculate this, great; personally I'm out of ideas (which doesn't happen often 😅).
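For what it's worth, reading the numbers straight off the tables above: between CSS 2023 and CSS 2022 the gap is 16.4 − 14.5 = 1.9 min in medians and 23.7 − 19.8 = 3.9 min in means (or 2.1 and 4.6 min on the ≥80%-completion subset), which is presumably where the 2-4 minute range comes from.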
I'll try to think of a way and let you know if I have an idea. Our duration covers the entire questionnaire; I think this would be a lot easier if we had per-item response times.
Charts by @ShaineRosewel