Closed aaronzhangSema4 closed 3 years ago
Hi @aaronzhangSema4 and thank you for the praise. This is a good question, and one i've elided in the vignette.
The answer is contained in the documentation for the vaccinations data set; run help(vaccinations) to read it, or see the page on the website. The "freq" column is the number of survey respondents who had the same set of responses to all three surveys. This means that the "subject" column is a bit of a misnomer; it identifies the cohort of subjects with common responses, not the individual subject. Does that clarify it for you?
I think your confusion is reasonable, so i'll add a note about this to the vignette for the upcoming release. Thank you for raising the issue!
Thank you @corybrunson for the quick response. I agree that it makes more sense if each "subject" actually represents a "cohort".
However, it still does not make sense if the "response" column represents the same responses to all three surveys. I made sure that there are three surveys in the dataset:
> vaccinations %>% select(survey, start_date, end_date) %>% distinct()
survey start_date end_date
1 ms153_NSA 2010-09-22 2010-10-25
2 ms432_NSA 2015-06-04 2015-10-05
3 ms460_NSA 2016-09-27 2016-10-25
Then each cohort should have the same freq at each axis of the alluvial plot. For example, if 50 respondents gave "Always" to each of the three surveys, then it should be 50 at axis 1, axis 2, and axis 3. I think I am missing something, or the data documentation lacks something...
Yes, there are only three surveys, but the total number of participants who responded, for example, "Always" is not the same for each one. Rather, what should be constant is the number of participants in each cohort at each survey, i.e. the value of "freq" should be the same across all rows that share a value of "subject". That holds up when i inspect the data. Does it make sense?
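To make the distinction concrete, here is a minimal sketch in base R. It uses a made-up toy data frame, not the real vaccinations data: the cohort sizes, the survey labels "s1" to "s3", and the names toy, freq_is_constant, and always_by_survey are all assumptions for illustration. It only mirrors the long format described above, with one row per cohort per survey and the cohort size repeated in "freq".

```r
# Hypothetical toy data in the same long format as the data set discussed above:
# one row per cohort ("subject") per survey, with the cohort size in "freq".
toy <- data.frame(
  subject  = rep(1:3, each = 3),
  survey   = rep(c("s1", "s2", "s3"), times = 3),
  response = c("Always", "Always", "Never",    # cohort 1
               "Never",  "Always", "Always",   # cohort 2
               "Always", "Never",  "Never"),   # cohort 3
  freq     = rep(c(50, 30, 20), each = 3)      # made-up cohort sizes
)

# Within a cohort, freq is constant: each subject has exactly one distinct value.
freq_values_per_subject <- tapply(toy$freq, toy$subject,
                                  function(x) length(unique(x)))
freq_is_constant <- all(freq_values_per_subject == 1)

# Across surveys, the height of a response category is the sum of the cohorts
# that gave that response at that survey, so it can differ from axis to axis.
always <- toy[toy$response == "Always", ]
always_by_survey <- tapply(always$freq, always$survey, sum)

print(freq_is_constant)
print(always_by_survey)
```

In this toy example every cohort keeps its own size at all three surveys, yet the "Always" stratum totals 70, 80, and 30 at the three axes, because different cohorts pass through it each time. The same kind of check could presumably be run on the real data by grouping on "subject" and counting distinct values of "freq".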
The explanation was added in commit 3413a84d94a977ffc73b06de31c5f60fd114ca5f.
Thanks. I misunderstood part of the data processing. Now it makes sense.
Why does a subject have a freq column?
Thanks for this great package. I am reading the tutorial:
https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html
The last example got me really confused. What does the value of frequency mean for subject 1 at survey "ms153_NSA"?