5
[ ] 5.3: This slide makes it seem like long is tidier than wide, but that's not true. In Exercise 1 we have the cereal data, where the long version is untidy and the wide version is tidy, so I think we need to make this a bit clearer. The most amazing thing would be if you could come up with a single example in 3 formats: too long, just right, and too wide. Is that doable? I think it also depends on the application: for this chocolate bar dataset, I'd actually prefer the "too wide" format if I were doing supervised learning. It really depends on what you're doing. So maybe an alternative to my 3-format suggestion is to have 2 formats and 2 questions, one where the wide format is tidy and one where the long format is tidy? 🤔 Also, I don't love the detour from cereals to chocolate bars, but I can live with it if needed.
[ ] 5.6: When you explain each argument, I think it would be more useful to explain what it does in general, without referring specifically to 'name', 'nutrition', and 'value'. I got a bit confused by the current version, because the argument names and the column names both appear in code font, and it's a bit ambiguous which is which, at least without thinking carefully. Also, if possible, I would love to show both dataframes here. The problem is that the code cereal_long.pivot(index='name', columns='nutrition', values='value') refers to column names in the original df, but we can't see it. We need to be able to connect the code to the df on the same slide. It wouldn't be reproducible, but maybe an image would be better, where you circle those 3 column names? Update: see my comment for 5.8.
[x] 5.22: row -> rows? Also, did we learn drop for rows? I mainly remember it for columns.
[x] Exercise 5 is really long - I suggest putting pivot_table as its own Exercise and adding some interactive stuff in between.
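To make the 5.6 suggestion a bit more concrete, here's a rough sketch of a slide that prints the long dataframe right before the pivot call, with the arguments explained in general terms rather than by column name. The cereal values are made up for illustration, and the pivot_table part assumes we'd want the mean when a (name, nutrition) pair is repeated; it could also seed the separate pivot_table exercise.

```python
import pandas as pd

# Toy long-format cereal data (hypothetical values): one row per (cereal, nutrient) pair
cereal_long = pd.DataFrame({
    "name": ["Cheerios", "Cheerios", "Special K", "Special K"],
    "nutrition": ["calories", "sugars", "calories", "sugars"],
    "value": [110, 1, 110, 3],
})
print(cereal_long)

# index:   the column whose values become the row labels of the result
# columns: the column whose values become the new column names
# values:  the column whose values fill in the cells
cereal_wide = cereal_long.pivot(index="name", columns="nutrition", values="value")
print(cereal_wide)

# With a repeated (name, nutrition) pair, pivot() raises a ValueError,
# whereas pivot_table() aggregates the duplicates (here with the mean)
cereal_dup = pd.concat([cereal_long, cereal_long.iloc[[0]].assign(value=120)])
cereal_agg = cereal_dup.pivot_table(
    index="name", columns="nutrition", values="value", aggfunc="mean"
)
print(cereal_agg)
```

Showing both printouts on one slide lets students trace each argument back to a visible column.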
9
[ ] I wonder, though, if we could come up with a compelling use case where melt makes the data tidier. This relates to my earlier comments. Or maybe that's coming, let's see...
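One candidate use case for melt: the column headers are themselves values of a variable, e.g. years, so a single variable is spread across several columns. A minimal sketch with made-up cereal sales numbers (sales_wide, boxes_sold, and the years are all hypothetical), which would also keep us in the cereal domain:

```python
import pandas as pd

# Hypothetical wide data: the year lives in the column headers,
# so one variable ("year") is spread across several columns
sales_wide = pd.DataFrame({
    "name": ["Cheerios", "Special K"],
    "2019": [500, 300],
    "2020": [550, 320],
})

# melt() gathers the year columns into a single "year" column,
# giving one row per (cereal, year) observation
sales_long = sales_wide.melt(id_vars="name", var_name="year", value_name="boxes_sold")
print(sales_long)
```

Here the long version is the tidy one, which could pair nicely with the cereal nutrition example where the wide version is tidy.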
11
[ ] I'm very confused by the true/false. Isn't it less tidy now?
12
[ ] 12.7: have they seen ~ before?
[ ] 12.7: text is cut off for me
[ ] 12.8:
[ ] oh, now we have the tilde. Maybe we should move this to the filtering part of Module 2?
[ ] start with Tilde (~)
[ ] compliment -> complement
[ ] can't see the result of the last line of code
[ ] 12.11: output is cut off
[ ] 12.13: mention that we're going back to horizontal concatenation?
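If the tilde moves to the filtering part of Module 2, a minimal example along these lines might be enough to introduce it. The cereal rows and the mfr codes are hypothetical.

```python
import pandas as pd

# Hypothetical cereal data
cereal = pd.DataFrame({
    "name": ["Cheerios", "Froot Loops", "Special K"],
    "mfr": ["G", "K", "K"],
})

# A boolean mask: True where the manufacturer is "K"
is_k = cereal["mfr"] == "K"

# ~ flips every True/False in the mask, so this selects the complement
not_k = cereal[~is_k]
print(not_k)
```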
16
[ ] 16.2: dataframe -> dataframes (in 2 places)
[ ] 16.4: androws -> and rows
[ ] 16.4: at this point it'd be good to give a high-level overview of what type of merging we're going to be working towards: is it horizontal, vertical, or something else entirely?
[ ] 16.5: this is really well done
[ ] 16.6: I think it's better to stick with the candy bars; the cereal was a bit jarring
[ ] 16.7: again, really clear and well done
[ ] 16.8: start a new sentence before "in the future"; we -> We
[ ] 16.10: this could be an opportunity for a non-reproducible figure of this same df, where you circle the 3 parts: present in left only, present in right only, present in both
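For 16.10, the code side of the same idea: with how="outer" and indicator=True, merge adds a _merge column whose values are left_only, right_only, or both, which maps directly onto the 3 parts you'd circle. The candy bar names are real brands but the numbers are made up.

```python
import pandas as pd

# Hypothetical candy bar tables that only partly overlap
left = pd.DataFrame({"name": ["Mars", "Snickers", "Twix"], "price": [1.5, 1.6, 1.4]})
right = pd.DataFrame({"name": ["Snickers", "Twix", "Kit Kat"], "rating": [4.5, 4.0, 4.2]})

# how="outer" keeps every name from both tables; indicator=True adds
# a "_merge" column recording which table(s) each row came from
merged = left.merge(right, on="name", how="outer", indicator=True)
print(merged)
```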
18
[ ] The Binder experience isn't very smooth here in general. Hmm, oh well.
19
[ ] "Ah, it appears we have multiple rows for some of the same sets." -> that is true, but are they asked to do something which would lead them to this conclusion?
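One way to lead them to this conclusion would be to have them count rows per set before the text states it. A sketch assuming the data has a set identifier column, here called set_num (the column name and values are hypothetical):

```python
import pandas as pd

# Hypothetical table in which one set appears on more than one row
sets = pd.DataFrame({
    "set_num": ["001-1", "002-1", "002-1", "003-1"],
    "theme": ["Town", "Space", "Space", "Castle"],
})

# Count rows per set; any count above 1 flags a repeated set
counts = sets["set_num"].value_counts()
print(counts[counts > 1])

# keep=False marks every row that shares its set_num with another row
print(sets[sets["set_num"].duplicated(keep=False)])
```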