UBC-MDS / programming-in-python-for-data-science

https://prog-learn.mds.ubc.ca/

Feedback Module 3: Round 1 #22

Closed hfboyce closed 4 years ago

hfboyce commented 4 years ago

I ~may do another read through Monday morning but Module 3 is done~ have done another read-through and I think I am ready for round 1 feedback (note: I will be implementing the round 2 feedback for Module 2 first):

mgelbart commented 4 years ago

FYI I have plans to review this on Thursday.

mgelbart commented 4 years ago

Exercises 1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13, 15, 17, 19, 20, 23, 24

Overall, I know you put a ton of work into this, and it shows, so you might not be happy to hear this, but I feel this module needs more work than the previous ones I reviewed (see comments above). In particular, I feel Exercises 5 and 9 need quite a bit of work, and that we might need a new Exercise on indexes. That being said, it's a great start and we are making progress. Don't be discouraged!

hfboyce commented 4 years ago

> I'm confused by the notion of stacked/unstacked vs. long/wide. Let's discuss this.

Pivot/melt and stack/unstack can do the exact same thing. Some people prefer stack/unstack over pivot and melt. I included this in the module because I generally tried to include everything that the Python part of DSCI 523 had (and of course more!). Tom talked about stacking/unstacking for ~10 mins in one of his lectures, and I know that multi-indexing came up in Imbellus’s take-home assignment. I think it’s important to include because of the preference some companies have. That being said, I also think we should still teach pivot/melt since it’s a bit clearer for beginners, so now I am conflicted about where we should go with it.

I have a great YouTube video resource that explains this well in a Jupyter notebook.
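To make the "exact same thing" claim concrete, here is a minimal sketch. The `cereal_long` dataframe and its values are made up for illustration and are not the module's actual data. It reshapes long to wide with `pivot` and with `set_index` + `unstack`, then wide to long with `melt` and with `stack`, and checks that each pair agrees.

```python
import pandas as pd

# Hypothetical long-format data; the names and numbers are invented.
cereal_long = pd.DataFrame({
    "name": ["Special K", "Special K", "Apple Jacks", "Apple Jacks"],
    "nutrition": ["protein", "fat", "protein", "fat"],
    "value": [6, 0, 2, 0],
})

# Long -> wide, two ways.
wide_pivot = cereal_long.pivot(index="name", columns="nutrition", values="value")
wide_unstack = cereal_long.set_index(["name", "nutrition"])["value"].unstack()
same_wide = wide_pivot.equals(wide_unstack)

# Wide -> long, two ways.
long_melt = wide_pivot.reset_index().melt(
    id_vars="name", var_name="nutrition", value_name="value"
)
long_stack = wide_pivot.stack().rename("value").reset_index()

# melt and stack order the rows differently, so compare the rows as sorted tuples.
same_long = sorted(long_melt.itertuples(index=False, name=None)) == sorted(
    long_stack.itertuples(index=False, name=None)
)

print(same_wide, same_long)
```

The main practical difference in this sketch is that `stack`/`unstack` work on the index (including a MultiIndex), while `pivot`/`melt` take column names as arguments, which may be why the latter reads more easily for beginners.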

We can discuss this in our meeting tomorrow.

hfboyce commented 4 years ago

@mgelbart OK! Buckle up! I edited and made the changes you suggested and revamped 4 sections.

They are quite different but luckily the exercises could stay relatively constant.

Hopefully this works a lot better. I made some new viz for melt and pivot and removed the ones you did not like. I also fixed the gifs for concat and merge.

There are now 20 exercises.

mgelbart commented 4 years ago

Exercises 1, 2, 3, 4, 5

I am calling it a day - will do Exercise 6 onwards at a later time.

hfboyce commented 4 years ago

> 5.3: This slide makes it seem like long is more tidy than wide. But that's not true. In Exercise 1 we have the cereal data where the long version is untidy and the wide version is tidy. So, I think we need to make this a bit clearer. The most amazing thing would be if you could come up with a single example and 3 formats: too long, just right, and too wide. Is that doable? I think it also depends on the application. Because, for this chocolate bar dataset, I'd actually prefer the "too wide" format if I was doing supervised learning. It really depends what you're doing. So maybe an alternative to my 3 formats suggestion is to have 2 formats and 2 questions, one question where the wide format is tidy and one where the long format is tidy? 🤔 Also, I don't love the detour from cereals to chocolate bars, but I can live with it if needed.

I think I have an idea for this that I would like to show you.

> 5.6: When you explain each argument, I think it would be more useful to explain what it does in general, without making specific reference to `name` and `nutrition` and `value`. I got a bit confused with the current version, because the argument names and the column names both appear in code font and it's a bit ambiguous what is what, at least without thinking carefully. Also, if possible, I would love to show both dataframes here. The problem is that the code `cereal_long.pivot(index='name', columns='nutrition', values='value')` is referring to column names in the original df, but we can't see it. We need to be able to connect the code to the df on the same slide. This isn't reproducible, but maybe an image would be better, and you can circle those 3 column names? Update: see my comment for 5.8.

Can we discuss this further? I had a slide that did exactly this, but I amended it in response to your comment quoted below. I don't think I understand what you are suggesting.

"5.4: again, going back to the same teaching strategy: be concrete, not abstract. Here, show a dataframe. Then, for each argument, give an example that corresponds to the df we're looking at. People hardly ever understand anything abstract like this unless it comes after the concrete."

5.4: This is what it was before:

image

> 5.8: I like this! What if you also showed an image of the line of code here, and had arrows between the column names in the code and the column names on the left dataframe? Or you could have one slide with just what you have, and then another slide where the code and arrows are added in or something? Update: see my comments for 5.9.

Does this mean I can leave slides 5.8-5.9 as is for now (besides making more room for text)?

> 5.22: row -> rows? Also, did we learn `drop` for rows? I mainly remember it for columns.

Adding it in!
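For reference, a minimal sketch of `drop` used on both axes. The dataframe here is a made-up stand-in, not the module's cereal data.

```python
import pandas as pd

# Hypothetical data indexed by cereal name.
cereal = pd.DataFrame(
    {"calories": [110, 120], "protein": [6, 2]},
    index=["Special K", "Apple Jacks"],
)

# Dropping a column (the usage most likely seen earlier in the course).
no_protein = cereal.drop(columns=["protein"])

# Dropping a row by its index label.
no_special_k = cereal.drop(index=["Special K"])

print(no_protein)
print(no_special_k)
```

Both calls return a new dataframe and leave `cereal` unchanged.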

> Exercise 5 is really long - I suggest putting `pivot_table` as its own Exercise and adding some interactive stuff in between.

I'll make the changes for this on Saturday.

I've addressed the majority of the issues and will push them all tomorrow. The two biggest things I want to confirm are 5.3 and 5.6.
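One way a separate `pivot_table` exercise could be motivated is the case where `pivot` cannot work because an index/column pair occurs more than once. This is only a sketch; the `ratings` dataframe and its column names are hypothetical and not taken from the module.

```python
import pandas as pd

# Hypothetical data: company "A" has two ratings for 2019.
ratings = pd.DataFrame({
    "company": ["A", "A", "B", "B"],
    "year": [2019, 2019, 2019, 2020],
    "rating": [3.0, 4.0, 2.5, 3.5],
})

# pivot raises a ValueError when an (index, columns) pair is duplicated.
try:
    ratings.pivot(index="company", columns="year", values="rating")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves the duplicates by aggregating them.
mean_ratings = ratings.pivot_table(
    index="company", columns="year", values="rating", aggfunc="mean"
)

print(pivot_failed)
print(mean_ratings)
```

Combinations that never occur in the data (here company "A" in 2020) show up as missing values in the result.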

mgelbart commented 4 years ago

I'll just keep going for now.

Exercises 6, 7, 8, 9, 11, 12, 16, 18, 19

hfboyce commented 4 years ago

Discuss:

5

> 5.3: This slide makes it seem like long is more tidy than wide. But that's not true. In Exercise 1 we have the cereal data where the long version is untidy and the wide version is tidy. So, I think we need to make this a bit clearer. The most amazing thing would be if you could come up with a single example and 3 formats: too long, just right, and too wide. Is that doable? I think it also depends on the application. Because, for this chocolate bar dataset, I'd actually prefer the "too wide" format if I was doing supervised learning. It really depends what you're doing. So maybe an alternative to my 3 formats suggestion is to have 2 formats and 2 questions, one question where the wide format is tidy and one where the long format is tidy? 🤔 Also, I don't love the detour from cereals to chocolate bars, but I can live with it if needed.

Made images. Don't know if they will work. Will show in the meeting.

> 5.6: When you explain each argument, I think it would be more useful to explain what it does in general, without making specific reference to `name` and `nutrition` and `value`. I got a bit confused with the current version, because the argument names and the column names both appear in code font and it's a bit ambiguous what is what, at least without thinking carefully. Also, if possible, I would love to show both dataframes here. The problem is that the code `cereal_long.pivot(index='name', columns='nutrition', values='value')` is referring to column names in the original df, but we can't see it. We need to be able to connect the code to the df on the same slide. This isn't reproducible, but maybe an image would be better, and you can circle those 3 column names? Update: see my comment for 5.8.

See above comment
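As a possible starting point for that discussion, here is a sketch that shows the original dataframe and the pivoted one side by side, with a general description of each argument as a comment. The `pivot` call is the one quoted above; the data in `cereal_long` is invented for illustration.

```python
import pandas as pd

# Hypothetical long-format data with the three columns the call refers to.
cereal_long = pd.DataFrame({
    "name": ["Special K", "Special K", "Apple Jacks", "Apple Jacks"],
    "nutrition": ["calories", "protein", "calories", "protein"],
    "value": [110, 6, 110, 2],
})
print(cereal_long)

cereal_wide = cereal_long.pivot(
    index="name",         # column whose values become the row labels
    columns="nutrition",  # column whose unique values become the new column headers
    values="value",       # column whose values fill the cells of the new table
)
print(cereal_wide)
```

Printing both dataframes keeps the column names in the code next to the columns they refer to, which is the connection the comment asks for.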

9

> I wonder, though, if we could come up with a compelling use case where melt makes the data tidier. This relates to my earlier comments. Or maybe that's coming, let's see...

Discussion regarding wording.
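One candidate use case, offered only as a sketch with invented data: a table whose column headers are themselves values of a variable (years, in this hypothetical `sales_wide` example). Melting it gives one column per variable and one row per observation.

```python
import pandas as pd

# Hypothetical wide data: the year is stored in the column headers.
sales_wide = pd.DataFrame({
    "store": ["North", "South"],
    "2018": [10, 7],
    "2019": [12, 9],
})

# After melting, "year" and "sales" are each a single column.
sales_long = sales_wide.melt(id_vars="store", var_name="year", value_name="sales")

print(sales_long)
```

Whether the long version counts as tidier still depends on the question being asked, as noted in the 5.3 comment.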

11

> I'm very confused by the true/false. Isn't it less tidy now?

Not if opacity is considered a single variable, which I've now amended.

12

Moved Tilde to Module 2

18

> The binder experience isn't very smooth here in general, hmm, oh well.

😭

19

> "Ah, it appears we have multiple rows for some of the same sets." -> that is true, but are they asked to do something which would lead them to this conclusion?

I wrote something to make this a little clearer.
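For illustration, one step that could lead a learner to that conclusion is an explicit check for repeated keys. This is a sketch on a hypothetical `sets` dataframe; the column names and values are assumptions, not the exercise's actual data.

```python
import pandas as pd

# Hypothetical data in which one set number appears on two rows.
sets = pd.DataFrame({
    "set_num": ["001-1", "001-1", "002-1"],
    "name": ["Castle", "Castle", "Boat"],
    "version": [1, 2, 1],
})

# Flag every row whose set_num occurs more than once.
dup_mask = sets.duplicated(subset=["set_num"], keep=False)
repeated_sets = sets[dup_mask]

# Count how many rows each set has.
rows_per_set = sets["set_num"].value_counts()

print(repeated_sets)
print(rows_per_set)
```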

> I didn't review this one that thoroughly.

Should I remove it if you were not engaged?