UBC-MDS / programming-in-python-for-data-science

https://prog-learn.mds.ubc.ca/

Feedback Module 8 #51

Closed hfboyce closed 3 years ago

hfboyce commented 4 years ago

Yay! Finished the first pass at module 8. Here is the link. There should be 26 exercises.

I'm a little unhappy with the last slide deck, so suggestions are welcome.

mgelbart commented 4 years ago

These comments are from the old version

Continuing after refreshing

Will do the last section (23-27) later today.

mgelbart commented 4 years ago

Overall, nice job - only one more to go! 📈

hfboyce commented 4 years ago

First off, you had to put up with my very rough draft for those first few slides, so I feel terrible about that.

5.15 I think there's an analogy to pandas filtering syntax that we could make here? If so we should.

OK, so technically there is, but we taught them not to replace values in this manner and to use .loc[] instead, so should I mention this?
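For reference, a minimal sketch of the analogy being discussed: the same boolean mask replaces values in a NumPy array directly and in a pandas column through .loc[]. The array, the column name and the threshold are made up for illustration and are not from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical values, not taken from the module 8 slides.
arr = np.array([1, 5, 10, 15])
arr[arr > 8] = 0  # NumPy: the boolean mask goes straight into the brackets

df = pd.DataFrame({"value": [1, 5, 10, 15]})
df.loc[df["value"] > 8, "value"] = 0  # pandas: the same mask, routed through .loc[]

print(arr.tolist())
print(df["value"].tolist())
```

Both end up with the last two values set to 0, so the slide could point at .loc[] as the pandas counterpart without teaching a second replacement syntax.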

11.2: "not that" -> "note that" ;; should we mention somewhere the distinction between np.nan vs. None? Basically, since numpy arrays all need to be the same type, and since np.nan is considered a number, you can have a numerical numpy array with some of the values as np.nan (a "number") but it couldn't have a None in there since that has type NoneType.

I actually teach this to them in module 4. Should I bring it up again? "Remember how in module 4 we explained ..."
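If a reminder is added, something like this short sketch might be enough. The values are invented; the point is only the dtype that NumPy picks in each case.

```python
import numpy as np

# Illustrative values only.
with_nan = np.array([1.0, np.nan, 3.0])
with_none = np.array([1.0, None, 3.0])

print(with_nan.dtype)    # np.nan is a float, so the array stays numeric
print(with_none.dtype)   # None is NoneType, so NumPy falls back to object
print(type(np.nan), type(None))

# Numeric helpers still work on the array that uses np.nan.
print(np.nanmean(with_nan))
```

The array with None is no longer a numerical array, which is the distinction mentioned in the comment above.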

11.11 should we take out bfill and ffill? they are appropriate here because it's a time series dataset, but we don't have many of those, and it's not appropriate in most cases. or, at least mention that it only really makes sense in those cases, as opposed to all the other datasets like cereal etc.

I do mention this about bfill and ffill.
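A small sketch of why the two fills only make sense when row order means something, as in a time series. The temperatures and dates are hypothetical, and the sketch uses the Series.ffill() and Series.bfill() methods, which may not be the exact spelling used in the slides.

```python
import pandas as pd

# Hypothetical daily readings with a two-day gap.
temps = pd.Series(
    [20.0, None, None, 23.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
)

forward = temps.ffill()   # carry the last known reading forward in time
backward = temps.bfill()  # pull the next known reading backward in time

print(forward.tolist())
print(backward.tolist())
```

In a dataset like cereal, where neighbouring rows are unrelated, the same calls would copy values between unrelated observations.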

16.3: I don't think they can understand this yet - or did they already learn str.split() earlier? Maybe make a note that they don't need to understand all this code in detail because it's just a demonstration of how tricky this is going to be? But I actually really like the approach- it's fun to show how painful this is.

They did learn about str.split(), so none of this code should be new to them.
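For context, this is roughly the kind of manual parsing that str.split() allows. It is a guess at the flavour of slide 16.3, not a copy of it, and the date strings and column names are made up.

```python
import pandas as pd

# Hypothetical date strings; the real slide may use a different format.
dates = pd.Series(["2020-10-02", "2020-09-15"])

# Split each string on "-" and spread the pieces into separate columns.
parts = dates.str.split("-", expand=True).astype(int)
parts.columns = ["year", "month", "day"]

print(parts)
```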

Starting at 16.10 we have a bunch of "how to" slides which are a bit of a grind to get through. Let's first motivate them for why this is useful. Once we introduce parse_dates, can we do some fun stuff with it right away? Presumably this fun stuff is later and can be moved - let's see.

Can you see if this is better?

16.17: now we get to something that takes advantage of parse_dates, which may be better to put earlier. Even before diff, I think sorting is already a "killer app". Can you explicitly show that if the datetimes are just strings and you do sort_values(), it sorts incorrectly? This would be a great motivator. Maybe that would be sufficient to build up interest before the "grind" of all this syntax I mentioned.

I want to show you what I did here and see if it's enough. I had to change the data a bit. I also don't exactly know what you mean by "killer app".
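A minimal sketch of the sorting motivator, assuming dates written as month/day/year without zero padding. The three dates are invented; pd.to_datetime stands in here for what parse_dates would do at load time.

```python
import pandas as pd

# Hypothetical dates in month/day/year form, stored as plain strings.
df = pd.DataFrame({"date": ["10/2/2020", "9/15/2020", "1/5/2021"]})

# Sorting strings compares character by character, not by calendar order.
as_strings = df["date"].sort_values().tolist()

# After parsing, sort_values() orders the rows chronologically.
parsed = pd.to_datetime(df["date"], format="%m/%d/%Y")
in_order = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()

print(as_strings)
print(in_order)
```

The string sort puts January 2021 first and September 2020 last, while the parsed column comes out in calendar order.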

18: so do they definitely know how to use round? Also, is there possible confusion between round and np.round?

So by this point we've used round and they have seen it. In this example we are using Python's built-in round(), which they have definitely encountered. I don't really want to bring in the idea of all the different rounds, because then we would have to explain np.round, pd.round and the regular round, which seems to make matters extra confusing.
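For the record, a sketch of the three variants side by side, with made-up numbers. On the pandas side the rounding is a method on a Series (or DataFrame) rather than a top-level function.

```python
import numpy as np
import pandas as pd

# Illustrative numbers only.
builtin = round(3.14159, 2)                                  # built-in, one scalar
array_rounded = np.round(np.array([3.14159, 2.71828]), 2)    # NumPy, whole array
series_rounded = pd.Series([3.14159, 2.71828]).round(2)      # pandas, Series method

print(builtin)
print(array_rounded.tolist())
print(series_rounded.tolist())
```

For simple cases like this one they agree, which supports sticking with the built-in round() in the slide.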

23.5: looking back, this feels a bit un-useful since the contains will find it anyway. I wonder if we can make this a bit more authentic. I'm assuming Tom wouldn't mind if you change the dataset a bit to suit our needs. For example, perhaps we could have one comment be "Thankfully not raining today!" and so we could explain that just using contains would make a mistake here. So instead our plan is to be conservative and only check if the comment == "rain" (also imperfect, but a conservative estimate), but then we can have one with a whitespace issue of "rain " and then we can strip it out?

I'm not sure if I understand what you want me to do here.
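One way to read the suggestion for 23.5, as a sketch: the comments below are hypothetical (only "Thankfully not raining today!" and "rain " come from the review), and they show where contains over-counts and where strip is needed.

```python
import pandas as pd

# Hypothetical comment column; two of the strings are quoted from the review.
comments = pd.Series(["rain", "rain ", "Thankfully not raining today!", "sunny"])

# str.contains matches the substring anywhere, including inside "raining".
by_contains = comments.str.contains("rain")

# The conservative check: strip stray whitespace, then require an exact match.
by_exact = comments.str.strip() == "rain"

print(by_contains.tolist())
print(by_exact.tolist())
```

Without str.strip(), the "rain " entry would be missed by the equality check, which gives the strip step a real job to do.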

23.7: this is a fun calculation to perform, but it's going to be tricky, let's see how it unfolds...

I didn't add the calculation, but I could. Should I?

26: something is wrong here. the second code block has import pandas as pd twice and also switched from the lego to canucks. can you fix this up and then I'll take another look?

Fixed!!

27: Overall, I feel we're still lacking a compelling use case for numpy - like, why are they learning this? I guess Tom influenced us to put it in, but I wish we had a good example. I guess that can go on our wish list.

Add to the wish list, or should we work on this ?

hfboyce commented 4 years ago

@mgelbart So I changed around the NumPy sections and I wanted you to have a look over them to see if this flows better. I also have some other questions, but we can address them on Monday.

hfboyce commented 4 years ago

@mgelbart Can you please look over Exercise 26 before I give it to Elijah? Thank you. (Also, please look over Exercises 1 and 5 again, which are the NumPy sections.) Also maybe Exercise 11 (quickly?)

mgelbart commented 4 years ago

2nd review of Exercises 1, 5, 11, 26

Will do 11 and 26 after my next meeting.

mgelbart commented 4 years ago