Closed hfboyce closed 3 years ago
diff
where the results are all positive, like 2, 5, 20
or something.;; other function -> other functions;; maybe less confusing would be np.log10(100)
. numpy.ndarray
right?x[0,3]
means the first row, fourth column. Here talk about .shape and .ndim (and optionally .size if you want). Also, moving elementwise operations earlier will motivate why numpy is useful at an earlier stage, which is important too.You could even do basic indexing of 1D arrays in the first slide deck if that helps make them more equally sizedto_numpy()
. I would actually introduce to_numpy()
early on so they can see that the whole thing is sitting on top of a 2D numpy array - right now you only show it for a single value in a pandas dataframe.array([ 0, 5, 10, 15, 20, 25, 30])
and then the answers np.arange(0, 35, 5)
and np.linspace(0, 30, 5)
. That way the linspace one is wrong because it should be 7 rather than 5. =
is a bit misleading, instead say Given an array with a shape
of (2,3,4,2)
reshape
np.nan
vs. None
? Basically, since numpy arrays all need to be the same type, and since np.nan
is considered a number, you can have a numerical numpy array with some of the values as np.nan
(a "number") but it couldn't have a None
in there since that has type NoneType
. head()
before info()
? I find head()
really gets me oriented on a new dataset.str.split()
earlier? Maybe make a note that they don't need to understand all this code in detail because it's just a demonstration of how tricky this is going to be? But I actually really like the approach- it's fun to show how painful this is. d
for datetime
;; here and later it incorrectly says parse_date
instead of parse_dates
parse_dates
, can we do some fun stuff with it right away? Presumably this fun stuff is later and can be moved - let's see.new_cycling['Date'].dt.day_name()
for clarity before combining it with assign
dt.
or without the ts.
?parse_dates
, that may be better to put earlier. Even before diff
think sorting is already a "killer app". Can you explicitly show that if the datetimes are just strings and you do sort_values()
that it sorts incorrectly? this would be a great motivator. maybe that would be sufficient to build up interest before the "grind" of all this syntax I mentioned.round
? Also, is there possible confusion between round
and np.round
? 'hi' + 'hello'
gives 'hihello'
;; maybe we could pick a more authentic example here, like putting in the distance, so it says, say, "Afternoon Ride 12.62 km" or something? Or even just a Distance_str column that says "12.62 km" ? the cycle_tripled feels particularly contrived..title()
that capitalized every word in a string:" -> "Another is .title()
, which capitalizes the first letter of every word in a string:"strip()
and it happens all the time in free text entry.T
'sWill do the last section (23-27) later today.
head(9)
? and then you say the first 10 rows? contains
will find it anyway. I wonder if we can make this a bit more authentic. I'm assuming Tom wouldn't mind if you change the dataset a bit to suit our needs. For example, perhaps we could have one comment be "Thankfully not raining today!" and so we could explain that just using contains
would make a mistake here. So instead our plan is to be conservative and only check if the comment == "rain" (also imperfect, but a conservative estimate), but then we can have one with a whitespace issue of "rain " and then we can strip it out?import pandas as pd
twice and also switched from the lego to canucks. can you fix this up and then I'll take another look?Overall, nice job - only one more to go! 📈
First off, you had to put up with my very rough draft for those first few slides, so I feel terrible about that.
5.15 I think there's an analogy to pandas filtering syntax that we could make here? If so we should.
OK, so technically there is, but we taught them not to replace values in this manner and to use .loc[] instead, so should I mention this?
11.2: "not that" -> "note that" ;; should we mention somewhere the distinction between np.nan vs. None? Basically, since numpy arrays all need to be the same type, and since np.nan is considered a number, you can have a numerical numpy array with some of the values as np.nan (a "number") but it couldn't have a None in there since that has type NoneType.
I teach this to them in module 4, actually. Should I bring it up again? "Remember how in module 4 we explained ...."
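For reference, a minimal sketch of the np.nan vs. None distinction described in 11.2. The array values are made up for illustration and are not taken from the slides.

```python
import numpy as np

# np.nan is a float, so it fits inside a numeric array
with_nan = np.array([1.0, 2.5, np.nan])
print(with_nan.dtype)   # float64

# None has type NoneType, so NumPy falls back to a generic object array
with_none = np.array([1.0, 2.5, None])
print(with_none.dtype)  # object

# Numeric helpers that skip missing values still work with np.nan
print(np.nanmean(with_nan))  # 1.75
```

The object array still stores the values, but it is no longer a numerical array, which is the point the comment is making.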
11.11 should we take out bfill and ffill? they are appropriate here because it's a time series dataset, but we don't have many of those, and it's not appropriate in most cases. or, at least mention that it only really makes sense in those cases, as opposed to all the other datasets like cereal etc.
I do mention this about bfill and ffill.
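A small sketch of why ffill and bfill only really make sense for ordered data such as a time series. The dates and distances below are hypothetical, not the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical daily ride distances (km) with a two-day gap
distance = pd.Series(
    [12.5, np.nan, np.nan, 13.1],
    index=pd.date_range("2019-09-10", periods=4, freq="D"),
)

forward = distance.ffill()   # carry the last known value forward in time
backward = distance.bfill()  # pull the next known value backward in time

print(forward.tolist())   # [12.5, 12.5, 12.5, 13.1]
print(backward.tolist())  # [12.5, 13.1, 13.1, 13.1]
```

In a dataset like cereal, where the row order carries no meaning, the "previous" or "next" row is arbitrary, so the filled values would be arbitrary too.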
16.3: I don't think they can understand this yet - or did they already learn str.split() earlier? Maybe make a note that they don't need to understand all this code in detail because it's just a demonstration of how tricky this is going to be? But I actually really like the approach- it's fun to show how painful this is.
They did learn about str.split(), so none of this code should be new to them.
Starting at 16.10 we have a bunch of "how to" slides which are a bit of a grind to get through. Let's first motivate them for why this is useful. Once we introduce parse_dates, can we do some fun stuff with it right away? Presumably this fun stuff is later and can be moved - let's see.
Can you see if this is better?
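One possible way to show a quick payoff right after introducing parse_dates, sketched under assumptions: the CSV text below is made up and only stands in for the cycling data, and new_cycling reuses the name from the review comment.

```python
import io
import pandas as pd

# Made-up rows standing in for the cycling data
csv_text = "Date,Distance\n2019-09-10,12.62\n2019-09-11,13.03\n"

# parse_dates turns the Date column into datetimes while reading
new_cycling = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# The .dt accessor is only available because the column holds datetimes
day_names = new_cycling["Date"].dt.day_name()
print(day_names.tolist())  # ['Tuesday', 'Wednesday']
```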
16.17: now we get to something that takes advantage of parse_dates, that may be better to put earlier. Even before diff, I think sorting is already a "killer app". Can you explicitly show that if the datetimes are just strings and you do sort_values() that it sorts incorrectly? This would be a great motivator. Maybe that would be sufficient to build up interest before the "grind" of all this syntax I mentioned.
I want to show you what I did here and see if it's enough. I had to change the data a bit. I also don't exactly know what you mean by "Killer app"
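A minimal sketch of the sorting motivator requested in 16.17. The dates are invented and written as day/month/year strings (an assumption, chosen so the problem is visible), so this is not the actual slide content.

```python
import pandas as pd

# Made-up dates stored as day/month/year strings
rides = pd.DataFrame({"Date": ["10/09/2019", "02/10/2019", "25/09/2019"]})

# As strings, values are compared character by character
string_sorted = rides.sort_values("Date")["Date"].tolist()
print(string_sorted)
# ['02/10/2019', '10/09/2019', '25/09/2019']  (October lands before September)

# As datetimes, the order is chronological
rides["Date"] = pd.to_datetime(rides["Date"], format="%d/%m/%Y")
date_sorted = rides.sort_values("Date")["Date"].dt.strftime("%Y-%m-%d").tolist()
print(date_sorted)
# ['2019-09-10', '2019-09-25', '2019-10-02']
```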
18: so do they definitely know how to use round? Also, is there possible confusion between round and np.round?
So by this point we've used round and they have seen it. In this example we are using Python's built-in round(), which they have definitely encountered. I don't really want to bring in the idea of all the different rounds because then we would have to explain np.round, pd.round, and the regular round, which seems to make matters extra confusing.
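For context on the question in 18, a short sketch of how the variants differ in what they operate on. The numbers are made up, and the pandas version shown is the Series.round method.

```python
import numpy as np
import pandas as pd

# Built-in round: one number at a time
single = round(12.6234, 2)
print(single)  # 12.62

# np.round: elementwise over a whole array
distances = np.array([12.6234, 13.0291])
rounded_array = np.round(distances, 2)
print(rounded_array)

# Series.round: elementwise over a pandas column
rounded_series = pd.Series(distances).round(2)
print(rounded_series.tolist())
```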
23.5: looking back, this feels a bit un-useful since the contains will find it anyway. I wonder if we can make this a bit more authentic. I'm assuming Tom wouldn't mind if you change the dataset a bit to suit our needs. For example, perhaps we could have one comment be "Thankfully not raining today!" and so we could explain that just using contains would make a mistake here. So instead our plan is to be conservative and only check if the comment == "rain" (also imperfect, but a conservative estimate), but then we can have one with a whitespace issue of "rain " and then we can strip it out?
I'm not sure I understand what you want me to do here.
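A sketch of what the suggestion in 23.5 could look like in code. The comments below are invented, including the "Thankfully not raining today!" and "rain " cases from the review, and are not rows from the real dataset.

```python
import pandas as pd

# Invented comments covering the cases described in 23.5
comments = pd.Series(["rain", "rain ", "Thankfully not raining today!", "sunny"])

# contains is too generous: it also matches "not raining"
loose = comments.str.contains("rain")
print(loose.tolist())   # [True, True, True, False]

# exact equality is conservative, but misses the trailing whitespace
naive = comments == "rain"
print(naive.tolist())   # [True, False, False, False]

# stripping first recovers the "rain " entry
strict = comments.str.strip() == "rain"
print(strict.tolist())  # [True, True, False, False]
```

The three results side by side give strip() a job that contains cannot do for it.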
23.7: this is a fun calculation to perform, but it's going to be tricky, let's see how it unfolds...
I didn't add the calculation, but I could. Should I?
26: something is wrong here. the second code block has import pandas as pd twice and also switched from the lego to canucks. can you fix this up and then I'll take another look?
Fixed!!
27: Overall, I feel we're still lacking a compelling use case for numpy - like, why are they learning this? I guess Tom influenced us to put it in, but I wish we had a good example. I guess that can go on our wish list.
Add to the wish list, or should we work on this?
@mgelbart So I changed around the NumPy sections and I wanted you to have a look over it to see if this flows better. I also have some other questions but we can address them on Monday.
@mgelbart Can you please look over Exercise 26 before I give it to Elijah? Thank you (Also please look over Exercise 1 and 5 again which are the NumPy sections.) Also maybe Exercise 11 (quickly?)
2nd review of Exercises 1, 5, 11, 26
zeros
takes a tuple whereas rand
takes the integers directly - it's easily confusing. shape()
-> shape
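A minimal sketch of the two points above: zeros takes the shape as a single tuple while rand takes the dimensions as separate integers, and shape is an attribute rather than a method. The (2, 3) size is arbitrary.

```python
import numpy as np

zeros = np.zeros((2, 3))        # shape passed as one tuple
randoms = np.random.rand(2, 3)  # dimensions passed as separate integers

# shape is an attribute, so there are no parentheses
print(zeros.shape)    # (2, 3)
print(randoms.shape)  # (2, 3)
```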
Will do 11 and 26 after my next meeting.
any
and (2) using this to filter. It would be less confusing if you used the distance column in both, or if you used any in both, or if you added an intermediate slide in between that introduced any
and then used it to filter.lego
object;; for the second instruction, we should say that they are looking in the name
column (and same for the 6th instruction);; fourth instruction, i think the W
should be lower case, right?
Yay! Finished the first pass at module 8. Here is the link. There should be 26 exercises.
I'm a little unhappy with the last slide deck, so suggestions are welcome.