Feedback Module 8 - Githubissues

hfboyce commented 4 years ago

Yay! Finished the first pass at module 8. Here is the link. There should be 26 exercises.

I'm a little unhappy with the last slide deck so suggestions welcomed.

mgelbart commented 4 years ago

These comments are from the old version

[x] 1.1: capitalization of "T";; it's
[x] 1.2: We should be consistent with NumPy vs Numpy throughout.
[x] 1.3: need space after comma;; I think we can drop the last sentence or at least the mention of Fourier, which is out of scope.
[x] 1.6: I think this needs to be expanded upon a lot. Like 5-10 slides on the idea of multidimensional arrays. This might be very new to people. Also, the drawing is misleading in that it shows axis 0 going up, it should start from the top and go down. To keep sanity, it might also be better to illustrate the 1-D array as vertical since the 0th axis is vertical in general.
[x] 1.7: I think we can be more convincing about the benefits. Like doing x*2 works for a numpy array x but not for a list. Or x+y, etc. It's built for convenient math. If you want to get fancy, you could even show the speed difference either by displaying the elapsed time or having them experience it themselves. But maybe this is too complicated and we can just say it's much faster for numerical operations. Also, maybe let's define numerical as "involving numbers" because it may be a new word for people.
[x] 1.9: I think it would be less confusing to show diff where the results are all positive, like 2, 5, 20 or something.;; other function -> other functions;; maybe less confusing would be np.log10(100).
[x] 2.1: the answer seems to be incorrect
[x] 2.2: the last one is not technically true either, it should be numpy.ndarray right?
[x] 3: bold "not";; also rephrase the question to tell them to look at the documentation, it's not really a hint when it's 100% necessary to solve the problem and it gives the wrong impression a bit.
[x] 4: "containing any the same number of elements" -> not sure what this means;; change wording in answer (in 2 places) from "for each element" to "for each pair of elements"
[x] 5.2: remove "too", replace with colon
[ ] Following up on my earlier comment about 1.6: I suggest a reordering of topics here: move creating arrays and elementwise operations to the first slide deck. Then, make this 2nd slide deck only about: multidimensional arrays, array shapes, indexing, slicing. I think that would be a good way to make sure these topics get the space they deserve. Then we can build up slowly to multidimensional. Start with 1D and talk about how to index/slice those. Then move to 2D and then how to index/slice those. 2D arrays will be hard for people, but indexing will make it clearer to them, e.g. x[0,3] means the first row, fourth column. Here talk about .shape and .ndim (and optionally .size if you want). Also, moving elementwise operations earlier will motivate why numpy is useful at an earlier stage, which is important too. You could even do basic indexing of 1D arrays in the first slide deck if that helps make them more equally sized.
[x] 5.2: I don't think it's that important to show them how to make multidimensional arrays "from scratch". Even the fact that you can make them from lists vs. tuples is not that critical. I think it's more useful to talk about creating them with things like np.zeros or getting them from pandas dataframes with to_numpy(). I would actually introduce to_numpy() early on so they can see that the whole thing is sitting on top of a 2D numpy array - right now you only show it for a single value in a pandas dataframe.
[x] 5.3: "built in" -> "built-in";; first sentence change colon to period;; need space after comma "1,2"
[x] 5.11: remove "simply";; move comma before "but" instead of after
[x] 5.12: sintax -> syntax;;
[x] 5.14: manners -> manner;; make of -> made of;; needs another sentence at the end walking them through the result: "the first element is false because 0.4203 is not larger than 0.5" etc.;; I'd introduce the goal before you start. The goal is to have the values "max out" at 0.5, meaning anything bigger than 0.5 gets set to 0.5. maybe a more authentic example could be grades, and you want to set any grades above 100% to 100% before posting them?
[x] 5.15: "hold process and avoided" -> "whole process and avoid";; I think there's an analogy to pandas filtering syntax that we could make here? If so we should.
[x] 6.1: this feels a bit too detail-oriented. let's change the question to array([ 0, 5, 10, 15, 20, 25, 30]) and then the answers np.arange(0, 35, 5) and np.linspace(0, 30, 5). That way the linspace one is wrong because it should be 7 rather than 5.
[x] 6.2: again, a bit too detail-oriented. Reading this weirdly formatted stuff is not an important skill. Could you show them as they are actually displayed in a terminal?
[x] 7.3: the = is a bit misleading, instead say Given an array with a shape of (2,3,4,2)
[x] 8.1: Can you move this transpose one to the end? I got confused and looked at the transposed one when answering the other questions.
[x] 8.3: I selected the first one and the error message was a bit sad: "Incorrect. It may be a good idea to read over the slides." Can you change it to something more uplifting and helpful, e.g. about the last one being exclusive?
[x] 10: let's tell them they will need to use reshape
[x] 11.2: "not that" -> "note that" ;; should we mention somewhere the distinction between np.nan vs. None? Basically, since numpy arrays all need to be the same type, and since np.nan is considered a number, you can have a numerical numpy array with some of the values as np.nan (a "number") but it couldn't have a None in there since that has type NoneType.
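
A minimal sketch of three of the points above (1.7, 5.14 and 6.1), in case it helps when rewriting the slides. The grades and variable names are made up for illustration and are not taken from the module.

```python
import numpy as np

x_list = [1, 2, 3]
x = np.array(x_list)

# 1.7: multiplying a list repeats it, multiplying an array is elementwise
list_doubled = x_list * 2        # [1, 2, 3, 1, 2, 3]
array_doubled = x * 2            # array([2, 4, 6])
array_sum = x + x                # elementwise addition

# 5.14: cap grades at 100 with a boolean mask (hypothetical grades)
grades = np.array([88.0, 104.5, 97.0, 101.0])
too_high = grades > 100          # True wherever a grade exceeds 100
capped = grades.copy()
capped[too_high] = 100

# 6.1: arange takes a step, linspace takes the number of points
by_step = np.arange(0, 35, 5)          # 0, 5, ..., 30 (35 is excluded)
five_points = np.linspace(0, 30, 5)    # 0, 7.5, 15, 22.5, 30
seven_points = np.linspace(0, 30, 7)   # matches by_step
```

The linspace call with 5 gives the wrong answer for 6.1 because the third argument counts points rather than setting a step size.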

Continuing after refreshing

[x] 1.4: "took" -> "rode his bike";; can we show head() before info()? I find head() really gets me oriented on a new dataset.
[x] 11.11: period before Perhaps;; why the rounding? ;; also maybe let's start with distance=0 so it's easier to see clearly in the printout, and then show the mean as another possibility. UPDATE: ah you have this. ok let's just switch the order then so 0 is first. ;; should we take out bfill and ffill? they are appropriate here because it's a time series dataset, but we don't have many of those, and it's not appropriate in most cases. or, at least mention that it only really makes sense in those cases, as opposed to all the other datasets like cereal etc.
[x] 12.1: contains -> contain;;
[x] 12.2: name -> named
[x] 14: uncomment the second line of code, they will need that to print out the info the first time;; name Wealth -> named Wealth;; consider swapping 14 and 15 since 14 is harder?
[x] 16.3: I don't think they can understand this yet - or did they already learn str.split() earlier? Maybe make a note that they don't need to understand all this code in detail because it's just a demonstration of how tricky this is going to be? But I actually really like the approach- it's fun to show how painful this is.
[x] 16.7: need a period after "factor";; put in some sort of marker like "Ok, so we really don't want to do it this way, right?". It's good to play up the humour here a bit.
[x] 16.8: lower case d for datetime;; here and later it incorrectly says parse_date instead of parse_dates
[x] Starting at 16.10 we have a bunch of "how to" slides which are a bit of a grind to get through. Let's first motivate them for why this is useful. Once we introduce parse_dates, can we do some fun stuff with it right away? Presumably this fun stuff is later and can be moved - let's see.
[x] 16.13: I would first show the output of new_cycling['Date'].dt.day_name() for clarity before combining it with assign
[x] 16.14: omiiting -> omitting
[x] 16.16: month_name -> month name;; there's some inconsistency here, is it without the dt. or without the ts. ?
[x] 16.17: now we get to something that takes advantage of parse_dates, that may be better to put earlier. Even before diff, I think sorting is already a "killer app". Can you explicitly show that if the datetimes are just strings and you do sort_values() that it sorts incorrectly? this would be a great motivator. maybe that would be sufficient to build up interest before the "grind" of all this syntax I mentioned.
[x] 16.18: summarize in words what they are seeing, e.g. "As you can see, there was a 13 hour and 38 minute gap between Tom's second and third [or whatever] bike rides. Wow - that's a long work day!" (assuming they're on the same day, maybe they are different days - needs verification).
[ ] 16.19: this is some great stuff here. Maybe put a teaser earlier on like "by the end of this slide deck we'll answer the question of what was Tom's longest time between rides". Also, a good question for later, assignment or the content, could be to find Tom's longest workday. That involves checking that the two rides were actually on the same day - naively you could just take every 2nd element in the diff, though if one ride was missing that would get all messed up. The safest would be to groupby the date - is that possible? Not essential but seems like a fun extension. For example it could be a new exercise here after Ex 18.
[x] 18: so do they definitely know how to use round? Also, is there possible confusion between round and np.round?
[x] 19.2: remove comma before "is";; "scratch the tip of the iceberg" see here. let's go with "scratch the surface". ;; maybe we can save the last sentence for later, I was envisioning that we'd do raw strings first before getting to pandas (so move to 19.5?)
[x] 19.3: I guess technically these are methods rather than functions (not that I care much personally, but hey why not be precise);;
[x] You've done a great job reusing the cycling dataset for these 3 different topics.
[x] 19.6: uppercase -> upper case;;
[x] 19.8: before that I think they need a refresher on base string concatenation, e.g. 'hi' + 'hello' gives 'hihello';; maybe we could pick a more authentic example here, like putting in the distance, so it says, say, "Afternoon Ride 12.62 km" or something? Or even just a Distance_str column that says "12.62 km" ? the cycle_tripled feels particularly contrived.
[x] 19.9: "Two" -> "A";; "And .title() that capitalized every word in a string:" -> "Another is .title(), which capitalizes the first letter of every word in a string:"
[x] 19.10: this is a good opportunity to put in a motivating example. Like you could try to search for when the comment is "Rain" but then one time it's "Rain " and got missed in the search, or something. This is the problem we want to solve with strip() and it happens all the time in free text entry.
[x] 19.11: I'd make it more clear that stripping whitespace is the default but one can also strip for other characters. Maybe a more realistic example here would be punctuation, like "!" or "." etc.
[x] 22: Multiple T's
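
For the string slides (19.8 to 19.11), here is a small sketch of the kind of example suggested above. The dataframe is a made-up stand-in for the cycling data, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the cycling data
rides = pd.DataFrame({
    "Name": ["afternoon ride", "morning ride"],
    "Distance": [12.62, 13.03],
    "Comments": ["Rain ", "Sunny!"],
})

# 19.8: base string concatenation first, then the same idea on a column
greeting = "hi" + "hello"                                # 'hihello'
distance_str = rides["Distance"].astype(str) + " km"

# 19.9: .title() capitalizes the first letter of every word
titled = rides["Name"].str.title()

# 19.11: strip() removes whitespace by default, or other characters if given
stripped = rides["Comments"].str.strip()
no_bang = rides["Comments"].str.strip("!")
```

Note that strip("!") leaves "Rain " untouched, since it only removes the characters it is given.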

Will do the last section (23-27) later today.

mgelbart commented 4 years ago

[x] 23.2: "on" -> "of" ;; change comma to semicolon before "however" (and again in 23.3);; "I want to" -> "we'll" ;; "days" -> "adventures"
[x] 23.3: "also" -> "always" ? ;; why is it head(9)? and then you say the first 10 rows?
[x] 23.5: looking back, this feels a bit un-useful since the contains will find it anyway. I wonder if we can make this a bit more authentic. I'm assuming Tom wouldn't mind if you change the dataset a bit to suit our needs. For example, perhaps we could have one comment be "Thankfully not raining today!" and so we could explain that just using contains would make a mistake here. So instead our plan is to be conservative and only check if the comment == "rain" (also imperfect, but a conservative estimate), but then we can have one with a whitespace issue of "rain " and then we can strip it out?
[x] 23.7: remove comma after Tom;; this is a fun calculation to perform, but it's going to be tricky, let's see how it unfolds...
[x] 23.8: Capital "L" for "let's"
[x] 23.10-12 maybe it would be sufficient to link to the documentation instead?
[x] You mentioned you're not that happy with this. I think the content is good, it's just short and doesn't pack the punch of a typical slide deck. What are your thoughts on merging this so we only have one slide deck in strings?
[x] 25: any reason why the two dataframes show up in different formats? Maybe we could standardize on one format?
[x] 26: something is wrong here. the second code block has import pandas as pd twice and also switched from the lego to canucks. can you fix this up and then I'll take another look?
[x] 27: Overall, I feel we're still lacking a compelling use case for numpy - like, why are they learning this? I guess Tom influenced us to put it in, but I wish we had a good example. I guess that can go on our wish list.

Overall, nice job - only one more to go! 📈

hfboyce commented 4 years ago

First off, you had to put up with my very rough draft for those first few slides, so I feel terrible about that.

5.15 I think there's an analogy to pandas filtering syntax that we could make here? If so we should.

Ok, so technically there is, but we taught them not to replace values in this manner and to use .loc[] instead, so should I mention this?

11.2: "not that" -> "note that" ;; should we mention somewhere the distinction between np.nan vs. None? Basically, since numpy arrays all need to be the same type, and since np.nan is considered a number, you can have a numerical numpy array with some of the values as np.nan (a "number") but it couldn't have a None in there since that has type NoneType.

I teach this to them in module 4, actually. Should I bring it up again? "Remember how in module 4 we explained ...."
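
If it does get a short reminder, something like this sketch might be enough to show the np.nan vs. None distinction (the values are arbitrary):

```python
import numpy as np

# np.nan is a float, so the array stays numerical
with_nan = np.array([1.0, np.nan, 3.0])
nan_dtype = with_nan.dtype           # float64

# None has type NoneType, so NumPy falls back to a generic object array
with_none = np.array([1.0, None, 3.0])
none_dtype = with_none.dtype         # object

# Numerical helpers still work on the float array
mean_ignoring_nan = np.nanmean(with_nan)
```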

11.11 should we take out bfill and ffill? they are appropriate here because it's a time series dataset, but we don't have many of those, and it's not appropriate in most cases. or, at least mention that it only really makes sense in those cases, as opposed to all the other datasets like cereal etc.

I do mention this about bfill and ffill.

16.3: I don't think they can understand this yet - or did they already learn str.split() earlier? Maybe make a note that they don't need to understand all this code in detail because it's just a demonstration of how tricky this is going to be? But I actually really like the approach- it's fun to show how painful this is.

They did learn about str.split(), so none of this code should be new to them.

Starting at 16.10 we have a bunch of "how to" slides which are a bit of a grind to get through. Let's first motivate them for why this is useful. Once we introduce parse_dates, can we do some fun stuff with it right away? Presumably this fun stuff is later and can be moved - let's see.

Can you see if this is better?

16.17: now we get to something that takes advantage of parse_dates, that may be better to put earlier. Even before diff, I think sorting is already a "killer app". Can you explicitly show that if the datetimes are just strings and you do sort_values() that it sorts incorrectly? this would be a great motivator. maybe that would be sufficient to build up interest before the "grind" of all this syntax I mentioned.

I want to show you what I did here and see if it's enough. I had to change the data a bit. I also don't exactly know what you mean by "killer app".
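
For reference, a minimal sketch of the sorting point from 16.17. The dates and the month/day/year format are assumptions chosen so that the text sort visibly goes wrong; the real cycling data may look different.

```python
import pandas as pd

# Hypothetical rides with dates stored as month/day/year strings
rides = pd.DataFrame({
    "Date": ["12/30/2019", "01/05/2020", "11/15/2019"],
    "Distance": [12.6, 13.0, 12.8],
})

# Sorting the strings compares characters, so January 2020 comes first
sorted_as_text = rides.sort_values("Date")

# After parsing, sorting is chronological
parsed = rides.assign(Date=pd.to_datetime(rides["Date"], format="%m/%d/%Y"))
sorted_as_dates = parsed.sort_values("Date")

# diff() then gives the time between consecutive rides
gaps = sorted_as_dates["Date"].diff()
```

Reading the dates at load time with parse_dates should give the same datetime column without the explicit to_datetime step.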

18: so do they definitely know how to use round? Also, is there possible confusion between round and np.round?

So by this point we've used round and they have seen it. In this example we are using Python's built-in round(), which they have definitely encountered. I don't really want to bring in the idea of all the different rounds because then we would have to explain np.round, pd.round, and the regular round, which seems to make matters extra confusing.
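
Just for our own reference (not for the slides), a quick sketch of the three variants side by side. As far as I know pandas exposes rounding as a method on a Series or DataFrame rather than as a top-level function. The numbers are arbitrary.

```python
import numpy as np
import pandas as pd

# Python's built-in round works on a single number
builtin_result = round(12.6234, 2)

# np.round works elementwise on an array
array_result = np.round(np.array([12.6234, 13.0371]), 2)

# In pandas, rounding is a method on the Series itself
series_result = pd.Series([12.6234, 13.0371]).round(2)
```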

23.5: looking back, this feels a bit un-useful since the contains will find it anyway. I wonder if we can make this a bit more authentic. I'm assuming Tom wouldn't mind if you change the dataset a bit to suit our needs. For example, perhaps we could have one comment be "Thankfully not raining today!" and so we could explain that just using contains would make a mistake here. So instead our plan is to be conservative and only check if the comment == "rain" (also imperfect, but a conservative estimate), but then we can have one with a whitespace issue of "rain " and then we can strip it out?

I'm not sure if I understand what you want me to do here.
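
My reading of the suggestion, as a rough sketch. The comments below are made-up examples, not rows from Tom's data.

```python
import pandas as pd

# Hypothetical comments, including a trailing-space entry and a false friend
comments = pd.Series([
    "rain",
    "rain ",
    "Thankfully not raining today!",
    "sunny",
])

# contains() also matches the "not raining" comment
contains_rain = comments.str.contains("rain")

# An exact comparison is conservative but misses "rain " with the extra space
exact_match = comments == "rain"

# Stripping whitespace first recovers that entry
exact_after_strip = comments.str.strip() == "rain"
```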

23.7: this is a fun calculation to perform, but it's going to be tricky, let's see how it unfolds...

I didn't add the calculation, but I could. Should I?

26: something is wrong here. the second code block has import pandas as pd twice and also switched from the lego to canucks. can you fix this up and then I'll take another look?

Fixed!!

27: Overall, I feel we're still lacking a compelling use case for numpy - like, why are they learning this? I guess Tom influenced us to put it in, but I wish we had a good example. I guess that can go on our wish list.

Add to the wish list, or should we work on this?

hfboyce commented 4 years ago

@mgelbart So I changed around the NumPy sections and I wanted you to have a look over it to see if this flows better. I also have some other questions but we can address them on Monday.

hfboyce commented 4 years ago

@mgelbart Can you please look over Exercise 26 before I give it to Elijah? Thank you. (Also please look over Exercises 1 and 5 again, which are the NumPy sections.) Also maybe Exercise 11 (quickly?).

mgelbart commented 4 years ago

2nd review of Exercises 1, 5, 11, 26

[x] 1.9: let's take a moment to emphasize that this is way more convenient than a list
[x] 1.16: the header is confusing, change to Numpy Constants and make a title on the next slide = Numpy Functions?
[x] 5.2: I think this could lead to a misconception that 1D vs 2D has something to do with parentheses vs. square brackets. Can you clear this up?
[x] 5.3: probably worth mentioning that zeros takes a tuple whereas rand takes the integers directly - it's easily confusing.
[x] 5.4: don't need question mark
[x] 5.8: shape() -> shape
[x] 5.11: following one of our general principles, should show the "before" and "after" on the same slide
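
A small sketch of the 5.2, 5.3 and 5.8 points, in case it is useful for the slides. The array values are arbitrary.

```python
import numpy as np

# 5.3: zeros takes the shape as one tuple, rand takes the sizes directly
zeros = np.zeros((2, 3))
random_values = np.random.rand(2, 3)

# 5.2: the number of dimensions comes from the nesting,
# not from using parentheses vs. square brackets
from_lists = np.array([[1, 2, 3], [4, 5, 6]])
from_tuples = np.array(((1, 2, 3), (4, 5, 6)))

# 5.8: shape is an attribute, so there are no parentheses after it
shape = from_lists.shape         # (2, 3)
ndim = from_lists.ndim           # 2

# Indexing a 2D array: first row, third column
first_row_third_col = from_lists[0, 2]
```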

Will do 11 and 26 after my next meeting.

mgelbart commented 4 years ago

[x] 11.5: output says 31 but text says 30 (and 33-30=3). This problem comes back on 11.7 and 11.9.
[x] 11.6-7: this was a bit confusing because we changed two things from 11.6 to 11.7. Best to change one thing at a time. The two things are: (1) going from the Distance column to any and (2) using this to filter. It would be less confusing if you used the distance column in both, or if you used any in both, or if you added an intermediate slide in between that introduced any and then used it to filter.
[x] 11.9: not sure if this is the place to mention it, but another reasonable thing to do is drop a column (rather than row) if the column is like 99% NA
[x] 11.10: semicolon or period before "however"
[x] 11.12: "will" -> "with";; hmm, I actually don't see the 0 in the output, is there a bug here? same problem on the next few slides I think. And then by 11.15 something seems to be happening, but on index 1 rather than index 2
[x] 26: the instruction of the first task is a bit confusing, maybe make it more explicit by asking them to overwrite the lego object;; for the second instruction, we should say that they are looking in the name column (and same for the 6th instruction);; fourth instruction, i think the W should be lower case, right?
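
To make the 11.6-7 and 11.9 comments concrete, here is a sketch that changes one thing at a time. The dataframe and the 50% cutoff are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing distance and a mostly empty Notes column
rides = pd.DataFrame({
    "Distance": [12.5, np.nan, 13.0, 14.0],
    "Notes": [None, None, None, "Rain"],
})

# Step 1: filter on missing values in a single column
missing_distance = rides[rides["Distance"].isnull()]

# Step 2: same filter, but for a missing value in any column
any_missing = rides[rides.isnull().any(axis=1)]

# 11.12: fill with 0 first, then with the mean as another possibility
filled_zero = rides["Distance"].fillna(0)
filled_mean = rides["Distance"].fillna(rides["Distance"].mean())

# 11.9: drop a column (rather than rows) when it is mostly missing
share_missing = rides.isnull().mean()
mostly_empty = share_missing[share_missing > 0.5].index
trimmed = rides.drop(columns=mostly_empty)
```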

UBC-MDS / programming-in-python-for-data-science

Feedback Module 8 #51