NYUDataBootcamp / Book

Textbook to accompany class
Creative Commons Attribution 4.0 International

Pandas chapters rework #18

Closed cc7768 closed 7 years ago

cc7768 commented 7 years ago

Should think about doing some additional organization of the pandas chapters:

Two examples of this would be:

cc7768 commented 7 years ago

Started rewrite of these chapters. I'm thinking of simplifying the pandas-input chapter -- I don't know if there is any reason we should introduce reading data from the internet and reading data from your computer as separate things. They use the same function.
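A minimal sketch of the "same function" point, assuming a small throwaway CSV (the file name `example.csv` and its contents are hypothetical). To keep it runnable offline, a `file://` URL stands in for an `http(s)://` address; `pd.read_csv` is called the same way in both cases.

```python
import pathlib
import tempfile

import pandas as pd

# Hypothetical sample file, written to a temporary directory for illustration.
tmp_dir = pathlib.Path(tempfile.mkdtemp())
local_path = tmp_dir / "example.csv"
local_path.write_text("name,value\na,1\nb,2\n")

# Reading from your computer: pass a filesystem path.
df_local = pd.read_csv(local_path)

# Reading from a URL: pass a URL string to the same function.
# A file:// URL is used here as an offline stand-in for an http(s):// URL.
df_url = pd.read_csv(local_path.as_uri())

# Both calls produce the same DataFrame.
print(df_local.equals(df_url))
```

The only thing that changes between the two calls is the string handed to `pd.read_csv`; how that string gets resolved (a path on disk vs. an address on the web) is the part students may still need explained separately.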

Thoughts?

sglyon commented 7 years ago

I already went through pandas-input.md for a re-write just before I taught it -- but by all means do another pass at it and make things even better!

I am fairly strongly against separating reading in files from the internet and your computer. Doing it in this way allows us to separate the issues regarding what the pd.read_* functions do from understanding how modern filesystems work. I've found that both are significant issues for students to understand, so breaking them up in this way is quite helpful.

sglyon commented 7 years ago

Oops: I meant to say I'm against combining those sections

cc7768 commented 7 years ago

What are your thoughts on reorganizing it? To me it feels natural to introduce a DataFrame and its methods prior to describing the read_* functions. My plan would be to do something along the lines of

  1. First look at dataframes (includes some of the properties etc...)
  2. Operating on dataframes (includes creating new variables etc...)
  3. Dataframe methods (cover a few of the methods -- mostly just to encourage them to explore with `.` + <Tab> completion)
  4. Data Input
    • Reading from internet
    • Reading from computer
    • APIs
  5. Examples
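
As a rough sketch of what items 1 through 3 of the outline above might cover, assuming a tiny hand-built DataFrame (the column names and values are placeholders, not real data), so that no `read_*` call is needed yet:

```python
import pandas as pd

# Placeholder data, built by hand so the example needs no input file.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "x": [1.0, 2.0, 3.0],
        "y": [10.0, 20.0, 30.0],
    }
)

# 1. First look at dataframes: a few basic properties.
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # data type of each column

# 2. Operating on dataframes: create a new variable from existing columns.
df["ratio"] = df["y"] / df["x"]

# 3. Dataframe methods: a couple of the methods students can find via tab completion.
summary = df.describe()                            # summary statistics for numeric columns
top_row = df.sort_values("y", ascending=False).head(1)  # row with the largest y
print(summary)
print(top_row)
```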

cc7768 commented 7 years ago

It might even make sense to split that up: put 1, 2, and 3 into a pandas-intro chapter and then do 4 and 5 in a pandas-input chapter -- I like short(ish) chapters and I feel like this one is a bit of a mouthful.

sglyon commented 7 years ago

That's a great question.

I also felt it was a bit strange to have the read methods before the data frame properties and working with variables section. However, the argument in favor of how things are now is that in order to talk about those methods you need a DataFrame, and by far the most common way students will get DataFrames in their code is by using the read_* methods.

If you would like to reorganize and don't think it would be too much effort to implement, maybe you could do it and open it as a PR. I could take a look at the finished product and give a more specific review at that point. I don't want to create extra work, but this approach is the "least risky" in my opinion. (Also, I suspect that I will like the shorter chapters approach anyway, so ex ante I anticipate merging the PR)

cc7768 commented 7 years ago

K. I'll work up some changes and update you (with a PR) once I'm ready.

cc7768 commented 7 years ago

Addressed by https://github.com/NYUDataBootcamp/Book/pull/22#pullrequestreview-5888600