UBC-DSCI / introduction-to-datascience-python

Open Source Textbook for DSCI100: Introduction to Data Science in Python
https://python.datasciencebook.ca

Suggestions for Ch2 - Reading data #38

Closed joelostblom closed 1 year ago

joelostblom commented 2 years ago
trevorcampbell commented 1 year ago

@joelostblom after a brief skim, ibis looks super neat. I am tempted to switch to it, pending some more investigation. We could also mention to students, in a note box or similar, that they can send raw SQL to the database via pd.read_sql.
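A minimal sketch of the pd.read_sql option, assuming an SQLite database. The table name languages, its columns, and the numbers are toy placeholders, not data from the book. A plain sqlite3 connection works here because pandas accepts sqlite3 connections directly; other databases would typically need an SQLAlchemy engine instead.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory SQLite database so the sketch is self-contained.
# The table name "languages" and its toy values are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {
        "language": ["English", "French", "Cree"],
        "mother_tongue": [500, 300, 10],
    }
).to_sql("languages", conn, index=False)

# Send a raw SQL query to the database; pandas executes it and returns
# the result as a DataFrame.
query = """
    SELECT language, mother_tongue
    FROM languages
    WHERE mother_tongue > 100
    ORDER BY mother_tongue DESC
"""
result = pd.read_sql(query, conn)
conn.close()
print(result)
```

The filtering and sorting happen inside the database, so only the matching rows are transferred into pandas.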

I will look at scrapy vs. beautifulsoup shortly.

joelostblom commented 1 year ago

Yeah, ibis really looks impressive. My hesitation is that I don't know anyone who uses it, so I don't have good insight into corner cases or any real-life experience or feedback.

trevorcampbell commented 1 year ago

I just played with ibis a bit. It's way easier to use and more natural than sqlalchemy. I would be worried if we were doing advanced stuff, but since our course only does very simple select/filter/execute operations, I am going to switch us over.

Thanks for the suggestion!

trevorcampbell commented 1 year ago

I am also commenting out the web scraping and API material for this round, since we have more important things to handle for January. I opened issue #64 to reintroduce it later.

joelostblom commented 1 year ago

> Look through the read_csv (and to_csv) documentation and see if there are any other useful arguments to discuss (e.g. relating to indices)
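For the index-related arguments this item refers to, here is a small round-trip sketch using index_col in read_csv and index in to_csv. The DataFrames, city names, and values are toy placeholders, and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Toy DataFrame whose index carries meaning (placeholder cities and values).
df = pd.DataFrame({"city": ["Vancouver", "Toronto"], "population": [100, 200]})
df = df.set_index("city")

# to_csv writes the index as the first column by default ...
with_index = df.to_csv()
# ... while index=False omits it (here that would drop the city names).
without_index = df.to_csv(index=False)

# index_col tells read_csv which column to turn back into the index.
restored = pd.read_csv(io.StringIO(with_index), index_col="city")

# Pitfall: a default 0, 1, 2, ... index written with to_csv() comes back
# as an extra column named "Unnamed: 0" unless index_col=0 is passed.
plain = pd.DataFrame({"a": [1, 2]})
roundtrip = pd.read_csv(io.StringIO(plain.to_csv()))
fixed = pd.read_csv(io.StringIO(plain.to_csv()), index_col=0)
```

The "Unnamed: 0" column is a common surprise for students, and writing with index=False avoids it whenever the index is just the default range.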

Adding to this point about read_csv arguments: the ones I have used most often that we have not covered are skipinitialspace and parse_dates. I think chunksize could be useful too. That said, I am not sure they fit in this intro chapter (or in the book at all). Could they be part of the data cleaning chapter instead (at least the first two)?
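A small sketch of the three arguments mentioned here, reading an in-memory toy CSV (the column names, station labels, and rainfall values are made up) instead of a real file.

```python
import io

import pandas as pd

# Toy CSV text with a space after each comma, as in many hand-written files.
csv_text = (
    "date, station, rainfall_mm\n"
    "2023-01-01, A, 1.5\n"
    "2023-01-02, B, 0.0\n"
    "2023-01-03, A, 4.2\n"
)

# Without skipinitialspace the leading spaces stay in the column names
# (e.g. " station") and in string values (e.g. " A").
raw = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True drops whitespace right after each delimiter;
# parse_dates converts the listed column(s) to datetime values.
clean = pd.read_csv(
    io.StringIO(csv_text), skipinitialspace=True, parse_dates=["date"]
)

# chunksize returns an iterator of DataFrames with at most that many rows,
# so a large file can be processed piece by piece instead of all at once.
total_rain = 0.0
n_chunks = 0
with pd.read_csv(
    io.StringIO(csv_text), skipinitialspace=True, chunksize=2
) as reader:
    for chunk in reader:
        total_rain += chunk["rainfall_mm"].sum()
        n_chunks += 1
```

The first two arguments clean up the data as it is read, which is why they could sit naturally in a data cleaning chapter; chunksize is more about memory than cleaning.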