UBC-DSCI / introduction-to-datascience

Open Source Textbook for DSCI100: Introduction to Data Science in R
https://datasciencebook.ca/

Review: Global/big picture content revisions #95

Closed by ttimbers 2 years ago

ttimbers commented 3 years ago

@trevorcampbell @leem44 - let's list the global/big picture content revisions asked for in review here. When adding, please do not duplicate items; just edit the list to note that the item was asked for by X reviewers.

leem44 commented 3 years ago

Reviewer E

reservations

additions:

omissions:

examples

length

target audience

leem44 commented 3 years ago

Reviewer B

trevorcampbell commented 3 years ago

Reviewer D

trevorcampbell commented 3 years ago

Reviewer A

trevorcampbell commented 3 years ago

Edit this comment directly to summarize / synthesize the major (important) comments across reviewers. If there's already the same comment below, add the reviewer to the parenthetical list at the beginning.

Synthesis:

ttimbers commented 3 years ago

Reviewer F

ttimbers commented 3 years ago

Reviewer C

ttimbers commented 3 years ago

Action items

Revision of synthesis into action items. We might want to pull these off into individual issues we can close as we address them, and assign folks to them.

Major

  1. Revise chapter 1 (introduction): Expand the brief introduction paragraph in chapter 1 to be clearer about what the book covers. Specifically:

    • tell people how to read the book (might want to read the system setup and version control chapters first)
    • add data science workflow diagram to beginning of chapter 1, & expand text to make it clearer what is going on
    • move select & filter to wrangling (chapter 3) and make sure this doesn't negatively affect chapter 2
    • move Jupyter content to System setup/Setting up your computer chapter

    This addresses comments by Rev D, Rev A & C. Need to rebut D's ask for a whole new chapter here. MAJOR

  2. Move version control chapter to the end of the book and revise chapter to be more conceptual. Add more conceptual content and diagrams (like these) to the vc chapter, and move the screenshots to a screencast with a stable link. Might want to consider doing both a Jupyter Git Extension demo and an RStudio one. This addresses comments made by Rev C & F. MAJOR

  3. Simplify and better explain data sets. Where we can, provide more information/context about the data sets (maybe in a call-out box or something?). Also, make it clear where things have been simplified and why (so we can focus on the data science method we are teaching). At a minimum, we need to explicitly state that data science cannot be done without a deep understanding of the data and domain, that we approach things the way we do in order to teach data science, and that IRL data science should not be done without a domain expert (or, alternatively, that it is common to practice data science in your own domain of expertise). Go through each chapter and find where we can have just one data set. Idea: see if we can have chapter 2 use only the canlang data sets (might not work for web scraping, but maybe there's a more related data set?). Note: Chapter 4 needs multiple data sets given the way we have written it. Question: Think about the clustering chapter - can we use a data set that we are already using? This addresses comments made by Rev C & E, but we also need to write a rebuttal here stating why we have chosen a rich set of data sets for this book. MAJOR

  4. Draft a "putting it all together" chapter. Create a "putting it all together" chapter, where we demonstrate an entire DS workflow, from reading data, to EDA, to modelling, to communicating the results. We can build off a project Tiffany has created for MDS: https://github.com/ttimbers/breast_cancer_predictor. At a minimum we do this for a classification example; at a maximum we also do this for all modelling methods in the book. Or some intermediate goal. MAJOR

  5. Move Jupyter-related content to System-setup chapter: Rename "Moving to your own machine" to "System setup" (or something related, like "Setting up your computer"?) and move any Jupyter-related content there. We can then link to it from other chapters if needed. Bonus: can we also explain how to get set up and use Rmd with RStudio, so our book can support both major DS literate code document platforms? Or at a minimum link out to other good resources on this (risk: they don't come back to us...). UI stuff (how to use Jupyter & Rmd) becomes videos with stable links. Make sure the videos are general enough for the book, and not specific to this course. This addresses comments made by Rev C & E. MAJOR

  6. Revise supervised learning chapters.

    • Add high-level sections to 6 (process - train & evaluate, 6 will cover train, 7 covers evaluate) & 8 (acknowledge that this is a repeat of 6 & 7 with a new flavour)
    • add a wrap-up/summary at the end of 9 that brings everything together in context (we could introduce the term supervised learning, and hint that the next chapter covers a new type of analysis for a different problem, that is called unsupervised learning)
    • add some guidance on feature engineering at the end of the regression chapter
    • list other common algorithms in each chapter and even briefly mention when alternatives might be preferred

    This will address comments made by Rev C & E. MAJOR

Minor/major

  1. Fix/improve index. We need a robust index for this book. Check whether this can be autogenerated by bookdown? Talk to Laura (CRC Press) for help on this if we need to? Also, once we create the index, we want to create a glossary of the main terms and functions. Let's consider using the glosario R package for this, and borrowing from the Carpentries English glossary? This addresses comments made by Rev E. MINOR/MAJOR?

  2. Ensure book is written for intended audience. Read through reviewer C's annotations and address the highlighted parts where the book appears to be written for other instructors rather than for students. This will address comments made by Rev C. MINOR/MAJOR?

  3. Summaries of where we are going. Read through the book, and ensure there is a summary of where we are going for each walkthrough example in the book (especially 6, 7, 8). This will address comments made by Rev D. MINOR/MAJOR

  4. Clarify examples from the universe of possibilities. Read through the book, and clarify where what we are discussing is meant to illustrate one example versus the universe of possibilities. It is worth being more specific about things that are just high-level examples (e.g. KNN versus all classification algorithms). For example, make it clear which parts of chapter 7 are relevant to classification in general, and which are relevant to just KNN. MINOR/MAJOR

Minor

  1. Stable domain and links. Get a domain that we can use to create stable links to videos and worksheets from the textbook. MINOR

  2. Exercises/worksheets. Create a repository for just the worksheets to be associated with the textbook. (Bonus 1: Perhaps we can use Binder or a public JupyterHub to make them interactive? Bonus 2: Add GitHub Actions to the repo to use Jupytext to autogenerate Rmd versions of the worksheets for folks who'd prefer to work with Rmd instead of Jupyter?) Point to the relevant worksheet at the end of each chapter using a stable link. Remember, when we point to the worksheets, to also point to the system setup chapter so readers can first follow the instructions for setting up Jupyter on their own machine. (+1 Reviewer E and B) MINOR

  3. Improve additional resources sections. Add a few sentences to give context to each additional resource we share. This means going beyond just annotating them (which we should also do) to explain what topic readers should focus on next. Rebut: We will not add a "where to go from here" chapter, asked for by reviewer ?, but will instead do a better job of this at the end of each chapter, on a topic-by-topic basis. MINOR

  4. Update to the newest dplyr and ggplot2. Check the most recent updates to dplyr and ggplot2 and make sure we are using the most up-to-date syntax in the book. If this needs to change, also change it in the worksheets. This addresses comments made by Rev A & E. MINOR

  5. Ensure we just teach one way to do things. Read through the book and check that we only teach one way to do things (although we can acknowledge there are many ways to do things). Specifically, find where we have shown alternatives and make sure we are clear about the way we want readers to do it with the tidyverse. This addresses comments made by Rev C. MINOR

  6. Improve regression introduction. Lead into regression more gently and rely less on past knowledge about classification (8.3 is a bit bizarre, should combine with 8.4). This will address comments made by Rev C. MINOR

  7. Add vector data types explanations. Add base R vector types (logical, integer, double, character) introduction and explanation to section 3.3.2 ("What is a vector?") in chapter 3. Add factor vector type explanation in chapter 4 (visualization chapter), when we need it. This will address comments made by Rev C & E. MINOR

  8. Forward references & unexplained concepts. Go through book and look for forward references or unexplained concepts. MINOR

  9. Make it clear web scraping is optional/add APIs. Add a note to make it clear that web scraping is optional. Fix the wrong definition of web scraping. Add a subsection on web APIs (we could use this one: https://cran.r-project.org/web/packages/cancensus/vignettes/cancensus.html). MINOR

Rebuttals to write

These things we are purely rebutting, not changing.

Still need to decide what to do for these:

trevorcampbell commented 2 years ago

@ttimbers @leem44 this is done now, right?

trevorcampbell commented 2 years ago

Closing; we can re-open if needed.