UBC-DSCI / introduction-to-datascience

Open Source Textbook for DSCI100: Introduction to Data Science in R
https://datasciencebook.ca/

Review: Ch 4 (viz) #104

Closed leem44 closed 2 years ago

leem44 commented 3 years ago

Reviewer E:

leem44 commented 3 years ago

Reviewer B:

trevorcampbell commented 3 years ago

Reviewer D

trevorcampbell commented 3 years ago

Reviewer A

Trevor

ttimbers commented 3 years ago

Reviewer C

leem44 commented 3 years ago

Adding Tiffany's comment from #92

ttimbers commented 2 years ago

From Reviewer E

The section on saving plots seems long and unusually detailed (e.g. raster versus vector images), and I might consider removing. This is all interesting stuff, but it feels very in-depth compared to the higher level treatment of core data analysis and modeling principles. In particular, the book assumes students are working in notebooks where saving images is generally not needed.

I disagree with this comment and feedback and think we should keep this section. Will not address.
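
For context, the raster versus vector distinction that section covers can be sketched as below. This is a minimal illustration, not the book's code: the `mtcars` plot, the file names, and the use of a temporary directory are all assumptions made for the example.

```r
library(ggplot2)

# Small example plot from the built-in mtcars data (not taken from the book)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output locations in a temporary directory
out_dir <- tempdir()
png_path <- file.path(out_dir, "example.png") # raster: a fixed grid of pixels
pdf_path <- file.path(out_dir, "example.pdf") # vector: shapes that scale cleanly

# ggsave() picks the graphics device from the file extension
ggsave(png_path, plot = p, width = 5, height = 4, dpi = 300)
ggsave(pdf_path, plot = p, width = 5, height = 4)
```

Even in a notebook workflow, a saved file like this is what ends up in a report or slide deck, which is one reason the section may be worth keeping.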

ttimbers commented 2 years ago

From Reviewer B

The absence of side-by-side boxplots in the text is a glaring one. Boxplots empower students greatly as they provide them with the means of comparing not just averages between groups, but distributions between groups. That being said, I do appreciate that they are very confusing at first and take a lot of ink to explain well.

I think boxplots are most useful when comparing many distributions - so many that you cannot plot the actual distributions in something like a jitter/strip plot or ridge plot. Also, we don't get to statistical methods that compare distributions. For both these reasons I believe this is out of scope for our book. Will not address.
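
To make the trade-off concrete, here is a rough sketch of the two options on a small data set. It uses the built-in `iris` data as a stand-in (an assumption, not an example from the chapter).

```r
library(ggplot2)

# Strip (jitter) plot: every observation is drawn, so the full
# distribution of each group is visible when there are few groups
strip_plot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.6)

# Side-by-side boxplots: each group is reduced to a five-number summary,
# which scales to many groups but needs more explanation for beginners
box_plot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
```

With only three groups, the strip plot shows the distributions directly, which is the case argued above.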

ttimbers commented 2 years ago

From Reviewer D

I would love it if the section on Explaining the Visualization would be expanded to its own chapter and elaborated upon more

I think this is a good idea but beyond the scope of this revision. Will add to the version 2 issue as a possible future improvement.

ttimbers commented 2 years ago

From Reviewer E:

The theme() function is very complex and has many potential arguments. I would recommend either taking the time to explain how it works in more detail or removing it from the text. I worry showing one single example without explanation may lead students to a misunderstanding of what it is.

I will partially address this by stating that the theme() function is very complex and has many potential arguments, and by pointing interested readers to the docs for it.
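
As a rough illustration of the kind of single-argument use in question (a sketch on the built-in `mtcars` data, not the chapter's exact code):

```r
library(ggplot2)

# theme() adjusts non-data elements of a plot; here only the base text size.
# It accepts many other arguments, listed in its help page (?theme).
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme(text = element_text(size = 14))
```

A one-sentence note next to an example like this, plus a pointer to `?theme`, is the partial fix described above.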

ttimbers commented 2 years ago

From Reviewer C:

Section 4.4.0.1: I strongly recommend including “don’t vs do” examples here - show people the bad plot, then show them the plot with that particular problem corrected. Without that, the general rules are difficult to interpret.

I feel like we do that for each of the examples - we start out with the defaults and then iterate to improve it. I get what they are saying about common mistakes people make, but I think that we have 1 viz chapter in a book, not a book on viz, so there is only so much we can do!

ttimbers commented 2 years ago

@trevorcampbell - Re: "Figure 4.16 on page 102 does not highlight the artifacts the same way the web version does... did we accidentally load the same image twice?"

No, it does not in the PDF - this is something we need to fix. It is neither simple nor straightforward because of the SVG. Will move this to the formatting pass.

Also, this reviewer comment relates to the issues with this figure: "p102: typesetting: On this page we have top and bottom; this is likely a sizing issue. Figures should be left/right per the caption."

ttimbers commented 2 years ago

From Reviewer E:

In section 4.5.2, the question asked of the plot could easily be answered by a table. Is there a risk here of making students feel that they must plot everything? More broadly, should how to make a good table be covered in parallel as another aspect of data visualization?

I agree tables are useful, but visualizations are usually more effective at communication. For example, with the landmasses bar plots it is immediately easy to see the biggest 7, and to see how much bigger they are. This is definitely not as easy, or as immediately apparent, with the same data presented as a table. I rarely think tables are more effective than visualizations... They are sometimes convenient, but rarely more effective in my opinion. I guess they are more useful for precisely communicating values...

I think putting an emphasis on visualization over tables in our book is the right move. A focus on tables would suit a DSCI communication course book, I think.

I ended up putting a comment on why viz, and when tables might be a better choice.
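
For reference, the comparison can be sketched roughly as below. It assumes R's built-in `islands` data set as a stand-in for the landmasses data in the chapter; the column names `landmass` and `size` are chosen for this example.

```r
library(ggplot2)

# `islands` is a named numeric vector shipped with R; convert it to a data frame
islands_df <- data.frame(
  landmass = names(islands),
  size = as.numeric(islands)
)

# Keep the seven largest landmasses
top7 <- head(islands_df[order(-islands_df$size), ], 7)

# Table view: precise values, but relative sizes take effort to compare
print(top7, row.names = FALSE)

# Bar plot view: ranking and relative size are visible at a glance
top7_plot <- ggplot(top7, aes(x = size, y = reorder(landmass, size))) +
  geom_col() +
  labs(x = "Size", y = "Landmass")
```

The table is the better choice when readers need exact values; the bar plot is better for ranking and relative magnitude.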

ttimbers commented 2 years ago

From Reviewer C:

Fig 4.1: worth pointing out that the vertical axis doesn’t start at zero (which many people regard as misleading)

My issue is that this is pretty nuanced (or advanced) for our audience. It's a complicated thing to explain and discuss. This is something they would learn in the DSCI minor third year course on data viz, not in DSCI 100 - the audience for which this book was written. Thus I think I am not going to point it out.
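
For anyone revisiting this later, the mechanics are small even though the discussion is nuanced. A minimal sketch on the built-in `mtcars` data (an assumption, not the data behind Fig 4.1):

```r
library(ggplot2)

# Default: ggplot2 fits the y-axis to the range of the data,
# so the axis does not start at zero
p_default <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# expand_limits() extends the axis so that zero is included
p_zero <- p_default + expand_limits(y = 0)
```

Whether including zero is appropriate depends on the question the plot is answering, which is the part that is hard to cover at the DSCI 100 level.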

ttimbers commented 2 years ago

From Reviewers E & D:

I really like the introduction. The only thing I thought could be expanded on was the distinction between exploratory and explanatory visualization and how one might approach visualization as part of EDA versus visualizations to document a result

I don’t fully agree that a visualization is intended to answer questions – often a visualization is intended to generate questions. The author should allow for the latter possibility in their explanations.

I do see where they are coming from. Viz is used in two contexts usually - exploratory data analysis (EDA) for the analyst during their analysis, and then data viz for communication to stakeholders/readers/etc. But we don't get into this detail of how the workflows differ here. Again, this is one chapter in an intro DS course, not a course on Data Viz.

Additionally, even if you are doing EDA you have exploratory Q's. So I am thinking of not addressing these two points and leaving the intro as it is.

ttimbers commented 2 years ago

Closed by #263