dgrtwo / tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
http://tidytextmining.com

Issue for style questions/consistency #1

Closed by juliasilge 8 years ago

juliasilge commented 8 years ago

I'm hoping that we can use this issue for questions about consistency in style for the whole book.

My first one: should we show or hide code to make plots in the book? I show the code on my blog because I know some people are interested specifically in that and always ask questions, but I lean toward hiding in the book, since it is not about plotting at all. Thoughts?

dgrtwo commented 8 years ago

I think we should show the code for at least some plots, since the ease of visualization with ggplot2 is a major benefit of the tidy text workflow. (It's something that can't be done directly from other packages.)

But two ways we can reduce it:

  1. We can show code for some plots but not others: for example, for the first time we show a particular kind of plot. This requires some judgment and I'm not sure about it.
  2. We should find ways to hide the theme customization. The colors in your plots are awesome (I am so much lazier and rely on defaults!) but use a lot of code. What if we defined those themes in a hidden chunk beforehand? That way if someone tried the code just from the book, they'd get a decent default plot.

juliasilge commented 8 years ago

Yeah, maybe we can use this to make consistently themed plots in the whole book, actually. Just define/hide some nice theme customization for the whole thing.
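
A minimal sketch of what such a hidden theme chunk might contain, assuming it sits in a chunk whose code and output are suppressed (for example with the knitr option include = FALSE). The function name theme_book and every styling choice in it are hypothetical placeholders, not the theme actually used in the book:

```r
library(ggplot2)

# Hypothetical book-wide theme; name and styling choices are placeholders
theme_book <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      legend.position = "bottom",
      panel.grid.minor = element_blank()
    )
}

# Make it the default for every plot drawn later in the same R session
theme_set(theme_book())
```

With the theme set this way, the plotting code shown to readers needs no theme calls; someone running only the visible code would get ggplot2's stock default theme instead.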

juliasilge commented 8 years ago

Another question: we can put

new_session: yes

at the beginning of _bookdown.yml if we want, and that forces each .Rmd to be self-contained, I think? Then in each chapter, we would need to repeat each library(tidytext) call and so forth, and also right now the first few chapters depend on data frames defined in earlier chapters (tidy_books, etc.).

On the one hand, things might get repetitive. On the other hand, if the point is to be helpful and tutorial-like, it seems rather unhelpful to send people searching back through past chapters to find out where something was defined. I lean toward setting new_session to yes?

dgrtwo commented 8 years ago

I think that's a good idea, but it means we'd have to define the aforementioned ggplot2 theme each time; even if hidden, that's a lot of duplication. Perhaps we should have a setup.R that defines it and gets called in a hidden chunk at the start of each chapter.
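
A rough sketch of how that could look. The contents written to the stand-in file and the use of a temporary directory are assumptions made only so the example runs on its own; in the book repository the file would presumably live next to the chapters and hold the real theme code:

```r
# Write a stand-in setup file to a temp path so the sketch is self-contained.
# In the repository this would be a real setup.R with the shared theme code.
setup_path <- file.path(tempdir(), "setup.R")
writeLines(
  c(
    "library(ggplot2)",
    "theme_set(theme_light())"
  ),
  setup_path
)

# What the hidden chunk at the top of each chapter would run
source(setup_path)
```

Because each chapter would run in a fresh R session under new_session: yes, the source() call has to be repeated per chapter, but the theme itself is defined only once.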

I think bookdown also currently requires chunk names to be unique even across chapters, which is annoying, so if this fixes that it would be great.

juliasilge commented 8 years ago

Title or sentence case for chapter and section headings? I don't have a preference but we need to be consistent and the sooner we start editing these things the more likely we are to get them all right.

(Also, I don't think we need to go lower than section headings; I think we should change everything to ## that is currently ###.)

dgrtwo commented 8 years ago

OK, a few suggested conventions; please let me know if you disagree:

I'm not sure about dropping ###. I do find subsections useful, and other books like R4DS use them (see here). We may also find that there are too many sections in the table of contents otherwise.

juliasilge commented 8 years ago

These all sound good to me. I've already started giving chunks names as I edit; I'll keep going with that.

dgrtwo commented 8 years ago

I used to use four spaces (the broom package is an artifact of that time) but moved to two about 1.5 years ago. I've been surprised at how readable it is, and it does help avoid hitting 80 chars!

My best argument is that I think we otherwise follow all of Hadley's style guide and this would get us to 100% 😀

juliasilge commented 8 years ago

We might want to go through and change all the factor handling to use forcats functions; they are much nicer.
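
As an illustration of the difference, here is a small sketch on made-up data (the words vector is invented for the example, not taken from the book) that orders factor levels by frequency first in base R and then with forcats:

```r
library(forcats)

# Toy data, invented for illustration
words <- factor(c("time", "good", "time", "miss", "good", "time"))

# Base R: rebuild the factor with levels sorted by descending count
by_hand <- factor(
  words,
  levels = names(sort(table(words), decreasing = TRUE))
)

# forcats: the same ordering in a single call
by_freq <- fct_infreq(words)

levels(by_freq)
# "time" "good" "miss"
```

Related helpers such as fct_reorder() and fct_rev() cover the other reorderings that usually come up when preparing factors for ggplot2 axes.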

juliasilge commented 8 years ago

A couple of thoughts/questions...