This is the source for the Data Science: A First Introduction textbook.
The book is available online at: https://datasciencebook.ca/
© 2020 Tiffany A. Timbers, Trevor Campbell, Melissa Lee
For the Python version of the textbook, please visit https://python.datasciencebook.ca or the GitHub repository at https://github.com/ubc-dsci/introduction-to-datascience-python.
This textbook is offered under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.
Building the book requires Docker (instructions here: https://docs.docker.com/get-docker/).
You must have at least 8GB of RAM (and ideally more like 16GB RAM) to build the book.
You can build the HTML version of the book on your own machine by running
./build_html.sh
in the root directory of this repository. The book can be viewed in your browser by opening the docs/index.html file.
You can build the PDF version of the book on your own machine by running
./build_pdf.sh
in the root directory of this repository. The book can be viewed in a PDF reader by opening docs/_main.pdf.
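As a quick sanity check after running either build script, one could confirm that the expected output files exist. This is a minimal sketch and not part of the repository: the function name check_book_outputs is hypothetical, and the two paths are the ones named above.

```shell
# Hypothetical helper: report whether the build outputs named in this README exist.
# Usage: check_book_outputs [repository root]   (defaults to the current directory)
check_book_outputs() {
  root="${1:-.}"
  missing=0
  for f in docs/index.html docs/_main.pdf; do
    if [ -f "$root/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  # Return non-zero if at least one expected file is absent.
  return "$missing"
}
```

For example, running check_book_outputs . from the repository root after both builds should list both files as found.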
If you want to edit the source material and build the book using RStudio, navigate to the repository root and run
docker-compose up -d
to start up the docker container. Then open a web browser and go to http://localhost:8787/. For the username enter rstudio, and for the password enter password.
At any point you can render the book by running the following R code in the R console:
bookdown::render_book('index.Rmd', 'bookdown::gitbook')
When you are done working, make sure to type docker-compose down to shut down the container.
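The start and stop steps above can be wrapped in two small shell functions. This is only a sketch: the function names and the COMPOSE variable are hypothetical conveniences (COMPOSE lets you dry-run the functions with COMPOSE=echo), while the commands, URL, and credentials are the ones given above.

```shell
# Start the container and print where to find RStudio.
# COMPOSE is a hypothetical override; by default docker-compose is called.
rstudio_up() {
  ${COMPOSE:-docker-compose} up -d || return 1
  echo "RStudio: http://localhost:8787/ (username: rstudio, password: password)"
}

# Shut the container down when you are done working.
rstudio_down() {
  ${COMPOSE:-docker-compose} down
}
```

Both functions are meant to be run from the repository root, since that is where docker-compose is run above.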
Primary development in this repository happens on the main branch. If you want to contribute to the book, please branch off of main and make a pull request into main. You cannot commit directly to main.
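The branching step can be sketched as a small helper that creates a topic branch from main. The function name new_topic_branch and the GIT override (for dry runs, e.g. GIT=echo) are hypothetical; the refusal to reuse main, production, or gh-pages is an illustrative safeguard based on the branch names this README mentions.

```shell
# Create a new branch that starts at main.
# GIT is a hypothetical override; by default the real git is called.
new_topic_branch() {
  branch="$1"
  case "$branch" in
    ''|main|production|gh-pages)
      echo "choose a new branch name (not main, production, or gh-pages)" >&2
      return 1 ;;
  esac
  ${GIT:-git} checkout -b "$branch" main
}
```

After committing on the new branch, push it and open the pull request into main on GitHub.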
The production branch contains the source material corresponding to the current publicly-viewable version of the book website.
The gh-pages branch serves the current book website at https://datasciencebook.ca.
You can update the live, publicly viewable HTML book by making changes to the source/ folder in the production branch (e.g. by merging main into production).
GitHub will trigger a rebuild of the public HTML site, and store the built book in the root folder of the gh-pages branch.
main deploy previews
Any commit to source/** on the main branch (from a merged PR) will trigger a rebuild of the development preview site served at https://datasciencebook.ca/dev.
The built preview book will be stored in the dev/ folder on the gh-pages branch.
Any PR to source/ will trigger a build of a PR preview site at https://datasciencebook.ca/pull###, where ### is the number of the pull request.
The built preview book will be stored in the pull###/ folder on the gh-pages branch.
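The naming scheme for PR previews can be illustrated with two small functions that build the preview URL and the gh-pages folder name from a pull request number. The function names are hypothetical, and 123 in the usage note is an arbitrary example number.

```shell
# Print the PR preview URL for a given pull request number.
pr_preview_url() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: pr_preview_url <PR number>" >&2; return 1 ;;
  esac
  echo "https://datasciencebook.ca/pull$1"
}

# Print the folder on the gh-pages branch that holds that preview.
pr_preview_folder() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: pr_preview_folder <PR number>" >&2; return 1 ;;
  esac
  echo "pull$1/"
}
```

For instance, pr_preview_url 123 prints https://datasciencebook.ca/pull123, and pr_preview_folder 123 prints pull123/.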
Any PR to Dockerfile will trigger a rebuild of the docker image, push it to DockerHub, and update the image tags in the build_html.sh and build_pdf.sh scripts on the PR automatically.
This new build environment will be used for the PR deploy preview mentioned above.
Write function names without parentheses (read_csv, not read_csv()).
Use **bolding** to typeset a new term (but only the first introduction of the term).
In the text, do something like "here is some text about the comma (,)". Or for <-, do "something like this assignment operator (<-)". There are likely exceptions to this rule though.
Labels: ##-[name with only alphanumeric + hyphens], where the ## is the 2-digit chapter number, e.g. 03-test-name for a label test-name in chapter 3.
Use ```r code ``` not ``` code ``` (similar for `html` where needed).
Use the |> pipe, not %>%.
Don't use grid = 10; actually specify the values using seq or c(...).
Don't use head(dataframe); just use dataframe to print.
Call set.seed once at the beginning of each chapter.
Use "double quotes" for strings, not 'single quotes'.
Follow styler formatting (although code must obey the 80ch limit).
Use slice, slice_min, slice_max (not top_n).
Use pull(colname); don't select first.
{-} is used wherever unnumbered headings are required.
Choose an appropriate table of contents depth via toc_depth (the example below has depth 2, which is a good default):
bookdown::gitbook:
  toc_depth: 2
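The label convention from the style notes (a 2-digit chapter number, a hyphen, then a name made of alphanumeric characters and hyphens) lends itself to a quick automated check. The following is a minimal sketch; the function name is_valid_label is hypothetical, and the pattern is one reading of that rule rather than a check used by the repository.

```shell
# Succeed if the argument looks like ##-name, e.g. 03-test-name.
is_valid_label() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{2}-[[:alnum:]-]+$'
}
```

Under this reading, 03-test-name passes, while 3-test-name (chapter number is not 2 digits) and 03-test_name (contains an underscore) do not.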
If you have special characters (particularly underscores, quotation marks, plus signs, other LaTeX math symbols), make sure to separate the caption out of the code chunk like so:
(ref:blah)
\`\`\`{r blah, other_options}
code here
\`\`\`
Set out.width (e.g. out.width="70%"); for plots we create in R, use fig.width and fig.height.
Use fig.align = "center".
Use fig.width=5, fig.height=3 (exceptions are figs 1.7 & 1.8, so that we can read the axis labels).
Crop images with image_crop.
Typeset notes with > and start with Note:
Em-dashes take no surrounding spaces: "This kind of typesetting—which is awesome—is correct!" and "Typesetting with spaces around em-dashes — which is bad — is not correct".
Make sure \index commands don't break punctuation spacing. E.g. "This is an item \index{item}; it is good" will typeset with an erroneous space after item, i.e. "This is an item ; it is good".
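Two of the typesetting rules above, the \index spacing rule and the em-dash rule, can be searched for with grep. This is a rough sketch with hypothetical function names; the patterns are assumptions about what the problem cases look like in the source files and may miss or over-report some cases.

```shell
# List lines where a space precedes \index{...} and punctuation follows it,
# which is the pattern that typesets as "item ; it is good".
find_index_spacing() {
  grep -nE ' \\index\{[^}]*\}[;:,.!?]' "$@"
}

# List lines where an em-dash has a space directly before or after it.
find_spaced_emdash() {
  emdash=$(printf '\342\200\224')   # UTF-8 bytes of the em-dash character
  grep -n -e " $emdash" -e "$emdash " "$@"
}
```

Each function takes one or more file names and, like grep, returns a non-zero status when nothing is found.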
Generally the book uses American spelling. Some common British vs American and Canadian vs American gotchas:
We need a line of whitespace before and after code fences (code surrounded by three backticks above and below). This is for readability, and it is essential for figure captions.
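The whitespace rule for code fences can be checked mechanically. The sketch below uses awk to report an opening fence that is not preceded by a blank line and a closing fence that is not followed by one. The function name check_fence_spacing is hypothetical, and the check assumes that fences start at the beginning of a line and are not nested.

```shell
# Report code fences that lack a blank line before (opening) or after (closing).
# Usage: check_fence_spacing FILE   (exit status is 1 if any problem is found)
check_fence_spacing() {
  awk '
    function blank(s) { return s ~ /^[ \t]*$/ }
    /^```/ {
      if (!in_fence) {
        # Opening fence: the previous line must be blank (unless this is line 1).
        if (NR > 1 && !blank(prev)) {
          print NR ": no blank line before opening fence"; bad = 1
        }
        in_fence = 1
      } else {
        # Closing fence: remember its line number so the next line is checked.
        in_fence = 0
        closed_at = NR
      }
      prev = $0
      next
    }
    {
      if (closed_at && closed_at == NR - 1 && !blank($0)) {
        print closed_at ": no blank line after closing fence"; bad = 1
      }
      prev = $0
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Each reported line starts with the line number of the offending fence.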
These are absolute last steps when rendering the PDF output:
(\allowdisplaybreaks helps)
Look for ?? in the PDF (broken refs).
Make sure link text ([blah](url)) will make sense in the hardcopy book version (i.e. nothing like "click this"). Many links appear in the additional resources: make sure the text replacement of the URL contains enough information for someone to find the resource (without being able to click the link).
(??)