UBC-DSCI / introduction-to-datascience

Open Source Textbook for DSCI100: Introduction to Data Science in R
https://datasciencebook.ca/

Data Science: A First Introduction

This is the source for the Data Science: A First Introduction textbook.

The book is available online at: https://datasciencebook.ca/

© 2020 Tiffany A. Timbers, Trevor Campbell, Melissa Lee

For the Python version of the textbook, please visit https://python.datasciencebook.ca or the GitHub repository at https://github.com/ubc-dsci/introduction-to-datascience-python.

License Information

This textbook is offered under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.

Development

Setup

Building the book requires Docker (instructions here: https://docs.docker.com/get-docker/).

Build locally

You must have at least 8GB of RAM (and ideally more like 16GB RAM) to build the book.
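As a quick pre-flight check, the following is a minimal sketch that compares the machine's total memory with the 8GB minimum stated above. It assumes a Linux host that exposes `/proc/meminfo`; the helper name `kb_to_gb` is hypothetical and not part of this repository. Because the kernel reports slightly less than the installed RAM, the value is rounded to the nearest GB.

```shell
# Convert a size in kB to whole GB, rounded to the nearest GB
# (1 GB = 1024 * 1024 kB). Hypothetical helper, not part of the repository.
kb_to_gb() {
  echo $(( ($1 + 524288) / 1048576 ))
}

# Assumption: Linux host. Other systems do not provide /proc/meminfo.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  total_gb=$(kb_to_gb "$total_kb")
  if [ "$total_gb" -lt 8 ]; then
    echo "Warning: about ${total_gb}GB of RAM detected; the build needs at least 8GB."
  else
    echo "About ${total_gb}GB of RAM detected."
  fi
else
  echo "Cannot read /proc/meminfo; check your RAM manually."
fi
```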

You can build the HTML version of the book on your own machine by running

./build_html.sh

in the root directory of this repository. The book can be viewed in your browser by opening the docs/index.html file.

You can build the PDF version of the book on your own machine by running

./build_pdf.sh

in the root directory of this repository. The book can be viewed in a PDF reader by opening docs/_main.pdf.
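If you want to confirm that a build actually produced its output file, a small wrapper along these lines may help. This is only a sketch: the helper names `check_output` and `build_and_check` are hypothetical, and the only details taken from this README are the script names and output paths.

```shell
# Hypothetical helper: succeed only if the given file exists and is non-empty.
check_output() {
  if [ -s "$1" ]; then
    echo "found: $1"
  else
    echo "missing or empty: $1" >&2
    return 1
  fi
}

# Hypothetical helper: run a build script, then confirm the file it should produce.
build_and_check() {
  script="$1"
  output="$2"
  if [ ! -x "$script" ]; then
    echo "not found or not executable: $script" >&2
    return 1
  fi
  "$script" && check_output "$output"
}

# From the repository root (script names and paths as given above):
#   build_and_check ./build_html.sh docs/index.html
#   build_and_check ./build_pdf.sh docs/_main.pdf
```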

Working with RStudio (HTML only)

If you want to edit the source material and build the book using RStudio, navigate to the repository root and run

docker-compose up -d

to start up the Docker container. Then open a web browser and go to http://localhost:8787/. For the username enter rstudio, and for the password enter password. At any point you can render the book by running the following R code in the R console:

bookdown::render_book('index.Rmd', 'bookdown::gitbook')

When you are done working, make sure to type docker-compose down to shut down the container.

Contributing

Primary development in this repository happens on the main branch. If you want to contribute to the book, please branch off of main and make a pull request into main. You cannot commit directly to main.

The production branch contains the source material corresponding to the current publicly-viewable version of the book website.

The gh-pages branch serves the current book website at https://datasciencebook.ca.

Workflows

Book deployment

You can update the live, publicly viewable HTML book by making changes to the source/ folder in the production branch (e.g. by merging main into production). GitHub will trigger a rebuild of the public HTML site, and store the built book in the root folder of the gh-pages branch.

main deploy previews

Any commit to source/** on the main branch (from a merged PR) will trigger a rebuild of the development preview site served at https://datasciencebook.ca/dev. The built preview book will be stored in the dev/ folder on the gh-pages branch.

PR deploy previews

Any PR to source/ will trigger a build of a PR preview site at https://datasciencebook.ca/pull###, where ### is the number of the pull request. The built preview book will be stored in the pull###/ folder on the gh-pages branch.
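To make the URL pattern concrete, here is a small sketch that builds the preview address from a pull request number. The helper name `pr_preview_url` is hypothetical, and 123 is an arbitrary example number, not a real pull request.

```shell
# Base URL taken from the text above.
BASE_URL="https://datasciencebook.ca"

# Hypothetical helper: print the PR preview URL for a pull request number.
pr_preview_url() {
  case "$1" in
    # Reject empty input and anything that is not purely digits.
    ''|*[!0-9]*) echo "usage: pr_preview_url <PR number>" >&2; return 1 ;;
  esac
  echo "${BASE_URL}/pull$1"
}

pr_preview_url 123   # prints https://datasciencebook.ca/pull123
```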

Build environment updates

Any PR that modifies the Dockerfile will trigger a rebuild of the Docker image. The new image is pushed to DockerHub, and the image tags in the build_html.sh and build_pdf.sh scripts are updated on the PR automatically. This new build environment will be used for the PR deploy preview mentioned above.

Style Guide

General

Code blocks

Section headings

Choose an appropriate table of contents depth via the toc_depth option (the example below uses depth 2, which is a good default):

bookdown::gitbook:
    toc_depth: 2

Learning objectives

Captions

Equations

Figures

Tables

Note boxes

Bibliography

Naming conventions

Punctuation

Common typos to check for

Use American spelling

Generally the book uses American spelling. Some common British vs American and Canadian vs American gotchas:

Whitespace

We need a line of whitespace before and after code fences (code surrounded by three backticks above and below). This is for readability, and it is essential for figure captions.
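The rule above can be checked mechanically. The following is a rough sketch, not a tool shipped with this repository: the function name `check_fences` is hypothetical, and it assumes that fences are lines beginning with three backticks (optionally indented) and that fences are not nested.

```shell
# Hypothetical helper: report code fences that lack a blank line before the
# opening fence or after the closing fence. Exits non-zero if any are found.
check_fences() {
  awk '
    # Build the three-backtick marker from its character code (96).
    BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }

    # True if the line starts with the fence marker, ignoring indentation.
    function is_fence(s) { sub(/^[ \t]+/, "", s); return index(s, fence) == 1 }
    function is_blank(s) { return s ~ /^[ \t]*$/ }

    {
      # The line right after a closing fence must be blank.
      if (need_blank_after) {
        if (!is_blank($0)) {
          printf "%s:%d: no blank line after closing fence\n", FILENAME, NR - 1
          bad = 1
        }
        need_blank_after = 0
      }
      if (is_fence($0)) {
        if (!in_block) {
          # Opening fence: the previous line must be blank.
          if (NR > 1 && !is_blank(prev)) {
            printf "%s:%d: no blank line before opening fence\n", FILENAME, NR
            bad = 1
          }
          in_block = 1
        } else {
          # Closing fence: check the next line on the following record.
          in_block = 0
          need_blank_after = 1
        }
      }
      prev = $0
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage (the path is a placeholder):
#   check_fences path/to/chapter.Rmd
```

Each reported line has the form `file:line: message`, so the output can be scanned quickly or used in an editor that understands that format.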

PDF Output

These are the absolute last steps when rendering the PDF output:

HTML Output