This is a course project for ZOOL710 Data Science in R for Biologists.
This repository provides a template file and folder structure for a data analysis project or paper done with R, Quarto, and GitHub. The structure promotes a data science pipeline for reproducible research.
Figure: The Data Science Pipeline from RD Peng
2023-02-16 First release with Data and Code folders with materials for data cleaning and initial exploration.
2023-03-03 Changed processingcode.R, processingfile_v1.qmd, and processingfile_v2.qmd.
2023-03-31 Updated Project 1 (processingcode.R, processingfile_v1.qmd, and processingfile_v2.qmd). Added the Project 2 files Anaylsis_Penguins.R, Analysis_code.qmd, and Analysis_code.html.
This template was cloned and modified from https://github.com/andreashandel/dataanalysis-template
This template lays out a data analysis project and report written with R, Quarto, GitHub, and a reference manager that exports BibTeX. You also need a plain text editor, plus word-processing software (e.g. MS Word, macOS Pages, or LibreOffice) if you wish to open .docx output.
For more R packages supporting reproducible research, check out the CRAN task view at https://cran.r-project.org/web/views/ReproducibleResearch.html
The template supports the notion that there should only be one copy of code and any outputs. Any time the intellectual content needs to be reused, it should be referred to or linked from the one copy. This way we can prevent different copies of the same content from accidentally diverging, and it is easier to maintain the project.
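One way to follow the single-copy rule in R code is to define a shared helper once and load it with `source()` wherever it is needed, rather than pasting the function into each script. The sketch below is illustrative only: the file name `helpers.R` and the function `std_error` are hypothetical and are not part of this template.

```r
# Hypothetical helper file; the name and function are for illustration only.
helper_path <- file.path(tempdir(), "helpers.R")
writeLines(c(
  "# Standard error of the mean, ignoring missing values.",
  "std_error <- function(x) {",
  "  x <- x[!is.na(x)]",
  "  sd(x) / sqrt(length(x))",
  "}"
), helper_path)

# Any script that needs the helper loads the single copy
# instead of redefining the function.
source(helper_path)
se <- std_error(c(2, 4, 4, 4, 5, 5, 7, 9, NA))
```

If the helper later needs a fix, only `helpers.R` changes, and every script that sources it picks up the correction.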
This template also uses the convention that folder names begin with a capital letter.
- All data goes into the Data folder and any subfolders.
- All code goes into the Code folder or subfolders.
- All results go into the Results folder or subfolders.
- All products go into Products subfolders.
- See the README.md files in those folders for more information.

All of the folders contain template files that should be filled with the content required for your analysis. Look inside each folder for template files that provide examples of the types of content that go there.
The template files also demonstrate how information is linked together across folders.
Please see the README.md files in each folder for more details.
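As a quick sanity check on the folder layout described above, a few lines of base R can report which top-level folders are present and create any that are missing. This is a minimal sketch: the function names `check_layout` and `create_layout` are hypothetical, and the folder list assumes the four capitalized names used in this README.

```r
# Top-level folders named in this README; adjust if your copy differs.
expected_folders <- c("Data", "Code", "Results", "Products")

# Return a named logical vector: TRUE where the folder exists under `root`.
check_layout <- function(root = ".", folders = expected_folders) {
  present <- dir.exists(file.path(root, folders))
  names(present) <- folders
  present
}

# Create any folders that are missing; existing folders are left untouched.
create_layout <- function(root = ".", folders = expected_folders) {
  for (path in file.path(root, folders)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(check_layout(root, folders))
}
```

Calling `check_layout()` from the project root returns one TRUE or FALSE per folder, which makes a missing or misnamed folder easy to spot.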
- The raw data files are in the Raw_data folder.
- The Processing_code folder contains several files that load the raw data, perform a bit of cleaning, and save the result in the Processed_data folder.
- The Analysis_code folder contains several files that load the processed data, do an exploratory analysis, and fit a simple model. These files produce figures and some numeric output (tables), which are saved to the results folder.
- The Products folder contains example bibtex and CSL style files for references. Those files are used by the example manuscript and slides.
- The Manuscript folder contains a template for a report written as a Quarto file. There is also a subfolder containing an example template for a supplementary material file, as is common in scientific articles these days.
- The slides folder contains a basic example of slides made with Quarto.

This is a GitHub template repository. The best way to get it and start using it is by following these steps.
Once you have the repository, you can check out the examples by executing them in order.
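Order matters because the analysis code reads the file that the processing code writes. The sketch below shows that dependency end to end in a temporary folder. Everything specific in it is an assumption made for illustration: the file names, the column names `length_mm` and `mass_g`, the toy values, the cleaning rule, and the placement of Raw_data and Processed_data under Data. It is not the template's actual code.

```r
# Hypothetical sketch: file names, column names, and values are invented
# for illustration and are not the template's actual files.

# Step 1 (processing): read raw data, clean it, save the processed copy.
process_raw <- function(raw_path, processed_path) {
  raw <- read.csv(raw_path, stringsAsFactors = FALSE)
  # Drop rows with missing or non-positive measurements.
  clean <- raw[!is.na(raw$mass_g) & raw$mass_g > 0, , drop = FALSE]
  dir.create(dirname(processed_path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(clean, processed_path)
  invisible(clean)
}

# Step 2 (analysis): load the processed data, fit a simple linear model,
# and save the coefficient table to the results folder.
analyze_processed <- function(processed_path, results_path) {
  dat <- readRDS(processed_path)
  fit <- lm(mass_g ~ length_mm, data = dat)
  coef_table <- as.data.frame(summary(fit)$coefficients)
  dir.create(dirname(results_path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(coef_table, results_path)
  invisible(fit)
}

# Build a toy raw data file inside a temporary project folder.
root <- tempfile("project_")
dir.create(file.path(root, "Data", "Raw_data"), recursive = TRUE)
raw_path <- file.path(root, "Data", "Raw_data", "example.csv")
toy <- data.frame(
  length_mm = c(180, 190, 200, 210, 195),
  mass_g    = c(3500, 3800, NA, 4600, -1)  # NA and -1 are deliberate errors
)
write.csv(toy, raw_path, row.names = FALSE)

# Run the two steps in order: processing first, then analysis.
processed_path <- file.path(root, "Data", "Processed_data", "example.rds")
results_path   <- file.path(root, "Results", "coef_table.rds")
process_raw(raw_path, processed_path)
analyze_processed(processed_path, results_path)
```

Running `analyze_processed()` before `process_raw()` would fail because the processed file does not exist yet, which is why the examples should be executed in sequence.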
Starting with Project 1, look in Code>Processing_Code:
Starting with Project 2, look in Code>Analysis_code:
The example manuscript and slides take their references from the bibtex file and format them according to the CSL style.