Yokohime / Chiyoko_Project


This is a course project for ZOOL710, Data Science in R for Biologists.

Overview

A template file and folder structure for a data analysis project or paper built with R, Quarto, and GitHub. The structure follows a data science pipeline so that the research is reproducible.

Figure: The Data Science Pipeline from RD Peng

Projects

  1. For the first project, we focus on data cleaning, so we only have Data and Code folders.
  2. For the second project, we analyze the cleaned data, so we have Data, Code, Results, and Products folders.
  3. For the third project, please analyze your own data, including data cleaning, analysis, and report preparation.

History

2023-02-16 First release with Data and Code folders with materials for data cleaning and initial exploration.

2023-03-03 Changes made to the processingcode.R, processingfile_v1.qmd, and processingfile_v2.qmd.

2023-03-31 Updated Project 1, including processingcode.R, processingfile_v1.qmd, and processingfile_v2.qmd. Added the Anaylsis_Penguins.R, Analysis_code.qmd, and Analysis_code.html files for Project 2.

Acknowledgement

This template was cloned and modified from https://github.com/andreashandel/dataanalysis-template

Software requirements

This template lays out a data analysis project and report written using R, Quarto, GitHub, and a reference manager for BibTeX. A plain text editor is also necessary, as is word-processing software to open .docx files if you wish to use that format (e.g. MS Word, macOS Pages, or LibreOffice).

For more R packages supporting reproducible research, see the CRAN Task View at https://cran.r-project.org/web/views/ReproducibleResearch.html

Template structure

The template supports the notion that there should only be one copy of code and any outputs. Any time the intellectual content needs to be reused, it should be referred to or linked from the one copy. This way we can prevent different copies of the same content from accidentally diverging, and it is easier to maintain the project.

This template also uses the convention that folder names begin with a capital letter.

Template content

All of the folders contain template files that should be filled with the content required for your analysis. Look inside each folder for template files that provide examples of the types of content that go there.

The template files also demonstrate how information is linked together across folders.

Please see the README.md files in each folder for more details.

Getting started

This is a GitHub template repository. The best way to get it and start using it is by following these steps.

Once you have the repository, you can check out the examples by executing them in order.

Starting with Project 1, look in Code>Processing_Code:

  1. First run processingcode.R, which will produce the processed data.
  2. Then you can run processingfile_v1.qmd and/or processingfile_v2.qmd. Note that processingfile_v2.qmd requires edits to be made in the processingcode.R script.
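
The two steps above can also be run from the R console. The following is a minimal sketch and not part of the template itself. It assumes that the working directory is the project root, that the files live under `Code/Processing_Code/` (inferred from the folder names above), and that the `quarto` R package and the Quarto command-line tool are installed. The helper name `run_project1` and its arguments are hypothetical.

```r
# Sketch: run the Project 1 examples in order.
# Assumptions (not confirmed by the template): the default path below, and
# that the scripts find their own inputs when run from the project root.
run_project1 <- function(code_dir = file.path("Code", "Processing_Code"),
                         render_v2 = FALSE) {
  script <- file.path(code_dir, "processingcode.R")
  if (!file.exists(script)) {
    stop("Cannot find ", script, "; adjust `code_dir` to match your checkout.")
  }

  # Step 1: run the processing script to produce the processed data.
  source(script)

  # Step 2: render the Quarto file(s). Version 2 is opt-in because it
  # requires manual edits to processingcode.R first.
  qmd <- file.path(code_dir, "processingfile_v1.qmd")
  if (render_v2) {
    qmd <- c(qmd, file.path(code_dir, "processingfile_v2.qmd"))
  }
  for (f in qmd) {
    quarto::quarto_render(f)
  }

  # Return the rendered file paths invisibly for inspection.
  invisible(qmd)
}
```

Calling `run_project1()` runs the script and renders version 1 only; pass `render_v2 = TRUE` once processingcode.R has been edited as described above.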

Starting with Project 2, look in Code>Analysis_code:

  1. Run the analysis scripts, which will take the processed data and produce some results.
  2. Then you can run the manuscript, poster, and slides example files in any order. Those files pull in the generated results and display them. They also pull in references from the BibTeX file and format them according to the CSL style.
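
Step 1 above can be sketched as a small R helper that sources every analysis script in the folder. This is an illustration under stated assumptions, not the template's own tooling: the default path `Code/Analysis_code` is inferred from the folder name above, the function name `run_analysis_scripts` is hypothetical, and running the scripts in alphabetical order is an assumption that only matters if one script depends on another.

```r
# Sketch: source every .R script in the analysis folder.
# Assumptions: the default folder path, and that alphabetical order is an
# acceptable run order for the scripts it contains.
run_analysis_scripts <- function(code_dir = file.path("Code", "Analysis_code")) {
  # Match files ending in .R or .r; returns an empty vector if none exist.
  scripts <- sort(list.files(code_dir, pattern = "\\.[Rr]$", full.names = TRUE))
  if (length(scripts) == 0) {
    stop("No .R scripts found in ", code_dir)
  }

  for (s in scripts) {
    message("Running ", s)
    source(s)  # evaluated in the global environment, like running interactively
  }

  # Return the paths that were run, invisibly.
  invisible(scripts)
}
```

Because the product files in step 2 are independent of one another, each can then be rendered on its own, for example with `quarto::quarto_render()` from the `quarto` R package.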