aaarcher-usgs / ds-pipelines-targets-1

https://lab.github.com/USGS-R/intro-to-targets-pipelines

restructure repo with fetch/process structure #4

Closed aaarcher-usgs closed 2 years ago

aaarcher-usgs commented 2 years ago

Changed the working directory structure from:

my_work_R/my_happyscript.R

to:

- fetch/src/my_happyscript.R (moved)
- fetch/out (empty directory)
- process/src (empty directory)
- process/out (empty directory)

github-learning-lab[bot] commented 2 years ago

Great, your PR is open! Let's do some more work before merging it. Now that your files are organized into phases, next you will add a commit to your pull request that makes changes to the code itself.

Background

In addition to phases, it is important to decompose high-level concepts (or existing scripts) into thoughtful functions and targets that form the building blocks of data processing pipelines. In addition to being the name of the pipeline package, a target is a noun we use to describe a tangible output of a function, often a file or an R object, that we can use as an end-product (like a summary map) or as an input to another function.

We strive to create functions that declare a clear purpose (combining good function naming with thoughtful arguments/inputs helps with this) and are designed for re-use when appropriate. When writing pipeline functions, look for re-usable operations or places where simple dividing lines can be drawn between different parts of data access, processing, modeling/analysis, and visualization. We use the high-level "phases" to divide the major concepts, and the way we scope functions adds a further subdivision. It is a best practice to have a function do a single thing, so instead of one function creating two plots and a table, it might be better to use one function to generate the table, which is then used as input to another function that creates a plot. There are exceptions to this pattern (a 1:1 function-to-target pairing) that we'll get into later.
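As a sketch of that 1:1 function-to-target pairing, one function might build a summary table and a second function turn that table into a plot. All data, column, and function names below are hypothetical, not taken from the course script:

```r
library(ggplot2)

# One function, one output: build a summary table from a data file.
prep_summary_table <- function(data_file) {
  data <- read.csv(data_file)
  aggregate(temperature ~ site, data = data, FUN = mean)
}

# A separate function turns that table into a plot, rather than
# a single function producing both the table and the plot.
plot_summary <- function(summary_table, out_file) {
  p <- ggplot(summary_table, aes(x = site, y = temperature)) +
    geom_col()
  ggsave(out_file, plot = p)
  out_file  # return the file path so the file can serve as a target
}
```

Because each function declares its inputs and returns one tangible output, the table can feed the plot function directly, or be reused elsewhere in the pipeline.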

:keyboard: Activity: Modify existing code to create functions that generate plot, table, and log file outputs

We started you off with an example script in the my_work_R folder, which hopefully lives in either 1_fetch/src or 2_process/src by now. This script loads data and generates one plot, two comma-delimited tables, and a diagnostic log file. This script isn’t great and includes some bad practices that need to be cleaned up. But it should run for you without any changes as long as you are able to install the R packages used by the script.

We’re asking that you split this single script into several functions that can be used to build the same four things. When you are happy with your changes, delete the original script and commit your new script(s) into git source control. Use the same folder structure that was created for your open PR, but feel free to add a "3_visualize" phase. Note that you should only commit the code required to run your scripts. Generally, any data or files that can be reproduced by the code should not be committed to GitHub. For this reason, add anything that ends up in */out/* folders to your .gitignore file (read more about .gitignore files here) so that you do not accidentally commit them to GitHub.
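One way to write such a .gitignore entry (a sketch; adjust the pattern to your actual folder layout) is:

```
# ignore generated outputs in every phase's out/ folder
*/out/*
```

This keeps reproducible outputs out of version control while the src/ code that generates them is still committed.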

Since you are turning the script into functions, let us know via a comment made to the pull request conversation that specifies how to run your code. For example:

```r
data <- fetch_data()
plot_results(data)
```

It is harder for us to connect robot responses up to assignments related to writing good functions, so we're going to be tagging the humans too...

Push your commit(s) to the open pull request and assign your course contact for review.


A real live human will review your pull request when you've added them as a reviewer.

aaarcher-usgs commented 2 years ago

Explanation of file structure to run this pipeline.

run_all_scripts.R sources the functions to fetch, process, and visualize the data.
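A minimal sketch of what such a run_all_scripts.R might look like (file paths and function names here are hypothetical, not necessarily those in the PR):

```r
# Source the phase functions, then call them in order.
source("1_fetch/src/fetch_data.R")
source("2_process/src/process_data.R")
source("3_visualize/src/plot_results.R")

data <- fetch_data(out_dir = "1_fetch/out")
processed <- process_data(data, out_dir = "2_process/out")
plot_results(processed, out_dir = "3_visualize/out")
```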

aaarcher-usgs commented 2 years ago

Updates as of 2022-06-02 11:52am CT

Resolved all suggestions from Julie, including:

  1. renamed functions to be more descriptive
  2. made input and output directories into arguments for all functions
  3. documented each function file
  4. documented the main script file with roxygen comments for future knitting to HTML, including titles, organization, headers/footers, etc.
  5. reduced redundancies in the diagnostic log function with a for-loop
  6. made new input arguments that included model types, experimental temperatures, colors and pch values
  7. forced empty out/ folder directory structure to commit with .empty text files
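To illustrate the for-loop refactor in point 5 (a sketch only; the actual function, model names, and log contents in the PR will differ):

```r
# Instead of repeating one write-a-log-line call per model type,
# loop over the model results and write each line.
write_diagnostic_log <- function(results_by_model, log_file) {
  con <- file(log_file, open = "wt")
  on.exit(close(con))
  for (model_type in names(results_by_model)) {
    res <- results_by_model[[model_type]]
    writeLines(sprintf("%s: RMSE = %.3f", model_type, res$rmse), con)
  }
  log_file
}
```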

padilla410 commented 2 years ago

Nicely done, Althea - specifically the doc. I am going to approve this, but I do have a few additional comments/notes that I'm going to sprinkle in that you will want to review before merging this PR and continuing on.

github-learning-lab[bot] commented 2 years ago


When you are done poking around, check out the next issue.