Kfaunce / ds-pipelines-targets-2

https://lab.github.com/USGS-R/usgs-targets-tips-and-tricks

Refactor the existing pipeline to use more effective targets #5

Closed github-learning-lab[bot] closed 2 years ago

github-learning-lab[bot] commented 2 years ago

:keyboard: Activity: Make modifications to the working, but less than ideal, pipeline that exists within your course repository

Within the course repo you should see only a _targets.R and directories with code or placeholder files for each phase. You should be able to run tar_make() and build the pipeline, although it may take numerous tries, since some parts of this new workflow are brittle. Some hints to get you started: the site_data target is too big, and you should consider splitting it into a target for each site, perhaps using the download_nwis_site_data() function directly to write a file. Several of the site_data_ targets are too small and it might make sense to combine them.


When you are happy with your newer, better workflow, create a pull request with your changes and assign your designated course instructor as a reviewer. Add a comment to your own PR with thoughts on how you approached the task, as well as key decisions you made.

Recall that you should not be committing any build artifacts of the pipeline to GitHub, so make sure that your */out/* folders are included in your .gitignore file.
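One way to confirm that an ignore rule like this behaves as intended is `git check-ignore`. The sketch below runs in a throwaway repository rather than your course repo, and the file names under 1_fetch/ are hypothetical stand-ins for a build artifact and a source file.

```shell
# Sketch only: everything happens in a throwaway repository.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
git init -q .

# The rule from the text: ignore the contents of every <phase>/out/ folder.
printf '%s\n' '*/out/*' > .gitignore

# Hypothetical files: one build artifact and one source file.
mkdir -p 1_fetch/out 1_fetch/src
touch 1_fetch/out/site_data.csv 1_fetch/src/get_site_data.R

# git check-ignore exits 0 when a path is ignored and 1 when it is not.
if git check-ignore -q 1_fetch/out/site_data.csv; then
  artifact_status="ignored"
else
  artifact_status="not ignored"
fi

if git check-ignore -q 1_fetch/src/get_site_data.R; then
  source_status="ignored"
else
  source_status="not ignored"
fi

echo "1_fetch/out/site_data.csv: $artifact_status"
echo "1_fetch/src/get_site_data.R: $source_status"
```

In a real repository the same check can be run directly against any path you are unsure about before committing.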

You should create a local branch called "refactor-targets" and push that branch up to the "remote" location (which is the GitHub host of your repository). We're naming this branch "refactor-targets" to represent the concepts in this section of the lab. In the future you'll probably choose branch names according to the type of work they contain, for example "pull-oxygen-data" or "fix-issue-17".

git checkout -b refactor-targets
git push -u origin refactor-targets

A human will interact with your pull request once you assign them as a reviewer.

Kfaunce commented 2 years ago

@lindsayplatt - I forgot I had not yet created a new branch before pushing and accidentally overwrote main. So sorry! I used git revert to restore the original repository. Hopefully that has not messed up anything.
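For anyone who hits the same problem, a recovery along these lines can be sketched as follows. This is a minimal illustration in a throwaway repository, not a record of the exact commands used here; the file contents, commit messages, and demo identity are all made up. `git revert` adds a new commit that undoes the accidental one, so history is preserved and no force push is needed.

```shell
# Sketch only: a throwaway repository with a made-up identity.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

# The original state of the repository.
echo "original pipeline" > _targets.R
git add _targets.R
git commit -q -m "Original pipeline"

# The accidental commit, made on main instead of a new branch.
echo "refactored pipeline" > _targets.R
git commit -q -am "Refactor targets (accidentally on main)"

# Undo it with a new commit that reverses the change.
git revert --no-edit HEAD

restored_content="$(cat _targets.R)"
commit_count="$(git rev-list --count HEAD)"

# Start the intended branch from the restored main.
git checkout -q -b refactor-targets
current_branch="$(git rev-parse --abbrev-ref HEAD)"

echo "main now contains: $restored_content ($commit_count commits)"
echo "working on branch: $current_branch"
```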

I had some questions related to this exercise. After the snafu above, I pushed my changes to refactor-targets to show how I am trying to approach this; could you take a look? Focusing just on the functions and targets in p1_targets_list for now, I was trying to loop through a list of site numbers in the site_data target and have it return the directory, then have a second target that references the site_data directory and compiles the individually downloaded files.

I can't get it to work (errors related to the target object), but I'm not sure if this is because it is not a feasible solution or if I'm missing something minor in my approach/code. I know I could do something like the following:

    p1_targets_list <- list(
      tar_target(
        site_data_01427207,
        download_nwis_data(
          "01427207",
          "1_fetch/out/", # out directory
          startDate,
          endDate,
          parameterCd
        ),
        format = "file"
      )
      # ... one such target for each remaining site
    )

and create an individual target for each site number that way, with the function returning the individual .csv file, but working from a set list of stations at the top of the script seemed cleaner. I appreciate any input you have!

lindsayplatt commented 2 years ago

@Kfaunce great job working around your Git issues and getting it going again 👍

I think the easiest thing would be to make your PR. I can see your full pipeline there and comment on specific lines that may not be working. We will just wait until you are done before actually merging it. My initial hunch is that it is because you are using a directory as a target. I have found using directories as targets to be finicky, though they are possible.