@lindsayplatt, if you don't get to this, no worries. I'll tag Sam tomorrow.
I'm trying to avoid doing something like this:
site_nums <- c("01427207", "01432160", "01435000", "01436690", "01466500")
p1_targets_list <- list(
  tar_target(site_data_1, download_nwis_data(site_nums[1])),
  tar_target(site_data_2, download_nwis_data(site_nums[2])),
  tar_target(site_data_3, download_nwis_data(site_nums[3])),
  tar_target(site_data_4, download_nwis_data(site_nums[4])),
  tar_target(site_data_5, download_nwis_data(site_nums[5])),
  tar_target(site_info_csv_1, nwis_site_info(fileout = "1_fetch/out/site_info_1.csv", site_data_1), format = "file"),
  tar_target(site_info_csv_2, nwis_site_info(fileout = "1_fetch/out/site_info_2.csv", site_data_2), format = "file"),
  tar_target(site_info_csv_3, nwis_site_info(fileout = "1_fetch/out/site_info_3.csv", site_data_3), format = "file"),
  tar_target(site_info_csv_4, nwis_site_info(fileout = "1_fetch/out/site_info_4.csv", site_data_4), format = "file"),
  tar_target(site_info_csv_5, nwis_site_info(fileout = "1_fetch/out/site_info_5.csv", site_data_5), format = "file")
)
It's kinda hurting my brain, because I don't want to repeat myself when the same thing is happening to the same kind of object. Should I define a function that takes in the sites and makes the targets? That's sort of meta. I'm also trying to think about what happens when there are many more sites than the 5 we have here, and I realize you all put in the timeouts artificially, so, yeah, not sure how this "solution" would scale. (It's the end of the day again and my brain is fried. I'll try again tomorrow.)
Hi @whiteellie! I love that you are thinking of scaling BUT I would just say, don't think too hard about it at this point. All of Pipelines III is about answering that question - how can we write a pipeline such that we don't need to copy/paste targets code for each site that we want to use. As an example, here is what my targets for this step looked like:
tar_target(site_data_01427207_csv, download_nwis_site_data('1_fetch/tmp/site_data_01427207.csv'), format = "file"),
tar_target(site_data_01432160_csv, download_nwis_site_data('1_fetch/tmp/site_data_01432160.csv'), format = "file"),
tar_target(site_data_01435000_csv, download_nwis_site_data('1_fetch/tmp/site_data_01435000.csv'), format = "file"),
tar_target(site_data_01436690_csv, download_nwis_site_data('1_fetch/tmp/site_data_01436690.csv'), format = "file"),
tar_target(site_data_01466500_csv, download_nwis_site_data('1_fetch/tmp/site_data_01466500.csv'), format = "file"),
Very repetitive, which is typically something we think "gah! cannot repeat code. there must be a better way" and there totally is. But for now, roll with this because that scaling topic is the entirety of Pipelines III 😄
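For the curious, here is roughly what that "better way" looks like: targets supports dynamic branching, where one target definition fans out over a vector of inputs. This is a sketch of the Pipelines III idea, assuming the same `download_nwis_data()` function used in the examples above (which is course-specific, not a real package function):

```r
library(targets)

# One template target that branches over every site, instead of
# copy/pasting a tar_target() per site. pattern = map(site_nums)
# creates one branch per element of site_nums.
list(
  tar_target(site_nums, c("01427207", "01432160", "01435000", "01436690", "01466500")),
  tar_target(site_data, download_nwis_data(site_nums), pattern = map(site_nums))
)
```

Adding a sixth site then means editing one vector, and targets only rebuilds the new branch.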
@limnoliver I'm not sure why this is happening: when I run tar_make(), it gives me an error saying I can't rename a column that doesn't exist. So I put a browser() inside the function where this is happening, and when I step through the function (I've done this with 2 iterations of the loop) it works out fine. The column exists and gets renamed just fine. Any ideas why targets is acting weird?
Here's the function:
process_data <- function(filepath){
  browser()
  nwis_data <- read_csv(file = filepath)
  nwis_data_clean <- nwis_data %>%
    rename(water_temperature = X_00010_00000) %>%
    select(-agency_cd, -X_00010_00000_cd, -tz_cd)
  return(nwis_data_clean)
}
There's nothing sticking out to me in that chunk. Maybe push your code and create a PR, and I can help you debug?
Well - welcome to using NWIS! Take a look at the csv for site 01435000. What's going on here?!
Looks like there are a lot of temperature columns for this site - they come from different sensors but all share the 00010 parameter code. There has been recent work at this site, which probably forced additional naming of those columns and is tripping up the pipeline. For now, why don't you just exclude that site and carry on? I created an issue in this course's repo so we can permanently change that.
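One way to make the clean-up step less brittle against surprises like this is to locate the temperature column by its parameter-code pattern instead of hard-coding its name, and to fail with a clear message when the file doesn't look as expected. This is a sketch, assuming the same dplyr/readr setup as the function above; the exact regex for extra-sensor column names is a guess:

```r
library(dplyr)
library(readr)

# Defensive variant of process_data(): find whichever column carries the
# 00010 (water temperature) parameter code, and stop with a useful error
# if there isn't exactly one, rather than letting rename() fail cryptically.
process_data <- function(filepath){
  nwis_data <- read_csv(file = filepath)
  temp_cols <- grep("00010_00000$", names(nwis_data), value = TRUE)
  if (length(temp_cols) != 1) {
    stop("Expected exactly one temperature column in ", filepath,
         " but found: ", paste(temp_cols, collapse = ", "))
  }
  nwis_data %>%
    rename(water_temperature = all_of(temp_cols)) %>%
    select(-agency_cd, -ends_with("_cd"))
}
```

With this version, site 01435000 would have produced an error listing the duplicate sensor columns up front, instead of a confusing "can't rename a column that doesn't exist."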
Hahaha. This is why I always write a try/catch with NWIS.
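In base R terms, that try/catch pattern looks something like the sketch below. `fetch_one()` is a stand-in for a real NWIS call, not part of the course code; the point is that one bad site logs a message and returns NA instead of killing the whole run:

```r
# Stand-in for a flaky NWIS download; errors for one site on purpose.
fetch_one <- function(site) {
  if (site == "01435000") stop("unexpected extra temperature columns")
  paste0("data for ", site)
}

# Wrap the download in tryCatch() so a single failing site is skipped
# (with a message) rather than aborting every other site's fetch.
fetch_safely <- function(site) {
  tryCatch(
    fetch_one(site),
    error = function(e) {
      message("Skipping site ", site, ": ", conditionMessage(e))
      NA_character_
    }
  )
}

results <- vapply(c("01427207", "01435000"), fetch_safely, character(1))
```

Here `results` holds the good site's data and an NA for the failed one, so downstream steps can filter out the failures.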
You'll get to do all of these things in Pipelines III :)
:keyboard: Activity: Make modifications to the working, but less than ideal, pipeline that exists within your course repository
Within the course repo you should see only a _targets.R and directories with code or placeholder files for each phase. You should be able to run tar_make() and build the pipeline, although it may take numerous tries, since some parts of this new workflow are brittle. Some hints to get you started: the site_data target is too big, and you should consider splitting it into a target for each site, perhaps using the download_nwis_site_data() function directly to write a file. Several of the site_data_ targets are too small and it might make sense to combine them.

When you are happy with your newer, better workflow, create a pull request with your changes and assign your designated course instructor as a reviewer. Add a comment to your own PR with thoughts on how you approached the task, as well as key decisions you made.
Recall that you should not be committing any build artifacts of the pipeline to GitHub, so make sure that your */out/* folders are included in your .gitignore file.

You should create a local branch called "refactor-targets" and push that branch up to the "remote" location (which is the GitHub host of your repository). We're naming this branch "refactor-targets" to represent concepts in this section of the lab. In the future you'll probably choose branch names according to the type of work they contain - for example, "pull-oxygen-data" or "fix-issue-17".

A human will interact with your pull request once you assign them as a reviewer.
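The .gitignore and branching steps above boil down to a few commands, run from the course repo root. This assumes your remote is named "origin", the git default:

```shell
# Ignore pipeline build artifacts (skip if the pattern is already there):
echo '*/out/*' >> .gitignore

# Create the branch locally, then push it and set the upstream tracking ref:
git checkout -b refactor-targets
git push -u origin refactor-targets
```

After the push, GitHub will offer to open a pull request from refactor-targets into your default branch.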