Before you edit any code, create a local branch called "three-states" and push that branch up to the remote location "origin" (which is the GitHub host of your repository).

```
git checkout main
git pull origin main
git checkout -b three-states
git push -u origin three-states
```

The first two lines aren't strictly necessary when you don't have any new branches, but it's a good habit to head back to main and sync with "origin" whenever you're transitioning between branches and/or PRs.
pushed
Without modifying any code, start by inspecting and running the existing data pipeline, paying particular attention to the `oldest_active_sites` target.

:bulb: Refresher hints:

- To build the pipeline, run `library(targets)` and then `tar_make()`.
- To inspect a target, run `tar_load(mytarget)`. This function will load the object in its current state.
- To build and then inspect a single target, run `tar_make(mytarget)` and then use `tar_load(mytarget)`.
- It's a good idea to call `library(targets)` in your R session while developing pipeline code - otherwise, you need to call `targets::tar_make()` in place of `tar_make()` anytime you run that command, and all that extra typing can add up.

When you're satisfied that you understand the current pipeline, include the value of `oldest_active_sites$site_no` and the image from site_map.png in a comment on this issue.
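Here is a minimal sketch of that inspection session, assuming the default setup in this repo (only `oldest_active_sites` and `site_no` are named above; the rest is the standard targets workflow):

```r
library(targets)

# Build (or rebuild) everything defined in _targets.R
tar_make()

# Load the built target into your R session and inspect its site numbers
tar_load(oldest_active_sites)
oldest_active_sites$site_no
```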
[1] "04073500" "05211000" "04063522"
Hey, did you notice that there's a split-apply-combine action happening in this repo already? Check out the `find_oldest_sites()` function:

```r
find_oldest_sites <- function(states, parameter) {
  purrr::map_df(states, find_oldest_site, parameter)
}
```
This function:

- splits `states` into each individual state,
- applies `find_oldest_site` to each state, and
- combines the results into a single `tibble`,

and it all happened in just one line! The split-apply-combine operations we'll be exploring in this course require more code and are more useful for slow or fault-prone activities, but they follow the same general pattern.
Check out the documentation for `map_df` at `?purrr::map_df` or online here if this function is new to you.
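If `map_df` is new to you, here is a small, self-contained sketch of the same split-apply-combine pattern on toy inputs (the `describe_state()` function and its arguments are made up purely for illustration):

```r
library(purrr)

# A made-up per-element function: takes one state abbreviation plus a
# parameter string and returns a one-row tibble.
describe_state <- function(state, parameter) {
  tibble::tibble(state = state, parameter = parameter, n_chars = nchar(state))
}

# map_df() "splits" the vector into elements, "applies" describe_state()
# to each one (forwarding the extra `parameter` argument), and "combines"
# the one-row results into a single tibble - the same shape as
# find_oldest_sites() above.
map_df(c("WI", "MN", "MI"), describe_state, "example_parameter")
```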
ok
Awesome, time for your first code changes :pencil2:.
- [x] Write three targets in _targets.R to apply `get_site_data()` to each state in `states` (insert these new targets under the `# TODO: PULL SITE DATA HERE` placeholder in _targets.R). The targets should be named `wi_data`, `mn_data`, and `mi_data`. `oldest_active_sites` should be used for the `sites_info` argument in `get_site_data()` - see the sketch after the hint below for the general shape.
- [x] Add a call to `source()` near the top of _targets.R as needed to make your pipeline executable.
- [x] Test it: You should be able to run `tar_make()` with no arguments to get everything built.
:bulb: Hint: the `get_site_data()` function already exists and shouldn't need modification. You can find it by browsing the repo or by hitting Ctrl-Shift-F in RStudio and then searching for "get_site_data".
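For orientation only, here is a rough sketch of what those new targets could look like. The only argument name confirmed above is `sites_info`; the `state` and `parameter` arguments, and the use of "WI"/"MN"/"MI" abbreviations, are assumptions to check against the real `get_site_data()` definition and the repo's `states` vector:

```r
# Sketch only - these tar_target() calls would go under the
# "# TODO: PULL SITE DATA HERE" placeholder, inside the list of targets
# in _targets.R. Argument names other than sites_info are assumed.
tar_target(
  wi_data,
  get_site_data(sites_info = oldest_active_sites, state = "WI", parameter = parameter)
),
tar_target(
  mn_data,
  get_site_data(sites_info = oldest_active_sites, state = "MN", parameter = parameter)
),
tar_target(
  mi_data,
  get_site_data(sites_info = oldest_active_sites, state = "MI", parameter = parameter)
)
```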
When you're satisfied with your code, open a PR to merge the "three-states" branch into "main". Make sure to add `_targets/*`, `3_visualize/out/*`, and any .DS_Store files to your `.gitignore` file before committing anything. In the description box for your PR, include a screenshot or transcript of your console session where the targets get built.
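For reference, those `.gitignore` additions (paths copied from the list above, one pattern per line) would look like:

```
_targets/*
3_visualize/out/*
.DS_Store
```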
It's time to meet the data analysis challenge for this course! Over the next series of issues, you'll connect with the USGS National Water Information System (NWIS) web service to learn about some of the longest-running monitoring stations in USGS streamgaging history.
The repository for this course is already set up with a basic targets data pipeline that: