As stated above, object targets are R objects that represent intermediate objects in an analysis.
"R objects" are common in the example pipelines we have shown before. They are distinguished from file targets in the following ways:
- Object targets are data returned from a function, whereas file targets are written to disk by a function, e.g., `write.file(target_file)` (there are all kinds of functions that write files, including `write.csv`, `cat`, `write_feather`, `nc_create`, etc.). Data can be returned from a function either because R functions return the value of the last expression evaluated or because the function explicitly specifies what is returned, such as using `return(target_data)`.
- These objects are often used because they are briefer than files (e.g., `water_quality_values` vs `1_fetch/out/water_quality_values.csv`). They also preserve the classes and formatting of the data, which makes it a bit easier to keep dates, factors, and other special data types from changing when you write a file (such as a .csv) and later read it back in. Objects also give you the illusion that they aren't taking up space in your project directory, and they make workspaces look a bit tidier.
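To make the object-versus-file distinction concrete, here is a minimal R sketch. The function names, column names, and values are hypothetical, not taken from the course pipelines. It contrasts a function that returns its data (object-target style) with one that writes a file (file-target style). It then round-trips the same data frame through a .csv and through an .rds file to show which format keeps the `Date` and `factor` classes.

```r
# Hypothetical example: names and values are illustrative only.

# Object-target style: the data is the function's return value.
# R returns the value of the last expression evaluated, so no return() is needed.
make_water_quality <- function() {
  data.frame(
    date = as.Date(c("2020-06-01", "2020-06-02")),
    site = factor(c("A", "B")),
    value = c(1.2, 3.4)
  )
}

# File-target style: the function writes target_file and returns nothing useful.
write_water_quality <- function(target_file, data) {
  write.csv(data, target_file, row.names = FALSE)
  invisible(NULL)
}

water_quality_values <- make_water_quality()

# Round trip through a .csv file
csv_file <- tempfile(fileext = ".csv")
write_water_quality(csv_file, water_quality_values)
from_csv <- read.csv(csv_file)

# Round trip through an .rds file, the format used behind the scenes for object targets
rds_file <- tempfile(fileext = ".rds")
saveRDS(water_quality_values, rds_file)
from_rds <- readRDS(rds_file)

class(from_csv$date)     # not "Date": read.csv() does not parse dates
class(from_rds$date)     # "Date"
is.factor(from_rds$site) # TRUE
```

On R 4.0 and later, `read.csv()` brings the date column back as `character`. On older versions it comes back as a `factor`. Either way, you would have to convert it back to a `Date` yourself after reading the .csv.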
The "illusion" :tophat::rabbit: of objects not taking up space exists because, behind the scenes, these objects are actually written to file (`.rds` files, to be specific). You can see what exists under the hood with `dir('.remake/objects/data')`.
For example, I was able to take a look at that same object referenced in https://github.com/collnell/ds-pipelines-2/issues/2 by using:

```r
readRDS('.remake/objects/data/0e8d236d17d49a764c3fe2aaef0d2491.rds')
#> $missing_data
#> [1] "grey90"
#> $plot_CRS
#> [1] "+init=epsg:2163"
#> $wfs
#> [1] "http://cida.usgs.gov/gdp/geoserver/wfs"
#> $feature
#> [1] "derivative:wbdhu8_alb_simp"
#> $countBins
#> [1] 0 1 2 5 10 20 50 100 200 500 1000
```
(That's a lot funkier than accessing the data with `scmake('map.config')`, which is what we'd recommend.)
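If you're curious about everything stored in that directory, a small helper can read every `.rds` file there into a named list. The helper `peek_remake_objects()` below is hypothetical and not part of `remake` or `scipiper`. It assumes the `.remake/objects/data` layout shown above and is only meant for poking around. `scmake()` is still the recommended way to get at a target's value.

```r
# Hypothetical helper, not part of remake or scipiper.
# Reads every .rds file in the (assumed) remake object directory.
peek_remake_objects <- function(data_dir = ".remake/objects/data") {
  rds_files <- dir(data_dir, pattern = "\\.rds$", full.names = TRUE)
  objects <- lapply(rds_files, readRDS)
  # Name each element by its hashed file name, minus the .rds extension
  names(objects) <- sub("\\.rds$", "", basename(rds_files))
  objects
}
```

The names are the hashed file names, not target names, which is one reason going through `scmake()` is friendlier.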
:keyboard: Add a comment to this issue so we know you're ready to continue learning
let's keep going!
File targets are very flexible and, of course, easy to share or store elsewhere.
Additionally, many file targets are either language-agnostic (e.g., csv, tsv, txt, and nc files) or designed to be shared across languages, the way the feather format was designed for exchange between R and Python.
When specifying a file target in a remakefile recipe, the path to the file needs to be either absolute or relative to the working directory where the `remake.yml` file lives.
Most of the guidance you'd see on the `remake` package would steer you away from using files as targets, since the benefits of files are quite small compared to the advantages of using objects. In fact, one of the edits I made to the background on target types that was borrowed from `remake` was to remove the statement "With remake though, [file targets] should probably only be the beginning or end points of an analysis." That statement refers to the end products of a pipeline likely being figures, tables, markdown files, or documents (all files), and it encourages all other targets to be objects. We instead recommend using files more liberally than objects, for two reasons that will become clearer in the future: 1) the ability to store data remotely in file format, and 2) ease of collaboration. You'll hear more about this in the intermediate pipelines courses and when you see more of the team's pipelines in practice.
:keyboard: Activity: Close this issue when you are ready to move on to the next assignment
`remake` is the R package that underlies many of `scipiper`'s functions. Here we've borrowed some text from the `remake` GitHub repo (credit to richfitz, although we've lightly edited the original text) to explain the differences between targets.

Targets
"Targets" are the main things that `remake` interacts with. They represent things that are made (they're also the vertices of the dependency graph). If you want to make a plot called `plot.pdf`, then that's a target. If you depend on a dataset called `data.csv`, that's a target (even if it already exists).

There are several types of targets:
- […] `remake` (files are the main types of targets that `make` deals with, since it is language agnostic). Within files, there are two sub-types:
  - […] `command` in a remakefile). You can't build these, of course. However, `remake` will build an implicit file target for them so it can internally monitor changes to that file.
- […] (in `make`, these are "phoney" targets). The `all` target that depends on all the "end points" of your analysis is a "fake" target. Running `scmake("all")` will build all of your targets, or verify that they are up to date.

:keyboard: Activity: Assign yourself to this issue to get started.
I'll sit patiently until you've assigned yourself to this one.