As stated above, object targets are R objects that represent intermediate objects in an analysis.
Object targets are common in the example pipelines we have shown before. They are distinguished from file targets in the following ways:
- the target name has no file-extension suffix (e.g., no _csv) and resembles an R variable name (because that is basically what the object target is)
- the function that builds the target returns the object itself, rather than writing it to a file with a function such as write_csv, ggsave, write_feather, nc_create, etc. The return value of a function is either the value of the last expression in the function or the argument to a call to return().

These objects are often used because they are more concise than files (e.g., you don't need to pass a filename to the function) and because they preserve the classes and formatting of the data. That makes it easier to keep dates, factors, and other special data types from changing when you write (and later read in) a file such as a .csv. Objects also give you the illusion that they aren't taking up space in your project directory, and they make workspaces look a bit tidier.
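To see the class-preservation point concretely, here is a small base-R sketch that round-trips the same table through a .csv file and through an .rds file (the format targets uses for objects by default). The data frame and its site IDs are made up purely for illustration.

```r
# Made-up example data with two "special" column types: a Date and a factor.
sites <- data.frame(
  site_id = c("01234567", "07654321"),
  sample_date = as.Date(c("2021-06-01", "2021-06-15")),
  status = factor(c("active", "inactive"), levels = c("inactive", "active")),
  stringsAsFactors = FALSE
)

# Round trip through a .csv file: column classes are not stored in the file.
csv_path <- tempfile(fileext = ".csv")
write.csv(sites, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Round trip through an .rds file (what targets uses for objects by default).
rds_path <- tempfile(fileext = ".rds")
saveRDS(sites, rds_path)
from_rds <- readRDS(rds_path)

# Compare the column classes after each round trip.
sapply(from_csv, class)  # dates and factor levels come back as plain text,
                         # and site_id is parsed as a number (leading zeros lost)
sapply(from_rds, class)  # same classes as `sites`
identical(sites, from_rds)
```

The .csv copy needs extra work (colClasses, as.Date(), factor()) to get back to where it started, while the .rds copy is identical to the original.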
The "illusion" :tophat::rabbit: of objects not taking up space exists because, behind the scenes, these objects are actually written to file (.rds files, to be specific). You can see what exists under the hood with dir('_targets/objects'). The default is for targets to store these as .rds files. There are other formats that can be used to store the intermediate objects; if you're curious, check out the documentation for the format argument to tar_target().
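As a rough illustration of where the format argument goes, here is a minimal _targets.R sketch. It assumes the map.config list is built inline (values copied from the object shown in the next comment), and the map_config_qs target is hypothetical. With no format given, the value is saved as an rds file named after the target; format = "qs" is shown as one alternative, assuming the qs package is installed and your version of targets supports it.

```r
# _targets.R (sketch)
library(targets)

list(
  # No `format` given, so the return value is stored with the default
  # rds format at _targets/objects/map.config (no file extension).
  tar_target(
    map.config,
    list(
      missing_data = "grey90",
      plot_CRS = "+init=epsg:2163",
      wfs = "http://cida.usgs.gov/gdp/geoserver/wfs",
      feature = "derivative:wbdhu8_alb_simp",
      countBins = c(0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000)
    )
  ),
  # Hypothetical target: the same object stored with a different backend.
  # Assumes the qs package is installed.
  tar_target(map_config_qs, map.config, format = "qs")
)
```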
You can take a look at that same object referenced in https://github.com/elmeraa/ds-pipelines-targets-2/issues/4 by using:

```r
readRDS('_targets/objects/map.config')
```

```
$missing_data
[1] "grey90"

$plot_CRS
[1] "+init=epsg:2163"

$wfs
[1] "http://cida.usgs.gov/gdp/geoserver/wfs"

$feature
[1] "derivative:wbdhu8_alb_simp"

$countBins
[1] 0 1 2 5 10 20 50 100 200 500 1000
```

(Not as convenient as accessing the data with tar_read('map.config') instead, which is what we'd recommend.)
:keyboard: Add a comment to this issue so we know you're ready to continue learning
Adding comment to continue on
File targets are very flexible and, of course, are also easy to share or store elsewhere.
Additionally, many file formats are either language agnostic (e.g., csv, tsv, txt, nc files) or are meant to be shared across languages, such as the feather format designed for exchange between R and Python.
When specifying a file target in a makefile, the path to the file needs to be either absolute or relative to the working directory that the _targets.R file is in.
Since file targets in the targets package are not the default and require you to add format = "file", you may feel deterred from using files as targets. It's true that the benefits of files are often small compared to the advantages of using objects. However, we still recommend using files liberally, especially for targets that you'll want to access outside of R (e.g., browsing figure files in Finder/Windows Explorer; opening a spatial data file in a GIS) or share with others (e.g., using outputs from one pipeline as inputs to another).
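A file target's command typically writes something to disk and then hands back the path. The sketch below is plain base R; write_data_csv() and site_data are hypothetical names, and the tar_target() call in the comment shows roughly how the function would be wired into _targets.R with format = "file", naming the target data_csv and typing the output filename into the recipe.

```r
# Hypothetical command for a file target: write the data, return the path.
write_data_csv <- function(data, out_file) {
  # Create the output folder (e.g., 1_fetch/out) if it doesn't exist yet
  dir.create(dirname(out_file), recursive = TRUE, showWarnings = FALSE)
  write.csv(data, out_file, row.names = FALSE)
  # A format = "file" target must return the file path(s) it created
  out_file
}

# In _targets.R, this would be wired up roughly as:
#   tar_target(
#     data_csv,
#     write_data_csv(site_data, "1_fetch/out/data.csv"),
#     format = "file"
#   )

# Standalone usage outside the pipeline, writing into a temporary folder
out_dir <- file.path(tempdir(), "1_fetch", "out")
data_csv <- write_data_csv(
  data.frame(x = 1:3, y = c("a", "b", "c")),
  file.path(out_dir, "data.csv")
)
file.exists(data_csv)
```

Because the path is the return value, targets can hash the file itself and rebuild downstream targets when its contents change.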
:keyboard: Activity: Close this issue when you are ready to move on to the next assignment
Targets
"Targets" are the main things that the targets package interacts with (if the name hadn't already implied that :zany_face:). They represent things that are made (they're also the vertices of the dependency graph). If you want to make a plot called plot.pdf, then that's a target. If you depend on a dataset called data.csv, that's a target (even if it already exists).

In targets, there are two main types:
- file targets: these need format = "file" added as an argument to tar_target(), and their command must return the filepath(s). We have learned that file targets can be single files, a vector of filepaths, or a directory. USGS Data Science workflows name file targets using their base name and their file extension, e.g., the target for "1_fetch/out/data.csv" would be data_csv. If the file name is really long, you can always simplify it for the target name, but it is important to include _[extension] as a suffix. Additionally, USGS Data Science pipelines include the filenames created by file targets as typed-out arguments in the target recipe, or in a comment in the target definition. This practice ensures that you and your colleagues will only have to read the makefile, not the function code, to learn what file is being created.
- object targets: R objects that represent intermediate objects in an analysis (you can load one into your R session with tar_load(target_name)).

:keyboard: Activity: Assign yourself to this issue to get started.
I'll sit patiently until you've assigned yourself to this one.