Learn the differences between different types of targets

github-learning-lab[bot] commented 2 years ago

Targets

"Targets" are the main things that the targets package interacts with (if the name hadn't already implied that :zany_face:). They represent things that are made (they're also the vertices of the dependency graph). If you want to make a plot called plot.pdf, then that's a target. If you depend on a dataset called data.csv, that's a target (even if it already exists).

In targets, there are two main types:

files: These are targets that need format = "file" added as an argument to tar_target(), and their command must return the filepath(s). As we have learned, file targets can be single files, a vector of filepaths, or a directory. USGS Data Science workflows name file targets using the file's base name and extension, e.g., the target for "1_fetch/out/data.csv" would be data_csv. If the file name is really long, you can simplify it for the target name, but it is important to keep _[extension] as a suffix. Additionally, USGS Data Science pipelines include the filenames created by file targets as typed-out arguments in the target recipe, or in a comment in the target definition. This practice ensures that you and your colleagues only have to read the makefile (_targets.R), not the function code, to learn what file is being created.
objects: These are R objects that represent intermediate objects in an analysis. Behind the scenes, these objects are stored to disk so that they persist across R sessions. And unlike typical R objects, they do not exist in your workspace unless you explicitly load them (run tar_load(target_name)).
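
As a rough sketch of how the two types might sit side by side in a _targets.R file: the helper functions fetch_data() and summarize_data(), the target name data_summary, and the sample data are hypothetical and made up for illustration. Only tar_target() and its format argument come from the targets package.

```r
# _targets.R (sketch; fetch_data() and summarize_data() are hypothetical
# helpers, not part of the targets package)
library(targets)

# Hypothetical helper: writes a small CSV and returns its path, which is
# what the command of a format = "file" target must return.
fetch_data <- function(out_file) {
  dir.create(dirname(out_file), recursive = TRUE, showWarnings = FALSE)
  write.csv(
    data.frame(site = c("A", "B"), value = c(1.5, 2.5)),
    out_file,
    row.names = FALSE
  )
  return(out_file)
}

# Hypothetical helper: reads the file and returns an R object.
summarize_data <- function(in_file) {
  dat <- read.csv(in_file)
  data.frame(n_sites = nrow(dat), mean_value = mean(dat$value))
}

list(
  # File target: named <base name>_<extension>, with the filename typed out
  # in the recipe so readers can see which file gets created.
  tar_target(
    data_csv,
    fetch_data(out_file = "1_fetch/out/data.csv"),
    format = "file"
  ),
  # Object target: no extension suffix; the command returns an R object.
  tar_target(
    data_summary,
    summarize_data(in_file = data_csv)
  )
)
```

After building with tar_make(), the object target would not appear in your workspace until you load it, for example with tar_load(data_summary).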

:keyboard: Activity: Assign yourself to this issue to get started.

I'll sit patiently until you've assigned yourself to this one.

github-learning-lab[bot] commented 2 years ago

More details on object targets

As stated above, object targets are R objects that represent intermediate objects in an analysis.

Object targets are common in the example pipelines we have shown before. They are distinguished from file targets in the following ways:

The target name does not have a file extension (e.g., _csv) and resembles an R variable name (because that is basically what the object target is)
The function that creates the target returns the data that become the target's value, as opposed to creating or appending to a file (e.g., with write_csv, ggsave, write_feather, nc_create, etc.). The return value of a function is either the value of the last expression in the function or the argument to a call to return().
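
To make the return-value rule concrete, here is a small base R sketch. All three function names are hypothetical; the first two are the kind of function an object target would call, while the third writes a file and returns its path, as a file target's function would.

```r
# Implicit return: the value of the last expression is returned.
count_rows <- function(dat) {
  nrow(dat)
}

# Explicit return: the argument to return() is returned.
count_rows_explicit <- function(dat) {
  n <- nrow(dat)
  return(n)
}

# By contrast, a function for a file target writes a file and returns the
# file path rather than the data.
write_rows <- function(dat, out_file) {
  write.csv(dat, out_file, row.names = FALSE)
  return(out_file)
}

dat <- data.frame(x = 1:3)
count_rows(dat)           # 3
count_rows_explicit(dat)  # 3
```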

These objects are often used because they offer a brevity advantage over files (e.g., you don't need to pass in a filename to the function) and preserve the classes and formatting of the data, which makes it a bit easier to keep dates, factors, and other special data types from changing when you write - and then later read in - a file (such as a .csv). Objects also give you the illusion that they aren't taking up space in your project directory and make workspaces look a bit tidier.
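
The point about preserving classes can be checked with a small round trip. This sketch uses only base R functions and made-up data. Other readers (such as readr) may guess some column types differently, so treat it as an illustration rather than a statement about every CSV reader.

```r
# A small data frame with a Date column and a factor column (made-up data).
dat <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02")),
  group = factor(c("low", "high"), levels = c("low", "high"))
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# Round trip through a CSV file: the classes are not stored in the file.
write.csv(dat, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

# Round trip through an RDS file: the R object is stored as-is.
saveRDS(dat, rds_file)
from_rds <- readRDS(rds_file)

class(from_csv$date)   # "character"
class(from_csv$group)  # "character"
class(from_rds$date)   # "Date"
class(from_rds$group)  # "factor"
```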

The "illusion" :tophat::rabbit: of objects not taking up space arises because, behind the scenes, these objects are actually written to disk (in R's RDS format by default). You can see what exists under the hood with dir('_targets/objects'). Other formats can be used to store the intermediate objects; if you're curious, check out the documentation for the format argument to tar_target().

You can take a look at that same object referenced in https://github.com/elmeraa/ds-pipelines-targets-2/issues/4 by using:

```r
readRDS('_targets/objects/map.config')
$missing_data
[1] "grey90"

$plot_CRS
[1] "+init=epsg:2163"

$wfs
[1] "http://cida.usgs.gov/gdp/geoserver/wfs"

$feature
[1] "derivative:wbdhu8_alb_simp"

$countBins
 [1]    0    1    2    5   10   20   50  100  200  500 1000
```

(This is less convenient than accessing the data with tar_read('map.config'), which is what we'd recommend.)
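
For comparison, here is a self-contained sketch of both access routes. It assumes the targets package is installed and builds a throwaway pipeline in a temporary directory. The target name map_config and its contents are hypothetical stand-ins, not the course's actual target.

```r
library(targets)

# tar_dir() runs the code in a temporary directory, so this sketch does not
# touch an existing project.
tar_dir({
  # Write a throwaway _targets.R with one object target (hypothetical name).
  tar_script(
    list(tar_target(map_config, list(missing_data = "grey90"))),
    ask = FALSE
  )
  tar_make()

  # The object is stored on disk under _targets/objects/ ...
  print(dir("_targets/objects"))
  from_disk <- readRDS("_targets/objects/map_config")

  # ... but tar_read() retrieves it by target name, without a file path.
  from_targets <- tar_read(map_config)
  print(identical(from_disk, from_targets))
})
```

Reading by target name keeps your code independent of where and in which format targets stores the object.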

:keyboard: Add a comment to this issue so we know you're ready to continue learning

I'll sit patiently until you've added a comment to this issue.

elmeraa-usgs commented 2 years ago

Adding comment to continue on

github-learning-lab[bot] commented 2 years ago

More details on file targets

File targets are very flexible and, of course, are also easy to share or store elsewhere.

Additionally, many file formats are either language agnostic (e.g., csv, tsv, txt, nc files) or are meant to be shared across languages, such as the feather format designed for exchange between R and Python.

When specifying a file target in a makefile (_targets.R), the path to the file needs to be either absolute or relative to the working directory, i.e., the directory that contains the _targets.R file.

Since file targets in the targets package are not the default and require you to add format = "file", you may feel deterred from using files as targets. It's true, the benefits of files are often small compared to the advantages of using objects. However, we still recommend that files be used liberally, especially for targets that you'll want to access outside of R (e.g., browsing figure files in Finder/Windows Explorer; opening a spatial data file in a GIS) or share with others (e.g., using outputs from one pipeline as inputs to another).

:keyboard: Activity: Close this issue when you are ready to move on to the next assignment

I'll sit patiently until this issue is closed.
