A project to practise creating analytical pipelines that run models using the {targets} library.
Important:
Each pipeline has its own "_targets.R" file, containing the specific set of tar_target() and tar_group_by() calls that configure the structure of that pipeline.
Because the _targets.R file must keep its original name, I save the _targets.R file for each pipeline in a separate folder of this GitHub project.
Each pipeline folder has to be run from its own dedicated R project, so that the list of targets in that pipeline's _targets.R file matches its related set of ad hoc R functions stored in the \R folder.
This ensures that each pipeline works for the purpose described by its folder name at the top level of this project.
So each downloaded pipeline folder contains its "_targets.R" file and the related functions saved in the \R folder. All required input files are read from the \data folder.
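As a rough illustration of that layout, the sketch below scaffolds a pipeline folder containing _targets.R at its root, an R folder for the functions and a data folder for the inputs, then reports which entries exist. The helper name create_pipeline_folder() and the demo folder name are hypothetical; in this project the folders already exist in the repository.

```r
# Hypothetical helper: scaffold the layout each pipeline folder follows.
create_pipeline_folder <- function(path) {
  # recursive = TRUE also creates `path` itself if it does not exist yet.
  dir.create(file.path(path, "R"), recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path(path, "data"), showWarnings = FALSE)
  # _targets.R must keep this exact name, so it sits at the folder root.
  targets_file <- file.path(path, "_targets.R")
  if (!file.exists(targets_file)) file.create(targets_file)
  # Report which of the expected entries now exist (TRUE/FALSE per entry).
  expected <- c("_targets.R", "R", "data")
  stats::setNames(file.exists(file.path(path, expected)), expected)
}

demo_dir <- file.path(tempdir(), "Pipeline_demo")
layout_check <- create_pipeline_folder(demo_dir)
print(layout_check)
```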
Pipeline_01_populate_markdown_with_targets files:
Pipeline_02_to_render_markdown files:
Pipeline_03_dynamic_branching files:
After installing the package, we load it with “library(targets)”. Our first step is then to run the “use_targets()” function, which creates a new file called _targets.R that is used to configure and set up the pipeline.
Then follow these steps, as detailed in the R documentation for the use_targets() function:
After you call use_targets(), there is still configuration left to do:
Open _targets.R and edit by hand. Follow the comments to write any options, packages, and target definitions that your pipeline requires.
Edit run.R and choose which pipeline function to execute (tar_make(), tar_make_clustermq(), or tar_make_future()).
If applicable, edit job.sh, "clustermq.tmpl" and/or "future.tmpl" to configure settings for your resource manager.
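After these edits, a minimal _targets.R for a load-then-plot example could look like the sketch below. It assumes the input lives at data/raw_data.csv and that two helpers, get_data() and plot_data(), are defined in a script inside the R folder; the file path, the package list and both helper names are hypothetical, not taken from this project.

```r
# _targets.R: a minimal sketch. The input path, the package list and the
# helpers get_data() / plot_data() are hypothetical placeholders.
library(targets)

# Packages the target functions need at run time (assumed list).
tar_option_set(packages = c("readr", "ggplot2"))

# Load every function defined in the scripts inside the R/ folder.
tar_source("R")

list(
  # Track the raw input file: editing it invalidates downstream targets.
  tar_target(raw_file, "data/raw_data.csv", format = "file"),
  # Load the data with a helper defined in R/.
  tar_target(data, get_data(raw_file)),
  # Save the plot as a .png; format = "file" tracks the returned path.
  tar_target(plot, plot_data(data, "plot.png"), format = "file")
)
```

The last expression of _targets.R must be the list of targets; tar_make() reads it from the working directory.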
In this example I started by writing one script to load the data and another to create a plot from that data.
See script: before_targets/code_pre_targets.R
There is a folder called "before_targets" containing individual R scripts. The first script, "code_pre_targets.R", allows me to plan the analysis. The second script, "scripts_into_functions_targets_prep.R", contains new functions based on the initial scripts, written to work with the targets package.
The functions we want to run as part of our pipeline are saved in the R folder, so that targets can use them when executing the pipeline.
The script "study_functions.R" contains the initial scripts for each analysis step, turned into functions to be used in the targets pipeline.
See script: R/study_functions.R
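The script-to-function step can be sketched in base R. The two functions below are hypothetical stand-ins for the kind of code kept in R/study_functions.R: get_data() reads a .csv and drops incomplete rows, and plot_data() saves a line chart as a .png and returns the file path, so a target built from it can be declared with format = "file". The x and y column names are assumptions.

```r
# Hypothetical stand-ins for functions kept in R/study_functions.R.

# Read a .csv file and keep only complete rows.
get_data <- function(file) {
  data <- utils::read.csv(file, stringsAsFactors = FALSE)
  data[stats::complete.cases(data), , drop = FALSE]
}

# Save a line chart of columns x and y (assumed names) as a .png and return
# the file path, which a format = "file" target can track.
plot_data <- function(data, path) {
  grDevices::png(path, width = 800, height = 600)
  on.exit(grDevices::dev.off())  # close the device so the file is written
  graphics::plot(data$x, data$y, type = "l", xlab = "x", ylab = "y")
  path
}

# Usage with a small made-up file.
csv_path <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(x = 1:10, y = (1:10)^2), csv_path, row.names = FALSE)
study_data <- get_data(csv_path)
png_path <- plot_data(study_data, tempfile(fileext = ".png"))
```

Keeping each step as a function with explicit inputs and outputs is what lets targets work out the dependencies between steps.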
All files required to run this pipeline are saved in the folder: Pipeline_04_data_wrangling_union_merge
Load the targets library: library(targets)
Then check the pipeline dependency graph using the tar_visnetwork() function: tar_visnetwork()
Finally, we run the pipeline we just built using the tar_make() function: tar_make()
The plot created by our pipeline is now saved as an individual .png chart.
tar_make() runs the correct targets in the correct order and saves the results to files.
Every time we update something in the pipeline, we use "tar_make()" to re-run it. Targets whose code and upstream dependencies have not changed since the last run are already up to date, so tar_make() skips them and only rebuilds the targets affected by the change.
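The skipping behaviour can be checked in a throwaway project. The sketch below writes a small _targets.R into a temporary folder with tar_script(), builds it with tar_make(), then calls tar_outdated(), which lists the targets the next tar_make() would rebuild; straight after a successful run it should return an empty vector. The target names data and fit are illustrative only.

```r
library(targets)

# Work in a temporary folder so the real project is left untouched.
demo_dir <- tempfile("targets_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a throwaway _targets.R with two dependent targets.
tar_script({
  list(
    tar_target(data, data.frame(x = 1:10, y = 2 * (1:10))),
    tar_target(fit, lm(y ~ x, data = data))
  )
}, ask = FALSE)

tar_make(reporter = "silent")         # first run builds both targets
outdated_after_run <- tar_outdated()  # nothing should be left to rebuild
tar_make()                            # second run skips both targets
fit_coefs <- coef(tar_read(fit))

# Return to the original working directory.
setwd(old_wd)
```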
We use the tar_read() function to collect pipeline output objects for use in specific sections of the Markdown report. For example, to use the data frame created by the first target we call tar_read(data), and to include the plot created by the second target we call tar_read(plot). This lets us populate the Markdown report with specific objects created by the pipeline we just built and ran.
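Inside report.Rmd these calls go in ordinary R code chunks. A sketch of one chunk body follows; it assumes a target named data holding a data frame and a target named plot declared with format = "file", so that tar_read(plot) returns the path of the saved .png. Both assumptions are illustrative.

```r
# Body of an R code chunk in report.Rmd (sketch). tar_read() looks up the
# _targets/ data store relative to the working directory, so render the
# report from the project folder.
library(targets)

# Show the first rows of the data frame built by the `data` target.
knitr::kable(head(tar_read(data)))

# If `plot` was declared with format = "file", tar_read() returns the .png
# path, which knitr can embed directly.
knitr::include_graphics(tar_read(plot))
```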
The final output of this pipeline is a fully rendered Markdown report, produced from the file report.Rmd and published in this repo:
The last step of this project was building and rendering the Markdown report report.Rmd, populated with the objects created in the pipeline by targets. The aim is to automate report creation by running a pipeline, making the report easier to maintain and update in the future. Rendering report.Rmd produces a document populated with tables and content from the pipeline. This approach could be extended to automate other reports while ensuring reproducibility, as an attempt to follow RAP (Reproducible Analytical Pipeline) principles.
So we now have an initial pipeline that we can modify and expand with extra analytical steps in the form of new targets.
First we merge all incoming .csv files into a single combined file, and then we use this combined data frame to populate our Markdown report.
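A minimal base R sketch of the "union" step follows: it stacks every .csv in a folder with rbind(), tags each row with its source file, and writes the result to a single file. The helper name combine_csv_files(), the folder and the column names are hypothetical. rbind() matches columns by name, so all files must share the same columns; if the files instead shared a key and needed joining, base R's merge() would be the tool.

```r
# Hypothetical helper: stack ("union") every .csv file in a folder into one
# data frame. rbind() matches columns by name, so all files need the same
# column names.
combine_csv_files <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, function(f) {
    df <- utils::read.csv(f, stringsAsFactors = FALSE)
    df$source_file <- basename(f)  # record which file each row came from
    df
  })
  do.call(rbind, tables)
}

# Usage with two small, made-up monthly files.
in_dir <- tempfile("incoming_")
dir.create(in_dir)
utils::write.csv(data.frame(month = "2023-01", value = c(10, 12)),
                 file.path(in_dir, "jan.csv"), row.names = FALSE)
utils::write.csv(data.frame(month = "2023-02", value = c(11, 13)),
                 file.path(in_dir, "feb.csv"), row.names = FALSE)

combined <- combine_csv_files(in_dir)

# Write the combined table to a single file outside the input folder.
combined_path <- file.path(tempdir(), "combined.csv")
utils::write.csv(combined, combined_path, row.names = FALSE)
```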
This is the output of the tar_visnetwork() function, used to check the pipeline dependency graph:
This step is part of the data preparation stage for a future modelling pipeline.
This is the output of the completed pipeline run, with the data frames and required .csv files saved in the \objects folder.
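When a pipeline step writes a .csv to disk, the usual targets pattern is a function that writes the file and returns its path, paired with format = "file" so targets tracks the file's contents. The helper write_csv_target() and the output/combined.csv path below are hypothetical illustrations of that pattern.

```r
# Hypothetical helper for a file target: write the data frame to `path`
# and return the path, because a target declared with format = "file"
# must return the path(s) of the file(s) it creates.
write_csv_target <- function(data, path) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  utils::write.csv(data, path, row.names = FALSE)
  path
}

# In _targets.R this could be used as, for example:
# tar_target(combined_csv,
#            write_csv_target(combined, "output/combined.csv"),
#            format = "file")

out_path <- write_csv_target(
  data.frame(id = 1:3, value = c(5, 7, 9)),
  file.path(tempfile("output_demo_"), "combined.csv")
)
```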
After running the tar_make() function, we get a complete report of which parts of the pipeline have run.
All files required to run this pipeline are saved in the folder: Pipeline_01_populate_markdown_with_targets
We can render a Markdown document within the targets pipeline using the {tarchetypes} library, which provides the tar_render() function. By adding a tar_render() target to the pipeline, the report is rendered after the upstream targets have run and populated the Markdown report with their outputs.
The rendering target is now included in the pipeline:
After running the _targets.R file from this folder, the creation and rendering of the Markdown document is automated inside the targets pipeline.
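A sketch of a target list with the extra rendering step follows. It assumes report.Rmd sits next to _targets.R and that the upstream helpers get_data() and plot_data() are defined in the R folder; the helper names, input path and package list are hypothetical. tar_render() also needs the rmarkdown package installed.

```r
# _targets.R sketch with a rendering step at the end. The input path,
# get_data(), plot_data() and the package list are hypothetical.
library(targets)
library(tarchetypes)

tar_option_set(packages = c("readr", "ggplot2"))
tar_source("R")

list(
  tar_target(raw_file, "data/raw_data.csv", format = "file"),
  tar_target(data, get_data(raw_file)),
  tar_target(plot, plot_data(data, "plot.png"), format = "file"),
  # tar_render() finds the tar_read()/tar_load() calls inside report.Rmd,
  # so the report is re-rendered whenever `data` or `plot` changes.
  tar_render(report, "report.Rmd")
)
```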
All files required to run this pipeline are saved in the folder: Pipeline_02_to_render_markdown
Once the pipeline had run, and before implementing a new feature (including a simple ARIMA model) defined in issue '#6', I ran fs::dir_tree("targets-test") to check the whole set of objects created by targets. The Markdown report has been populated with the three plots created in the pipeline.
In the coming week, I will use dynamic branching alongside the Modeltime package to introduce a couple of predictive models (ARIMA and Prophet) into the existing pipeline. The aim is to predict the next 5 months of Manufacturers' Value of Shipments for the set of shipment categories described below:
Dynamic branching is a way to define new targets while the pipeline is running, as opposed to declaring several targets up front. It is useful when you want to iterate over what is in the data, for example when you want a target that iterates by region. (Dynamic branching, {targets} user manual: https://books.ropensci.org/targets/dynamic.html)
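Conceptually, a target with pattern = map() over groups behaves like splitting the data and applying the same function to each piece, with each piece becoming its own branch. The base R sketch below mimics that idea outside targets: it fits a simple AR(1) model with stats::arima() to each category and forecasts 5 months ahead. The category names and the simulated series are made up for illustration, and the plain stats::arima() fit stands in for the Modeltime models used in this project.

```r
# Simulated monthly data for two made-up categories (60 months each).
set.seed(42)
dates <- seq(as.Date("2015-01-01"), by = "month", length.out = 60)
shipments <- data.frame(
  category = rep(c("category_a", "category_b"), each = 60),
  date = rep(dates, times = 2),
  value = c(100 + stats::arima.sim(list(ar = 0.5), n = 60),
            200 + stats::arima.sim(list(ar = 0.3), n = 60))
)

# One "branch": fit an AR(1) model to a single category and forecast h months.
forecast_branch <- function(branch, h = 5) {
  branch <- branch[order(branch$date), ]
  fit <- stats::arima(branch$value, order = c(1, 0, 0))
  data.frame(
    category = branch$category[1],
    step = seq_len(h),
    forecast = as.numeric(stats::predict(fit, n.ahead = h)$pred)
  )
}

# split() plays the role of the grouping step and lapply() the role of
# pattern = map(): one independent result per category, then combined.
branches <- split(shipments, shipments$category)
forecasts <- do.call(rbind, lapply(branches, forecast_branch))
print(forecasts)
```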
I will use dynamic branching to iterate over these four economic indicators downloaded from FRED (Federal Reserve Economic Data):
Categories > Production & Business Activity > Manufacturing https://fred.stlouisfed.org/
Monthly time series indicators downloaded from FRED Economic Data, Federal Reserve Bank of St. Louis:
This is an example of dynamic branching using the tarchetypes package, grouping on the Metric variable to create 2 branches, one for each of the two metrics included in this workflow. tarchetypes package GitHub repo: https://github.com/ropensci/tarchetypes/tree/main
Visnetwork from the above workflow, including branching:
All files required to run this pipeline are saved in the folder: Pipeline_03_dynamic_branching_files
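A sketch of the grouping-based branching described above follows: tar_group_by() groups a data frame by a column, and a downstream target with pattern = map() gets one branch per group. The input path, the metric column name and the plot_metric() helper are hypothetical; tar_group_by() relies on dplyr, which must be installed.

```r
# _targets.R sketch: one dynamic branch per metric. The input path, the
# `metric` column name and plot_metric() are hypothetical.
library(targets)
library(tarchetypes)

tar_source("R")

list(
  tar_target(indicators_file, "data/indicators.csv", format = "file"),
  # tar_group_by() groups the rows by `metric` and tags each group so that
  # downstream targets can iterate over the groups.
  tar_group_by(indicators, read.csv(indicators_file), metric),
  # pattern = map() creates one branch per group, i.e. one per metric.
  tar_target(metric_plot, plot_metric(indicators), pattern = map(indicators))
)
```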
This pipeline is complete, and all files required to run it can be found in the "Pipeline_05_ARIMA_Prophet_models" folder:
It uses the Modeltime package to combine Prophet and ARIMA models in the previous targets pipeline. Modeltime package: https://business-science.github.io/modeltime/
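As a sketch of how the two models can be combined, the helper below fits an ARIMA model (auto_arima engine) and a Prophet model to one series and forecasts 5 months ahead, following the usual Modeltime workflow: fit, modeltime_table(), calibrate, refit, forecast. The function name forecast_arima_prophet(), the date and value column names and the 5-month hold-out split are assumptions, not the code in Pipeline_05_ARIMA_Prophet_models. Calling it needs the modeltime, parsnip, timetk, rsample and prophet packages; the native |> pipe needs R 4.1 or later.

```r
# Hypothetical helper: fit ARIMA and Prophet with modeltime on one series
# (columns `date` and `value` assumed) and forecast the next 5 months.
forecast_arima_prophet <- function(series, horizon = "5 months") {
  # Hold out the last `horizon` of data for calibration.
  splits <- timetk::time_series_split(series, date_var = date,
                                      assess = horizon, cumulative = TRUE)
  train <- rsample::training(splits)

  arima_fit <- modeltime::arima_reg() |>
    parsnip::set_engine("auto_arima") |>
    generics::fit(value ~ date, data = train)

  prophet_fit <- modeltime::prophet_reg() |>
    parsnip::set_engine("prophet") |>
    generics::fit(value ~ date, data = train)

  # Calibrate on the hold-out, refit on the full series, then forecast.
  modeltime::modeltime_table(arima_fit, prophet_fit) |>
    modeltime::modeltime_calibrate(new_data = rsample::testing(splits)) |>
    modeltime::modeltime_refit(data = series) |>
    modeltime::modeltime_forecast(h = horizon, actual_data = series)
}
```

Inside the dynamic-branching pipeline it could be mapped over grouped data, for example tar_target(forecasts, forecast_arima_prophet(indicators), pattern = map(indicators)), where indicators is a hypothetical grouped target.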