
Using Targets to automate markdown reports as a pipeline including TBATS ARIMA forecast
MIT License

targets-test

Project to practise creating analytical pipelines to run models using the {targets} library.

Important folders:

- Pipeline_01_populate_markdown_with_targets files
- Pipeline_02_to_render_markdown
- Pipeline_03_dynamic_branching files

1. Targets quick start guide

After installing the package, we load targets with library(targets). Then our first step is to run the use_targets() function. This creates a new file called _targets.R that is used to configure and set up the pipeline.
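As a minimal sketch, those first commands look like this when run from the project root:

```r
# install.packages("targets")  # once, if the package is not installed yet
library(targets)

use_targets()  # writes a template _targets.R into the project
```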

Then follow these steps, detailed in the R documentation for the use_targets() function:

After you call use_targets(), there is still configuration left to do:

Open _targets.R and edit by hand. Follow the comments to write any options, packages, and target definitions that your pipeline requires.

Edit run.R and choose which pipeline function to execute (tar_make(), tar_make_clustermq(), or tar_make_future()).

If applicable, edit job.sh, clustermq.tmpl, and/or future.tmpl to configure settings for your resource manager.

1.1 Create single scripts for each analysis step

1.2 Turn these single scripts into functions

There is a folder called "before targets" containing individual R scripts. The first script, "code_pre_targets.R", allows me to plan the analysis. The second script, "scripts_into_functions_targets_prep.R", contains new functions based on the initial scripts, reworked so they can be used with the {targets} package.
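As a rough illustration of that refactor (the function and column names below are placeholders, not the repo's actual code), a step from a standalone script becomes a plain function that {targets} can later call from a target:

```r
library(readr)
library(ggplot2)

# Read one input file; previously a top-level read_csv() call in the script
read_input <- function(file) {
  read_csv(file, show_col_types = FALSE)
}

# Build the chart; previously inline ggplot() code at the bottom of the script
# ("date" and "value" are placeholder column names)
make_plot <- function(data) {
  ggplot(data, aes(x = date, y = value)) +
    geom_line() +
    theme_minimal()
}
```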

1.3 Functions used by Targets saved in R folder

1.4 Pipeline defined in the _targets.R file

Figure: pipeline

Figure: 2024-03-28_09-27_all_three_new_targets_plots

All required files to run this pipeline are saved in the folder Pipeline_04_data_wrangling_union_merge.
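As a rough sketch (using placeholder function names, not the repo's actual functions), a _targets.R file for a pipeline like this has the following shape: it loads {targets}, sources the functions kept in the R/ folder, sets the packages the targets need, and ends with a list of tar_target() calls.

```r
library(targets)

# Load the pipeline functions kept in the R/ folder
tar_source("R")

# Packages available to every target when it runs
tar_option_set(packages = c("dplyr", "ggplot2"))

list(
  # Incoming .csv files to be combined
  tar_target(files, list.files("data", pattern = "\\.csv$", full.names = TRUE)),
  # merge_files() and plot_data() are placeholder names for functions in R/
  tar_target(data, merge_files(files)),
  tar_target(plot, plot_data(data))
)
```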

1.5 Specific {targets} functions used to execute the pipeline

Load the targets library: library(targets).
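A minimal sketch of the functions typically used to inspect and run the pipeline from the project root:

```r
library(targets)

tar_manifest()    # table of the defined targets and the commands that build them
tar_visnetwork()  # interactive dependency graph of the pipeline
tar_make()        # run the pipeline; only outdated targets are rebuilt
```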

Figure: 2024-03-24_18-38_tar_manifest_output

Figure: Pipeline_functions

The plot created by our pipeline is now saved as an individual .png chart.

1.6 Run pipeline

Finally, we run the pipeline we just built by calling the tar_make() function. This function runs the required targets in the correct order and saves the results to files.

Figure: 05_Pipeline_completed_merged_files

Pipeline 01. Populate markdown with targets

Every time we update something in the pipeline we use tar_make() to re-run the entire pipeline. If some of the targets have not changed since the last time we ran the pipeline, {targets} will skip those unchanged nodes of the pipeline (each node is called a target).

With the tar_read() function we collect a pipeline output object to be used in specific sections of the Markdown report. For example, to use the data frame we created in the first target we use tar_read(data). To use the plot we created in the second target in the Markdown report we use tar_read(plot). This allows us to populate our Markdown report with the specific objects created along the pipeline we just built and ran.
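For example, a chunk inside report.Rmd can pull those objects straight from the targets store (the target names data and plot are the ones created above):

```r
library(targets)

tar_read(data)  # the data frame built by the first target
tar_read(plot)  # the plot built by the second target
```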

The final output of this pipeline is used to create a fully rendered markdown report produced from the markdown file report.Rmd, which has been created and published in this repo: Markdown_report_output

The last step of this project has been building and rendering a markdown report called report.Rmd, populated with the objects created in the pipeline by {targets}. The aim is to automate the report creation tasks by running a pipeline, making it easier to maintain and update this report in the future. When rendering report.Rmd we obtain a document populated with tables and content from the pipeline. This could be expanded to automate reports while ensuring reproducibility, trying to follow RAP principles.

So now we have an initial pipeline that we can start to modify and expand to include extra analytical steps in the form of new targets.

Figure: rendered_markdown_report_from_targets_pipeline

Pipeline 01. General pipeline structure using visnetwork

First we merge all incoming .csv files, then we combine them into a single file and use this new combined data frame to populate our Markdown report.

This is the output of the tar_visnetwork() function, used to check the pipeline dependency graph:

Figure: 2024-03-28_tar_visnetwork_plots_output

This is part of the data preparation stage for a future modelling pipeline.

Pipeline 01. Completed pipeline final output

This is the output of the completed pipeline run, with data frames and required .csv files saved in the \objects folder.

After using the tar_make() function we get a complete report of which sections of the pipeline have run.

Figure: 2024-03-28_09-35_tar_manifest_all_charts_created

All required files to run this pipeline are saved in the folder Pipeline_01_populate_markdown_with_targets.

1. Pipeline 02. Render Markdown in the pipeline

We can render a Markdown document in the targets pipeline by using the {tarchetypes} library. This library provides us with the tar_render() function. By adding a new target to our pipeline, we can render the report after the pipeline has run and has populated our Markdown report.
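A minimal sketch of that extra target at the end of the _targets.R target list (the other targets are elided):

```r
library(targets)
library(tarchetypes)

list(
  # ...existing data and plot targets...
  tar_render(report, "report.Rmd")  # renders report.Rmd once its dependencies are up to date
)
```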

Figure: TARGETS_file_render_markdown

The rendering target is now included in the pipeline. Figure: VISNETWORK_render_report_targets

After running the _targets.R file from this folder, we can automate the creation and rendering of a Markdown document inside the targets pipeline.

All required files to run this pipeline are saved in the folder Pipeline_02_to_render_markdown.

2. Pipeline 03. Dynamic branching and time series model forecasts

Once the pipeline has run, and before implementing a new feature (a simple ARIMA model) defined in issue '#6', I ran fs::dir_tree("targets-test") to check the whole set of objects created by {targets}. The Markdown report has been populated with the three plots created in the pipeline.

In the coming week, I will be using dynamic branching alongside the {modeltime} package to introduce a couple of predictive models (ARIMA, Prophet) into the existing pipeline. The aim is to predict the next 5 months of Manufacturer's Value of Shipments for the set of shipment categories described below:

2.1 Dynamic branching

Dynamic branching is a way to define new targets while the pipeline is running, as opposed to declaring every target up front. It is useful when you want to iterate over what is in the data, for example when you want a target that iterates by region. - Dynamic branching using {targets}: https://books.ropensci.org/targets/dynamic.html
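A minimal, self-contained sketch of dynamic branching in a _targets.R file (the regions vector and the per-region command are placeholders, not part of this repo):

```r
library(targets)

list(
  tar_target(regions, c("north", "south", "east", "west")),
  tar_target(
    region_label,
    paste("Summary for region:", regions),  # trivial stand-in for real per-region work
    pattern = map(regions)                  # one branch (one new target) per region value
  )
)
```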

I will be using dynamic branching to iterate over these four economic indicators downloaded from FRED, Federal Reserve Economic Data:

Categories > Production & Business Activity > Manufacturing https://fred.stlouisfed.org/

Monthly time series indicators downloaded from FRED Economic Data, St. Louis:

This is an example of dynamic branching using the {tarchetypes} package, branching by the Metric variable and creating two branches for the two metrics included in this workflow. tarchetypes package GitHub repo: https://github.com/ropensci/tarchetypes/tree/main
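A minimal sketch of that pattern (the inline data frame stands in for this workflow's own data step):

```r
library(targets)
library(tarchetypes)

list(
  # Group the data by Metric so each group becomes its own branch
  tar_group_by(
    metrics_data,
    data.frame(Metric = rep(c("metric_a", "metric_b"), each = 12),
               value  = rnorm(24)),
    Metric
  ),
  # One summary per Metric group
  tar_target(
    metric_summary,
    summary(metrics_data$value),
    pattern = map(metrics_data)
  )
)
```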

Figure: VISNETWORK_tarchetypes_by_metric

Figure: TARGETS_TEST_ISSUE_17_DYNAMIC_BRANCHING_ARIMA_01

Figure: TARGETS_TEST_ISSUE_17_DYNAMIC_BRANCHING_Tarchetypes

visNetwork graph from the above workflow, including branching:

Figure: VISNETWORK_graph_branch_by_metric

All required files to run this pipeline are saved in the folder Pipeline_03_dynamic_branching_files.

4. Pipeline 05. Dynamic branching including ARIMA and Prophet models

This pipeline is completed and all required files to run it can be found in the "Pipeline_05_ARIMA_Prophet_models" folder:

Figure: 2024-04-30_17-47_VISNETWORK_ARIMA_MODEL_final

Figure: VISNETWORK_PROPHET_model

Using the {modeltime} package to combine Prophet and ARIMA models in the previous targets pipeline. Modeltime package: https://business-science.github.io/modeltime/
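As a rough sketch of how the two models can be fitted and combined with {modeltime} (the shipments data frame below is a synthetic placeholder for the monthly FRED series prepared earlier in the pipeline, not the repo's actual data step):

```r
library(dplyr)      # pipe
library(modeltime)
library(parsnip)
library(rsample)
library(timetk)

# Placeholder monthly series standing in for one FRED indicator
shipments <- data.frame(
  date  = seq(as.Date("2019-01-01"), by = "month", length.out = 60),
  value = 100 + cumsum(rnorm(60))
)

# Hold out the last 5 months for calibration
splits <- time_series_split(shipments, assess = "5 months", cumulative = TRUE)

model_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_prophet <- prophet_reg() %>%
  set_engine("prophet") %>%
  fit(value ~ date, data = training(splits))

# Combine both models, calibrate on the hold-out, and compare their forecasts;
# modeltime_refit() would then be used to forecast the next 5 unseen months
modeltime_table(model_arima, model_prophet) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_forecast(new_data = testing(splits), actual_data = shipments)
```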