daranzolin / sqltargets

targets extension for SQL queries
https://daranzolin.github.io/sqltargets/

sqltargets

Project Status: WIP. Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

sqltargets makes it easy to integrate SQL files within your targets workflows. The shorthand tar_sql() creates two targets: (1) the ‘upstream’ SQL file; and (2) the ‘downstream’ result of the query. Dependencies can be specified by calling tar_load() within SQL comments. The template engine can be specified using the sqltargets.template_engine option (either ‘glue’ or ‘jinjar’).

Installation

You can install sqltargets from CRAN with:

install.packages("sqltargets")

You can install the development version of sqltargets with:

remotes::install_github("daranzolin/sqltargets")

Demo

See the sqltargets-demo repository for a reproducible demonstration.

Dependencies

Use tar_load or targets::tar_load within a SQL comment to indicate query dependencies. Check the dependencies of any query with tar_sql_deps.

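library(sqltargets)

# A SQL file whose comments declare two upstream targets
lines <- c(
  "-- !preview conn=DBI::dbConnect(RSQLite::SQLite())",
  "-- targets::tar_load(data1)",
  "-- targets::tar_load(data2)",
  "select 1 AS my_col",
  ""
)

# Write the query to a temporary file and list its dependencies
query <- tempfile()
writeLines(lines, query)
tar_sql_deps(query)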
#> [1] "data1" "data2"
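
To make the comment convention concrete, here is a minimal base R sketch of how tar_load() calls could be pulled out of SQL comments. The helper extract_tar_load_deps() is hypothetical and written only for illustration; it is not part of sqltargets and may differ from how tar_sql_deps() actually works.

```r
# Hypothetical helper (not part of sqltargets): collect the names passed to
# tar_load() or targets::tar_load() inside SQL comment lines.
extract_tar_load_deps <- function(lines) {
  # Keep only SQL comment lines, i.e. those starting with "--".
  comments <- grep("^[[:space:]]*--", lines, value = TRUE)
  # Capture the name inside tar_load(...), with or without the targets:: prefix.
  pattern <- "(?:targets::)?tar_load\\(\\s*([A-Za-z0-9_.]+)\\s*\\)"
  matches <- regmatches(comments, regexec(pattern, comments, perl = TRUE))
  # Each successful match is c(full_match, captured_name); skip non-matches.
  unlist(lapply(matches, function(m) if (length(m) == 2) m[2] else NULL))
}

lines <- c(
  "-- !preview conn=DBI::dbConnect(RSQLite::SQLite())",
  "-- targets::tar_load(data1)",
  "-- tar_load(data2)",
  "select 1 AS my_col"
)
extract_tar_load_deps(lines)
#> [1] "data1" "data2"
```

Only comment lines are scanned, so a tar_load() call that appears in the query body itself would be ignored by this sketch.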

Parameters

You can pass parameters (typically from another target in your project) to tar_sql() using one of two ‘template engines’: glue or ‘Jinja’ (courtesy of the ‘jinjar’ package).

Set the ‘template engine’ with sqltargets_option_set("sqltargets.template_engine", "jinjar"). (‘glue’ is the default.)

With glue:

query.sql

-- !preview conn=DBI::dbConnect(RSQLite::SQLite())
-- tar_load(params)
select id
from table
where age > {age_threshold}

_targets.R

library(targets)
library(sqltargets)
list(
  tar_target(params, list(age_threshold = 30)),
  tar_sql(report, path = "query.sql", params = params)
)
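
The substitution itself can be previewed outside a pipeline. The sketch below calls glue::glue_data() directly on a one-line version of the query; whether sqltargets uses this exact function internally is an assumption, and the snippet only illustrates how a {age_threshold} placeholder is filled from a named list.

```r
library(glue)

# Same parameter list as the params target above.
params <- list(age_threshold = 30)
template <- "select id from table where age > {age_threshold}"

# glue_data() looks up {age_threshold} in the supplied list.
rendered <- glue_data(params, template)
rendered
#> select id from table where age > 30
```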

With ‘Jinja’:

query.sql

-- !preview conn=DBI::dbConnect(RSQLite::SQLite())
-- tar_load(payment_methods)
select
order_id,
{% for payment_method in params.payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
{% if not loop.is_last %},{% endif %}
{% endfor %}
from payments
group by 1

_targets.R

library(targets)
library(sqltargets)

sqltargets_option_set("sqltargets.template_engine", "jinjar")

list(
  tar_target(payment_methods, list(payment_methods = c("bank_transfer", "credit_card", "gift_card"))),
  tar_sql(report, path = "query.sql", params = payment_methods)
)

Note that loop.is_last differs from typical Jinja (loop.last). Refer to this ‘jinjar’ vignette for other syntactical differences.
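
The loop syntax can be tried on its own with jinjar::render(). This is a sketch under the assumption that the parameter list is exposed to the template under the name params, as the query above suggests; the shortened template and the variable name m are illustrative only.

```r
library(jinjar)

# One-line template using the same loop.is_last test as the query above.
template <- paste0(
  "{% for m in params.payment_methods %}",
  "{{ m }}_amount{% if not loop.is_last %}, {% endif %}",
  "{% endfor %}"
)

# Named arguments to render() become variables inside the template.
rendered <- render(
  template,
  params = list(payment_methods = c("bank_transfer", "credit_card", "gift_card"))
)
rendered
```

A comma is emitted after every element except the last, which is what keeps the generated select list valid SQL.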

Code of Conduct

Please note that the sqltargets project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Acknowledgement

Much of the code has been adapted from the excellent tarchetypes package. Special thanks to the authors and Will Landau in particular for revolutionizing data pipelines in R.