-
Line 188 in etl_pipeline (https://github.com/coursera/dataduct/blob/develop/dataduct/etl/etl_pipeline.py) passes the variable `load_min` as the minute component of the schedule time specified in the YAML file. Howe…
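For reference, a minimal sketch of splitting a schedule time so its minute component (`load_min`) is handled separately from the hour. This assumes the YAML field is a plain `"HH:MM"` string; the function name and shape are illustrative, not dataduct's actual API.

```python
def parse_schedule_time(time_str: str) -> tuple[int, int]:
    """Split an 'HH:MM' string into (hour, minute), validating both parts.

    Illustrative only -- dataduct's real parsing lives in etl_pipeline.py.
    """
    hour_s, minute_s = time_str.split(":")
    hour, minute = int(hour_s), int(minute_s)
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError(f"invalid schedule time: {time_str!r}")
    return hour, minute
```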
-
### Description
In order to populate the delivery dashboard with metrics calculated from data pulled from GitHub, we need a strategy for running the analytics pipeline created in the `analytics/` sub-…
-
## Context
We are hoping to automatically ingest our datasets from their sources (when possible and appropriate). This task is to perform data quality validation to identify existing issues, and handle an…
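As a starting point, a rough sketch of the kind of checks such validation could run. The checks below (missing required fields, exact duplicate rows) are generic placeholders; the actual validation criteria for these datasets are still to be decided.

```python
import math

def basic_quality_report(rows, required_fields):
    """Count missing required fields and exact duplicate rows.

    Illustrative checks only -- real validation rules are dataset-specific.
    """
    missing = {f: 0 for f in required_fields}
    seen, duplicates = set(), 0
    for row in rows:
        for f in required_fields:
            v = row.get(f)
            # Treat None and NaN as missing values.
            if v is None or (isinstance(v, float) and math.isnan(v)):
                missing[f] += 1
        key = tuple(sorted(row.items()))
        duplicates += key in seen
        seen.add(key)
    return {"missing": missing, "duplicates": duplicates}
```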
-
The data-processing clusters in mlab-sandbox and mlab-staging are in us-east, while the archive-measurement-lab bucket is in us-central1. These clusters should be redeployed to us-central, and their outp…
-
Hi,
Great product!
I see an issue, though: I want to specify mode-specific config, e.g. S3_ETL_BUCKET. If I don't specify this key in the default config, then when creating my pipeline with:
dataduc…
-
Being able to select pyarrow tables without copying, as well as accessing results as pyarrow tables without copying, would be massively beneficial for building low-latency ETL pipelines and other data p…
-
I am trying to use the minimal config described here: http://dataduct.readthedocs.org/en/latest/config.html
But it appears to be insufficient: dataduct wouldn't start without a logging section …
-
## Description
At present, the `data-test` and `data-heap-event` attributes, which provide information about the elements users click on, are part of a long string. We currently extract this data us…
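To make the current situation concrete, a hedged sketch of the kind of string-based extraction the issue describes: pulling `data-test` / `data-heap-event` values out of a serialized attribute string with a regex. The regex and function name are illustrative, not the project's actual extraction code.

```python
import re

# Matches data-test="..." and data-heap-event="..." inside a raw
# attribute string; other data-* attributes are ignored.
ATTR_RE = re.compile(r'(data-(?:test|heap-event))="([^"]*)"')

def extract_data_attrs(raw: str) -> dict:
    """Return the tracked data-* attributes found in a raw string."""
    return dict(ATTR_RE.findall(raw))
```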
-
### Terraform Core Version
1.3.1
### AWS Provider Version
5.63.0
### Affected Resource(s)
aws_osis_pipeline
### Expected Behavior
It should create the OSIS pipeline with a VPC endpoint.…
-
Currently, using the Graph Store Purger with Apache Jena Fuseki results in a Bad Request,
e.g. against the http://localhost:3030/nkod/data endpoint.
See [this execution](https://dev.nkod.opendata.cz/etl/#/pipelines…
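For reference, the SPARQL 1.1 Graph Store HTTP Protocol purges a named graph with an HTTP DELETE against the dataset's data endpoint plus a `?graph=` query parameter, which is what Fuseki expects at an endpoint like the one above. A minimal sketch (the graph URI is a placeholder):

```python
from urllib import parse, request

def build_purge_request(endpoint: str, graph_uri: str) -> request.Request:
    """Build a Graph Store Protocol DELETE for one named graph.

    The graph URI must be percent-encoded into the query string.
    """
    url = f"{endpoint}?{parse.urlencode({'graph': graph_uri})}"
    return request.Request(url, method="DELETE")

# Placeholder graph URI, against the Fuseki endpoint from the issue.
req = build_purge_request("http://localhost:3030/nkod/data",
                          "http://example.org/graph/1")
```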