-
As the creator of a notebook pipeline, I would like to embed human-readable information in my [notebook] flow that enables other users to quickly determine the purpose of (1) the pipeline and (2…
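If the flow is a Jupyter notebook, one low-tech option is a leading markdown cell carrying that description. A minimal sketch, assuming nbformat; the title and text are illustrative:

```python
import nbformat
from nbformat.v4 import new_markdown_cell, new_notebook

# Prepend a human-readable description cell so other users can see the
# pipeline's purpose at a glance. Wording here is illustrative only.
nb = new_notebook()
nb.cells.insert(0, new_markdown_cell(
    "# Ingestion pipeline\n"
    "**Purpose:** load the daily extracts and publish cleaned tables."
))
nbformat.write(nb, "pipeline.ipynb")
```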
-
Building on https://github.com/snowplow/snowplow/issues/1198...
Currently, in some places in the S3 stage of the data pipeline, the data lives in a folder that indicates which processing run it belongs to. (E.g…
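For illustration, a minimal sketch of discovering those per-run folders with boto3; the bucket, prefix, and `run=...` naming are assumptions, since the example above is truncated:

```python
import boto3

s3 = boto3.client("s3")
# List the per-run folders under a hypothetical archive prefix.
resp = s3.list_objects_v2(
    Bucket="my-pipeline-bucket",
    Prefix="enriched/archive/",
    Delimiter="/",
)
for p in resp.get("CommonPrefixes", []):
    print(p["Prefix"])  # e.g. enriched/archive/run=2015-01-01-00-00-00/
```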
-
Currently, the classic (Angular) version of the Platform allows users to download the evidence, association, target list, disease list, safety, baseline expression, and tractability data files.
The…
-
Complete the remaining work to produce a working implementation of the HNC segmentation algorithm.
Tasks:
- [x] Read the underlying paper
- [x] Revisit their code repository to determine whether it may be reusable fo…
-
The pipeline should include (a minimal deploy sketch in Python follows the list):
- [x] Pylint
- [x] Copying the dags folder to s3://kf-strides-232196027141-service-dag-bucket/dags
- [x] Copying the requirements.txt file to s3://kf-strides-232196027141-…
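A sketch of those steps with boto3; the requirements.txt destination key is an assumption, since the original path is truncated above:

```python
import subprocess
from pathlib import Path

import boto3

BUCKET = "kf-strides-232196027141-service-dag-bucket"
s3 = boto3.client("s3")

# Lint first; a non-zero pylint exit aborts the deploy.
subprocess.run(["pylint", "dags"], check=True)

# Copy the dags folder to s3://<bucket>/dags.
for path in Path("dags").rglob("*"):
    if path.is_file():
        s3.upload_file(str(path), BUCKET, f"dags/{path.relative_to('dags')}")

# Copy requirements.txt (destination key is assumed).
s3.upload_file("requirements.txt", BUCKET, "requirements.txt")
```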
-
Based on the work in #1411, a sample JSON is now available that contains the fields generated by the ETL pipelines for each dataset.
**Command:**
`gsutil -m cat gs://ot-snapshots/etl/outputs/21.0…
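One way to inspect which fields the sample contains; the exact gs:// path is truncated above, so the one below is a placeholder:

```python
import json
import subprocess

# Fetch the sample with gsutil and list its top-level field names.
raw = subprocess.run(
    ["gsutil", "cat", "gs://ot-snapshots/etl/outputs/PLACEHOLDER.json"],
    capture_output=True, text=True, check=True,
).stdout
record = json.loads(raw)
print(sorted(record))  # fields generated by the ETL pipelines
```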
-
The lack of this feature becomes apparent as we add new custom storage targets and do data modeling using the Analytics SDKs. Also, this can be a solution for [recovering complex pipelines](http://disc…
-
Hi,
Thanks for the fake GCS server.
When trying to copy a file from one bucket to another, I got this error.
```
self =
method = 'POST'
path = '/b/data-transfer1/o/test_file.txt/copyTo/b/data-trans…
```
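For reference, a reproduction sketch using google-cloud-storage against a local fake-gcs-server; the endpoint, project id, and destination bucket name are assumptions (the path above is truncated):

```python
from google.auth.credentials import AnonymousCredentials
from google.cloud import storage

client = storage.Client(
    project="test",  # assumed project id
    credentials=AnonymousCredentials(),
    client_options={"api_endpoint": "http://localhost:4443"},  # assumed endpoint
)
src = client.bucket("data-transfer1")
dst = client.bucket("data-transfer2")  # hypothetical destination bucket
blob = src.blob("test_file.txt")
# copy_blob issues the POST .../copyTo/... request seen in the error above.
src.copy_blob(blob, dst, "test_file.txt")
```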
-
KubeETL should make it easy for Data Engineers and Data Scientists to create ETL pipelines. This requires connection configuration. As ETL projects scale, source/sink configuration often becomes a mess.…
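As one possible shape, a minimal sketch of a centralized connection registry; all names and fields here are illustrative, not KubeETL's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    name: str
    kind: str        # e.g. "postgres", "s3", "bigquery"
    host: str
    secret_ref: str  # name of the Kubernetes Secret holding credentials

# One typed registry instead of credentials scattered across ETL jobs.
REGISTRY = {
    "warehouse": Connection("warehouse", "postgres", "db.internal", "warehouse-creds"),
    "raw-bucket": Connection("raw-bucket", "s3", "s3.amazonaws.com", "raw-bucket-creds"),
}

def get_connection(name: str) -> Connection:
    return REGISTRY[name]
```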
-
As part of #1411, we will implement a new data downloads page that allows users to download a larger list of files. The implementation is based on each data file being included in a JSONlines file tha…
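A minimal sketch of consuming such a file; the field names are assumptions, since the real schema comes from the #1411 ETL output:

```python
import json

# Each line of the JSONlines file is expected to describe one downloadable file.
with open("downloads.jsonl") as fh:
    files = [json.loads(line) for line in fh if line.strip()]

for f in files:
    print(f.get("name"), f.get("path"), f.get("format"))
```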