-
- [x] Archive and delete branches
- [x] Remove unused or old code
- [x] Merge airflow to master branch
- [ ] Update documentation with the new steps to run the ETL pipeline
- [ ] Check CI/CD pipel…
-
The test coverage ETL and the unittest ETL generate two different types of suite names. Ensure the two pipelines generate standard suite names that can be compared.
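
One way to make the names comparable is to pass both pipelines' output through a shared normalizer. The sketch below is only an illustration: the function name `normalize_suite_name` and the input shapes (file paths versus dotted module names) are assumptions, since the issue does not say what the two suite name formats actually look like.

```python
import re


def normalize_suite_name(raw: str) -> str:
    """Reduce a suite name to a lowercase, dot-separated form.

    Hypothetical helper: the input formats handled here are assumed,
    not taken from either pipeline.
    """
    name = raw.strip()
    # Drop a trailing ".py" extension if the name is a file path.
    name = re.sub(r"\.py$", "", name)
    # Treat path separators and "::" as equivalent to dots.
    name = re.sub(r"[\\/]+|::", ".", name)
    # Collapse repeated dots and trim any leading or trailing dot.
    name = re.sub(r"\.{2,}", ".", name).strip(".")
    return name.lower()


if __name__ == "__main__":
    # Both spellings reduce to the same key, so they can be joined on it.
    print(normalize_suite_name("tests/unit/test_loader.py"))
    print(normalize_suite_name("tests.unit.test_loader"))
```

If both ETLs apply the same function before writing suite names, the two outputs can be joined on the normalized value.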
-
Includes
- OpenShift dev/test/prod ETL pipeline
- Ability to transform all data from Postgres to Oracle and vice versa; the transformation must be bidirectional
- Ability to schedule jobs, or run them "live", depend…
-
Copying from https://github.com/opensearch-project/ml-commons/issues/1162 as it is applicable to OpenSearch more broadly. Splitting into this feature request and https://github.com/opensearch-project/…
-
The gardener and the ETL pipeline both need to know which dataset/table the rows are being written to. Currently they each use their own logic or env vars to determine this, which is very fragile. Inste…
-
I understand that this is very early and not yet ready for prime-time, but I tried running through this locally and had some issues:
- Data volume was too high for my Mac. The processing side cou…
-
**Description**
The tests for the scoring code live in `data/data-pipeline/data_pipeline/etl/score/tests`, next to the code that they're testing.
However, for the national risk index ET…
-
Currently, using Graph Store Purger with Apache Jena Fuseki results in a Bad Request error,
e.g. with the http://localhost:3030/nkod/data endpoint.
See [this execution](https://dev.nkod.opendata.cz/etl/#/pipelines…
-
I've been looking through the documentation for several days, but I can't find a way, or any examples, to extract data from a Parquet file hosted in AWS S3. I don't know if I'm a potential nerd. But I'm g…
-
While writing out a DataFrame to parquet from a Prefect task (not sure if the Prefect part is actually important or not), I got the following error:
```python
Traceback (most recent call last):
…