-
Some stages were implemented in #94, but many are missing.
Also, many classes in `io.cebes.spark.pipeline.etl` don't have any test cases yet.
-
### Description
In order to populate the delivery dashboard with metrics calculated based on data pulled from GitHub, we need a strategy to run the analytics pipeline created in the `analytics/` sub-…
-
**Describe the bug**
It has been identified that a certain drug/target connection is missing from the literature dataset. This ([32955176](https://europepmc.org/article/MED/32955176)) EuroPMC paper…
-
## Task Description
- We need a backlog of cases to visualise patterns in judge verdicts, including the roughly two or more new high court cases published each day.
- The more cases we look at, the more it'll cost when using…
-
### Describe the feature
Support `KeepJobFlowAliveWhenNoSteps` or auto-termination (after idle) when Step Functions creates an EMR cluster.
### Use Case
Our team is using Stepfunction EMR (EmrCreateCluste…
-
**Describe the bug**
There are 200M more matches in the raw data than in the normalized and failed datasets combined.
- Failed matches count: 247_332_452
- Grounded matches count: 444_674_668
…
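
The gap described above can be checked with a few lines of arithmetic. This is a minimal sketch using only the two counts quoted in the report; the raw match count is not shown in this excerpt, so `unaccounted` is a hypothetical helper that takes it as an argument rather than assuming a value.

```python
# Counts quoted in the report (underscores are valid Python digit separators).
failed_matches = 247_332_452
grounded_matches = 444_674_668

# Matches accounted for by the two downstream datasets combined.
accounted = failed_matches + grounded_matches


def unaccounted(raw_matches: int) -> int:
    """Return how many raw matches appear in neither downstream dataset.

    `raw_matches` must come from the raw dataset itself; it is not
    given in the excerpt, so no value is assumed here.
    """
    return raw_matches - accounted


print(f"accounted for: {accounted:_}")
```

Calling `unaccounted` with the actual raw count should return a figure near the reported 200M if the discrepancy is real.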
-
I'm probably doing something wrong here. I followed the README to the letter. The application runs, but I get errors on the pipeline page.
```
Can't load data
Request failed with status: 500.
Pleas…
```
-
The pilot 3 mapping spreadsheet has `contract_initial_value_cost_eur` mapped to both `pc:actualPrice` and `pc:estimatedPrice`. Is this just a mistake in the spreadsheet or also in your ETL pipeline?
-
https://docs.mage.ai/introduction/overview
https://atlan.com/mage-data-orchestration/
-
This issue is about improving the pipeline to correctly flag websites that are down. More specifically, this issue is about making sure the ETL pipeline recognizes all the states that are already reco…