GSS-Cogs / DataEngineering_Airflow_Alpha


Pipeline Dev: Investigate and plan validation #23

Open RedWalters opened 1 year ago

RedWalters commented 1 year ago

Currently the only validation in the Airflow pipeline is csvlint, which is better than nothing but not enough. We need to be able to check that:

The timing of implementing these steps is also a consideration. Will any of the validation steps need to be adjusted once we move to running Airflow on GCP? If so, is it worth waiting and doing the work only once, or is it better to implement it now and adapt it later?

If I've missed anything, or anyone feels that more or different validation would be useful, let me know.

RedWalters commented 1 year ago

Another step in this task would be checking what validation the old pipelines run and deciding what we want to adopt and what we want to leave behind.

One of the main things in this will be SPARQL testing, which, I believe, constitutes the main validation in the Jenkins pipelines.