Flowminder / FlowKit

FlowKit: Flowminder CDR analytics toolkit
https://flowminder.github.io/FlowKit/
Mozilla Public License 2.0

Re-work FlowETL QA check discovery #6497

Open jc-harrison opened 6 months ago

jc-harrison commented 6 months ago

FlowETL's current approach for adding QA checks to a DAG is to scan through all files in the template_searchpath and create a QACheckOperator task for every *.sql template that contains "qa_checks" somewhere in its (relative) filepath. This is convenient for dynamically picking up QA checks, but has some downsides:

Aside from these potential pitfalls, there are other features that might be nice to have which are not possible with the current approach. E.g.:

It would be good to re-design the QA check discovery mechanism to avoid these shortcomings and allow more explicit control over check discovery when it's useful, without losing all of the convenience of the dynamic QA check discovery and injection of default QA checks.
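
For reference, the path-based discovery described at the top of this comment can be sketched roughly as below. This is only an illustration of the matching rule (any *.sql file whose path relative to a template_searchpath entry contains "qa_checks"), not FlowETL's actual code: the function name `discover_qa_check_templates` is hypothetical, and the step that turns each result into a QACheckOperator task is left out.

```python
from pathlib import Path
from typing import Iterable, List


def discover_qa_check_templates(template_searchpath: Iterable[str]) -> List[str]:
    """Hypothetical helper: list the *.sql templates that path-based
    discovery would treat as QA checks.

    A template qualifies if it ends in .sql and the string "qa_checks"
    appears anywhere in its path relative to the search path entry.
    """
    found = set()
    for root in template_searchpath:
        root_path = Path(root)
        if not root_path.is_dir():
            # Skip search path entries that do not exist on disk.
            continue
        for sql_file in root_path.rglob("*.sql"):
            relative = sql_file.relative_to(root_path).as_posix()
            # Plain substring match on the relative path, as described above.
            if "qa_checks" in relative:
                found.add(relative)
    # Sort so the result does not depend on filesystem traversal order.
    return sorted(found)
```

Each returned path would then be the template for one QA check task. Because the match is a plain substring test on the path, whether a file counts as a check depends entirely on where it sits in the search path.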

jc-harrison commented 6 months ago

Ideally the solution here would no longer rely on us knowing the location of the DAG folder during DAG creation, so that we can remove the hack introduced in https://github.com/Flowminder/FlowKit/pull/6496.