Background
One of the primary tenets of our approach to ETL is that it should be deterministic: given the same inputs, it should always produce the same result. Yet we also rely on external data sources, such as APIs or even client inputs. Variable inputs make it hard to produce outputs deterministically, and they can lead to baffling, difficult-to-debug errors.
Proposal
I'd like to read up on how other folks have handled this, then compile some best practices and examples of good places to introduce defensive programming, validation of expected values, and even break points for manual review. I'll add these to our etl/ directory.
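To make "validation of expected values" and "break points for manual review" a bit more concrete, here is a minimal sketch of the kind of check this might cover. Everything in it is an assumption for illustration: the field names, the allowed statuses, and the `validate_record` / `validate_all` helpers are hypothetical and not taken from any existing pipeline of ours.

```python
# Hypothetical expectations about records coming from an external source.
EXPECTED_FIELDS = {"id", "status", "amount"}
ALLOWED_STATUSES = {"open", "closed"}


class UnexpectedInputError(ValueError):
    """Raised when external data does not match what the pipeline expects."""


def validate_record(record):
    """Return the record unchanged if it matches expectations, else raise."""
    missing = EXPECTED_FIELDS - set(record)
    if missing:
        raise UnexpectedInputError(f"missing fields: {sorted(missing)}")
    if record["status"] not in ALLOWED_STATUSES:
        raise UnexpectedInputError(f"unexpected status: {record['status']!r}")
    return record


def validate_all(records):
    """Split records into valid ones and ones set aside for manual review."""
    valid, needs_review = [], []
    for record in records:
        try:
            valid.append(validate_record(record))
        except UnexpectedInputError as error:
            # Keep the reason alongside the record so a reviewer can see
            # why it was held back instead of silently passing through.
            needs_review.append((record, str(error)))
    return valid, needs_review
```

The idea is that unexpected input gets surfaced at the boundary with a clear message, rather than flowing downstream and showing up later as a confusing error. Whether to stop the run outright or set records aside for review is exactly the sort of question the write-up would try to answer.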
Deliverables
A compilation of best practices and examples, added to our etl/ directory.
Timeline
I expect this to take about a day of focused work.
Cc @fgregg – any resources or examples that come to mind that may be relevant here?