datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

Add processor (?and row?) information to exception object #126

Open cschloer opened 4 years ago

cschloer commented 4 years ago

Would it be possible to add the index (relative to the top-level Flow) and name of the processor that failed when an exception is thrown in dataflows? I'm imagining something that intercepts every exception raised while running each processor, adds the relevant information to either the stack trace or the exception object, and then re-raises it.

This is especially useful if you have multiple load steps and don't know which one is failing. An easy workaround is to run them one at a time, but having this information would make dataflows a bit easier to work with :)
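To make the idea concrete, here is a minimal sketch in plain Python. It does not use the dataflows API: steps are modelled as ordinary generator functions over rows, and `ProcessorError`, `_guard`, and `run_steps` are hypothetical names invented for this illustration.

```python
class ProcessorError(Exception):
    """Hypothetical wrapper carrying the failing step's index and name."""

    def __init__(self, index, name, cause):
        super().__init__(f"step {index} ({name}) failed: {cause!r}")
        self.index = index
        self.name = name
        self.cause = cause


def _guard(index, step, rows):
    """Run one step lazily and annotate any exception it raises."""
    name = getattr(step, "__name__", type(step).__name__)
    try:
        yield from step(rows)
    except ProcessorError:
        # Already annotated by an upstream step; pass it through unchanged.
        raise
    except Exception as exc:
        raise ProcessorError(index, name, exc) from exc


def run_steps(steps, rows):
    """Chain the steps (assumed: callables taking and returning row iterables)."""
    for index, step in enumerate(steps):
        rows = _guard(index, step, rows)
    return list(rows)
```

Because the steps are lazy, an error from an early step surfaces while a later step is pulling rows from it. The `except ProcessorError: raise` branch keeps the original index and name instead of blaming the later step, and `raise ... from exc` preserves the original traceback.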

One step further would be to report which row number triggered the error. That obviously wouldn't be relevant for some processors, but for those that operate at the row level it would be a significant help in debugging a failing dataflow.
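For row-level processors, one possible shape is a small wrapper that counts rows as it goes. Again, this is a plain-Python sketch rather than dataflows code: `RowError` and `per_row` are hypothetical names, and rows are assumed to be dicts.

```python
class RowError(Exception):
    """Hypothetical wrapper recording which row triggered the failure."""

    def __init__(self, row_number, row, cause):
        super().__init__(f"row {row_number} failed: {cause!r}")
        self.row_number = row_number
        self.row = row
        self.cause = cause


def per_row(func):
    """Turn a function on a single row into a step over an iterable of rows."""

    def step(rows):
        # Row numbers are 1-based here; that is a choice made for this sketch.
        for row_number, row in enumerate(rows, start=1):
            try:
                result = func(row)
            except Exception as exc:
                raise RowError(row_number, row, exc) from exc
            yield result

    return step
```

The call to `func` sits outside the `yield` so that only errors from the row function itself are wrapped, not exceptions thrown into the generator by its consumer.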

cschloer commented 4 years ago

To be clear, what I'm talking about already exists for ValidationErrors (https://github.com/datahq/dataflows/blob/master/dataflows/base/schema_validator.py#L6), but it would be great if it could be extended beyond validation to cover the actual running of the flows.