datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

More transparent handling of CastErrors #107

Open micimize opened 4 years ago

micimize commented 4 years ago

I think exception handling should be done at the processor level, but simply providing a way to override the current behavior would also be good.

https://github.com/datahq/dataflows/blob/5f4aeff8f498368f20beec57d4b708dabbc1b842/dataflows/base/datastream_processor.py#L82-L90
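To make "exception handling at the processor level, with a way to override it" concrete, here is a minimal plain-Python sketch. It does not use the dataflows API: `cast_field`, `on_error`, and `log_and_keep` are hypothetical names chosen for illustration. The casting step owns its error handling, defaults to logging and keeping the row, and lets the caller pass a handler that returns `True` to keep the row or `False` to drop it.

```python
import logging
from typing import Callable, Dict, Iterable, Iterator, Optional

Row = Dict[str, object]

# Hypothetical handler signature: (row, field_name, exception) -> bool.
# Returning True keeps the row (failing value left as-is); False drops it.
# A handler may also re-raise to stop the flow.
ErrorHandler = Callable[[Row, str, Exception], bool]


def log_and_keep(row: Row, field: str, exc: Exception) -> bool:
    # Default behaviour: report the problem but let the row through.
    logging.error("cast error on field %r: %s", field, exc)
    return True


def cast_field(rows: Iterable[Row], field: str,
               caster: Callable[[object], object],
               on_error: Optional[ErrorHandler] = None) -> Iterator[Row]:
    """Cast one field in every row, delegating failures to a handler."""
    handler = on_error or log_and_keep
    for row in rows:
        try:
            row[field] = caster(row[field])  # mutates the row in place
        except (ValueError, TypeError) as exc:
            if not handler(row, field, exc):
                continue  # handler asked for this row to be dropped
        yield row
```

For example, `list(cast_field([{"n": "1"}, {"n": "x"}], "n", int, on_error=lambda r, f, e: False))` drops the row whose value cannot be cast, while omitting `on_error` logs the error and keeps `{"n": "x"}` unchanged.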

akariv commented 4 years ago

@micimize Note that the validate processor provides a means to act on individual casting errors (including whether to drop or pass the failing rows). You can add this processor as the last step in your flow to define custom handling for such errors. I think the error logs in the process method should remain, as an indication to the user that something is not right, but the handling should be done using a processor.
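The pattern suggested here (keep the log, but put the decision in a final validation step) can be sketched in plain Python as follows. This is not the dataflows `validate` processor or its signature; `run_flow`, `validate_rows`, `lenient_float`, and the `on_error` callback are hypothetical stand-ins. Each step maps an iterable of row dicts to another iterable, and the last step checks field types, logs every failure, and asks `on_error` whether to pass (`True`) or drop (`False`) the row.

```python
import logging
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterable[Row]]


def run_flow(source: Iterable[Row], *steps: Step) -> List[Row]:
    # Feed each step's output into the next one, then materialise the rows.
    return list(reduce(lambda rows, step: step(rows), steps, source))


def lenient_float(field: str) -> Step:
    # Hypothetical upstream step: casts when it can, passes raw values otherwise.
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            try:
                row[field] = float(row[field])
            except (TypeError, ValueError):
                pass
            yield row
    return step


def validate_rows(schema: Dict[str, type],
                  on_error: Callable[[Row, str, Exception], bool]) -> Step:
    # Hypothetical final step: log each failure, let on_error decide pass/drop.
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            keep = True
            for field, expected in schema.items():
                value = row.get(field)
                if not isinstance(value, expected):
                    exc = TypeError(f"{field}={value!r} is not {expected.__name__}")
                    logging.error("validation failed: %s", exc)  # the log stays
                    if not on_error(row, field, exc):
                        keep = False
                        break
            if keep:
                yield row
    return step
```

With `run_flow(rows, lenient_float("amount"), validate_rows({"amount": float}, lambda r, f, e: False))`, rows whose `amount` could not be cast are logged and dropped; swapping the callback for one that returns `True` logs them and passes them through. Keeping the policy in the last step means upstream steps stay simple and the drop/pass decision lives in one place.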