frictionlessdata / tableschema-py

A Python library for working with Table Schema.

https://frictionlessdata.io

MIT License

260 stars 41 forks source link

Remove fail_fast where it's not needed #135

Closed roll closed 7 years ago

roll commented 7 years ago

Overview

For now fail_fast is used in:

Schema.cast_row
validate

Both are not about work speed because even without fail_fast row or descriptor will be processed very fast. So it's only for provide different API based on this flag.

E.g. if fail_fast=True validate will raise ValidationError otherwise MultipleInvalid with errors attribute. This API seems complicated. JavaScript library doesn't use this flag: https://github.com/frictionlessdata/jsontableschema-js#validate

So we could simplify API always raising list of errors for this functions (e.g. in ValidationErrors and CastErrors exceptions).

pwalsh commented 7 years ago

@roll

In the original code that spawned these libs, fail fast was used to stop iteration over a table on first error, as opposed to collecting all errors and returning all. This is about work speed, and is not just about a different API. Consider a table with 1GB of data and errors in the 100th row, and then other errors in 1 millionth row and beyond. With fail fast, the iteration will stop and raise almost immediately.

I don't know if the fail_fast flag is now applied in other places, but, for table iteration in jts, dp and good tables, it makes perfect sense IMHO.

roll commented 7 years ago

@pwalsh Yes but now we have iter concept where fail_fast seems not applicable. This thing is not fully clear for me for now. There is also - https://github.com/frictionlessdata/tabulator-py/issues/98

Some thinking needed)

But I suppose fail_fast in descriptor validation is not really important concept. We just need to find the best API to show errors. JS and Ruby libs just get all errors as a list. It seems handy.

pwalsh commented 7 years ago

in descriptor validation it is not relevant at all, agreed.

in table iteration, I don't see why iter concept makes any difference (in the original implementation it was the same).

roll commented 7 years ago

@pwalsh Let's sync on this when we will be on next tabulator/jts iteration. I've added issue after working on https://github.com/frictionlessdata/datapackage-js/issues/13 - to have impl in sync

roll commented 7 years ago

COVERED in separate issues

frictionlessdata / tableschema-py

Remove fail_fast where it's not needed #135

Overview

160 - castRow

113 - validate