digital-preservation / csv-validator

CSV Validation Tool and API (CSV Schema RI)
http://digital-preservation.github.io/csv-validator
Mozilla Public License 2.0

Bounded error collection #175

Open marksteele opened 5 years ago

marksteele commented 5 years ago

Would it be possible to implement an error collector that short-circuits after a configurable number of errors, instead of the current all-or-nothing approach? For example: fail after 100 errors and return the error messages collected so far.

This would be useful when validating large files that may contain many errors, and it would avoid out-of-memory (OOM) issues.

Alternatively, it would be nice to collect the first N errors and then report statistics on the total number of errors (e.g. 500 validation errors found on column A, 355 on column B, etc.).
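
To make the request concrete, here is a minimal sketch of what such a bounded collector could look like. It is not part of csv-validator: `BoundedErrorCollector` and all of its methods are hypothetical names, and errors are represented as plain strings rather than the library's own message type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the csv-validator API.
final class BoundedErrorCollector {
    private final int maxStored;
    private final List<String> firstErrors = new ArrayList<>();
    private final Map<String, Integer> countsByColumn = new LinkedHashMap<>();
    private int total = 0;

    BoundedErrorCollector(int maxStored) {
        if (maxStored < 0) {
            throw new IllegalArgumentException("maxStored must be >= 0");
        }
        this.maxStored = maxStored;
    }

    /**
     * Records one error. Only the first maxStored messages are kept in memory;
     * later errors are counted but not stored.
     * Returns true once the bound has been reached, so a caller can stop early.
     */
    boolean add(String column, String message) {
        total++;
        countsByColumn.merge(column, 1, Integer::sum);
        if (firstErrors.size() < maxStored) {
            firstErrors.add(message);
        }
        return total >= maxStored;
    }

    List<String> firstErrors() {
        return Collections.unmodifiableList(firstErrors);
    }

    Map<String, Integer> countsByColumn() {
        return Collections.unmodifiableMap(countsByColumn);
    }

    int total() {
        return total;
    }
}
```

A caller that wants fail-fast behaviour would stop as soon as `add` returns `true`. A caller that wants statistics would keep going and read `countsByColumn()` at the end. In both cases the number of stored messages is capped at `maxStored`; only the per-column counters grow, and those are bounded by the number of columns.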

adamretter commented 5 years ago

It would seem like a good idea in the Java API to replace both the resultant List<FailMessage> and ProgressCallback with a single callback mechanism, which is notified on each validation, and can then either return a flag or throw an exception to indicate that validation should stop.
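
As a rough illustration of the "return a flag" variant of this idea, the sketch below uses invented names throughout: `ValidationListener`, `onFailure` and `ToyValidator` do not exist in csv-validator, and the empty-row check only stands in for real schema validation.

```java
import java.util.List;

// Hypothetical callback for illustration; not the library's actual interface.
interface ValidationListener {
    /** Called once per failure. Return false to tell the validator to stop. */
    boolean onFailure(int lineNumber, String message);
}

// Toy stand-in for the real validator: it treats every empty row as a failure.
final class ToyValidator {
    /** Validates rows in order and returns how many rows were examined. */
    static int validate(List<String> rows, ValidationListener listener) {
        int examined = 0;
        for (String row : rows) {
            examined++;
            // Stop as soon as the listener asks us to.
            if (row.isEmpty() && !listener.onFailure(examined, "empty row")) {
                break;
            }
        }
        return examined;
    }
}
```

With this shape, the caller decides what to keep. A listener that stores messages and returns `false` after the Nth one gives the bounded behaviour requested above, while a listener that always returns `true` reproduces the current collect-everything behaviour without the validator itself holding a list.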

I don't have any time to implement this personally, but if someone is interested, I could suggest a design...

alexgreenDP commented 5 years ago

Happy to hear a design and we'll add it to our backlog.