frictionlessdata / tableschema-py

A Python library for working with Table Schema.
https://frictionlessdata.io
MIT License
260 stars 41 forks source link

Feature/custom exc handler: Support optional custom exception handling in Table.iter() + Table.read() #259

Closed hjoukl closed 4 years ago

hjoukl commented 4 years ago

Support custom exception handling through optional Table.iter() / Table.read() argument.

Implements: #258

This pull requests aims to add an exc_handler option to Table.iter() + Table.read():

def iter(self, keyed=False, extended=False, cast=True,                   
         integrity=False, relations=False,
         foreign_keys_values=False, exc_handler=None):

Current behaviour will remain unchanged if this exc_handler option is not used.

The pull request contains this change plus some "support infrastructure":

I'm happy to discuss any changes that would be necessary to make this acceptable :-).

Some notes:


Please preserve this line to notify @roll (lead of this repository)

roll commented 4 years ago

@hjoukl I'm a little bit busy at the moment but I will have reviewed it in a week or so.

Thanks for your input!

hjoukl commented 4 years ago

I'm a little bit busy at the moment but I will have reviewed it in a week or so.

Great! I haven't yet added any documentation - which I'd of course do if this looks like an acceptable addition, along with anything else that needs fixing in the PR.

roll commented 4 years ago

iter() should really use a context manager to open the stream; I've deliberately not included this in the PR to not create a huge diff caused by the indentation change

Probably it's because somewhere we had to use it outside a context manager, like returning to the client. If it's possible we're, of course, better to wrap it into one

hjoukl commented 4 years ago

iter() should really use a context manager to open the stream; I've deliberately not included this in the PR to not create a huge diff caused by the indentation change

Probably it's because somewhere we had to use it outside a context manager, like returning to the client. If it's possible we're, of course, better to wrap it into one

I actually had it running and passing all unit tests with simply using the existing context manager support. I removed that to keep the diff smaller, so this could be changed later on.

hjoukl commented 4 years ago

So, I finally got round to adding documentation, sorry for the delay.

For custom exception handlers, I also slightly changed the schema.py error reporting for casting errors (error_data arg to custom exception handlers) to only include the uncastable fields instead of the full row, which is available anyway in the row_data arg. Tests reflect this change.

Some additional test_table.py tests to test the behavioural differences when using cast=True vs cast=False for iter()/read() have been added.

Please let me know if anything's missing or should be changed to further advance this!

Btw. not being a proper git hero I got pretty much stumped by your commit bringing some master changes into the feature/custom-exc-handler branch. I just couldn't push afterwards any more and ended up cherry-picking my latest changes into a fresh local clone. I suspect I would have needed some "git fetch" or "git pull --rebase" from my github server-side fork, for the branch? I did actually do git fetch upstream and git merge upstream master.

Greetings, Holger

hjoukl commented 4 years ago

Great :-) - I'll go ahead and close #258, then.

Thanks for providing us with tableschema-py!