frictionlessdata / datapackage-py

A Python library for working with Data Packages.
https://frictionlessdata.io
MIT License

use requests library for raw_iter, to support custom http_session #229

Closed OriHoch closed 5 years ago

OriHoch commented 5 years ago

This fixes a problem with loading data from URLs that require a custom HTTP session (e.g. for HTTP auth).

While tabulator accepts an http_session option to support this, the raw_iter method, which is used to infer the encoding, doesn't.

To fix it, I changed raw_iter to use the requests library and, when provided, the http_session option from the table options.
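
A minimal sketch of the idea, not the actual code from this PR: the helper name iter_raw_chunks, its parameters, and the chunk size are assumptions made for illustration. It reads the raw bytes through a caller-supplied session if there is one, and falls back to a plain requests session otherwise.

```python
def iter_raw_chunks(url, http_session=None, chunk_size=4096):
    """Yield the bytes behind `url` in chunks.

    Hypothetical helper for illustration; not the datapackage-py implementation.
    """
    if http_session is None:
        # Imported lazily: only needed when the caller did not pass a session.
        import requests
        http_session = requests.Session()

    # stream=True defers downloading the body until it is iterated.
    response = http_session.get(url, stream=True)
    response.raise_for_status()

    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty keep-alive chunks
            yield chunk


# Possible usage with HTTP auth (placeholder URL and credentials):
#
#   import requests
#   session = requests.Session()
#   session.auth = ("user", "secret")
#   sample = b"".join(iter_raw_chunks("https://example.com/data.csv", session))
```

Passing the same session object to both tabulator and the encoding detection keeps the two requests authenticated in the same way.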

In the long term, I think encoding detection should be done in tabulator itself, so that all loading goes through one consistent method.

akariv commented 5 years ago

lgtm

roll commented 5 years ago

@OriHoch Thanks!

There is one thing I'm concerned about. You use the raw response stream from requests (http://docs.python-requests.org/en/master/user/quickstart/#raw-response-content). As far as I can remember, I failed to use it with tabulator because it doesn't decompress the stream. I'm not sure whether this has changed in requests@3. I think we at least need to add a few tests to find out.
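
To illustrate the concern, here is a small standard-library simulation rather than a real requests call. The payload and the read_body helper are made up for this example. It shows that bytes read from an undecoded stream of a gzip-encoded response are not the original content until they are decompressed.

```python
import gzip
import io


def read_body(raw_stream, content_encoding=None):
    """Read a whole stream, undoing gzip content encoding if it is declared.

    Hypothetical helper for illustration only.
    """
    data = raw_stream.read()
    if content_encoding == "gzip":
        data = gzip.decompress(data)
    return data


payload = b"id,name\n1,alice\n2,bob\n"
# What a server would put on the wire with "Content-Encoding: gzip".
wire_bytes = gzip.compress(payload)

# Reading the raw stream as-is yields compressed bytes, not CSV text.
undecoded = read_body(io.BytesIO(wire_bytes))
# Honouring the content encoding recovers the original payload.
decoded = read_body(io.BytesIO(wire_bytes), content_encoding="gzip")

assert undecoded != payload
assert decoded == payload
```

With requests, my understanding is that response.raw.read(decode_content=True) asks the underlying urllib3 response to do this decoding, but that is exactly the behaviour a test against a gzip-serving endpoint should confirm.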