ckan / datapusher

A standalone web service that pushes data files from a CKAN site resources into its DataStore
GNU Affero General Public License v3.0
77 stars 154 forks source link

Use requests to download files instead of urllib2 AND extend SSL_VERIFY option to these requests (for testing) #145

Closed davidread closed 7 years ago

davidread commented 7 years ago

It seems a no-brainer to use requests instead of urllib2 for this key step of downloading the data file that gets pushed because requests is superior in lots of ways.

Since python 2.7.9, urllib2 went from not verifying certificates at all to verifying them against the OS's built-in list. My experience of the latter is that it goes out of date and even big sites like github.com start getting rejected. Requests handles this much better imo because it updates more frequently via the certifi python module (installed with requests[security]. I've used this in ckanext-archiver in a similar for several years with success, but successfully checking HTTPS certs.

I've checked the tests pass, but it's not tried in production or battle-hardened, so could do with double-checking.

davidread commented 7 years ago

Perhaps @metaodi this is something you might have a chance to review?

metaodi commented 7 years ago

@davidread this looks very good and is a step in the right direction. If you like I could try this code next week on one of my test instances, but I see currently nothing that could break.

davidread commented 7 years ago

Thanks for that, and trying it next week would be great!

On Thu, 31 Aug 2017 at 19:00, Stefan Oderbolz notifications@github.com wrote:

@davidread https://github.com/davidread this looks very good and is a step in the right direction. If you like I could try this code next week on one of my test instances, but I see currently nothing that could break.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ckan/datapusher/pull/145#issuecomment-326375404, or mute the thread https://github.com/notifications/unsubscribe-auth/AASxnEXinMzW8IQSVf5R2mPBT0k7rKPZks5sdvTTgaJpZM4PJH2l .

davidread commented 7 years ago

@metaodi I don't suppose...

metaodi commented 7 years ago

@davidread I knew I forgot something... Sorry!

metaodi commented 7 years ago

@davidread just ran into this again. Did you have a chance to fix the "typo" so we can merge this?

davidread commented 7 years ago

@metaodi Thanks for doing the testing and reminder. I've added your fix for the issue you spotted - much appreciated.

davidread commented 7 years ago

@smotornyuk is this something you'd be ok to merge for us?

davidread commented 7 years ago

Oh, hold on I have another change

davidread commented 7 years ago

Ok @smotornyuk, this is ready for review & merge, if that's ok?

metaodi commented 7 years ago

Oh wait, why would you not use SSL_VERIFY for the download? I actually would like this behaviour. I consider SSL_VERIFY useful for test envirionments with self-signed certificates. And this applies to both APIs and the download (e.g. for files uploaded to CKAN).

I'd very much prefer if you'd revert your last change.

davidread commented 7 years ago

Ah, that's a good point.

davidread commented 7 years ago

Ok, how's this?

metaodi commented 7 years ago

Perfect, thanks! 👍

davidread commented 7 years ago

@smotornyuk this is ready for review - do you have time or shall I ask others?

metaodi commented 7 years ago

@smotornyuk @davidread is there somebody who could review and/or merge this? Btw: I'd be happy to help here (all I need is write access :wink:)

amercader commented 7 years ago

@metaodi ask and you shall receive :)

Thanks for your help

amercader commented 7 years ago

Sorry if it was not clear, you have now write access @metaodi

metaodi commented 7 years ago

@amercader thank you! :tada: