JordanMilne / Advocate

An SSRF-preventing wrapper around Python's requests library. Advocate is no longer maintained, please fork and rename if you would like to continue work on it.
Other
92 stars 17 forks source link

Strange behavior with RequestsAPIWrapper.head() #5

Closed Deimos closed 7 years ago

Deimos commented 7 years ago

Hey Jordan,

I ran into some weird behavior in advocate today. I haven't tried to dig into the cause, but it seems to be mostly just related to the head() function for some reason. We have a RequestsAPIWrapper instance that's defined like this:

advocate = RequestsAPIWrapper(AddrValidator(
    ip_whitelist=g.scraper_ip_whitelist,    
))                                          

Using this for advocate.head(url) behaves strangely when used on our S3 thumbnail buckets, even though it seems to work if I just use advocate.request('HEAD', url) instead. Here's some testing that should show what I mean:

RequestsAPIWrapper version

>>> from r2.lib.media import advocate
>>> advocate.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [400]>
>>> advocate.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg').content
'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>BadRequest</Code><Message>Insufficient information. Origin request header needed.</Message><RequestId>1CB3BBB8D0A65360</RequestId><HostId>eno2W4lV4rPy1ZpQbpybvxtXSFrcTmxcjyrdVSf8/ongUZBGJKnZMRtv5eQhVJDto6LxxOHCSD8=</HostId></Error>'

>>> advocate.get('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [200]>

>>> advocate.request('HEAD', 'https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [200]>

Standard advocate, no RequestsAPIWrapper

>>> import advocate
>>> advocate.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [200]>

Standard requests library

>>> import requests
>>> requests.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [200]>

Let me know if you need any more information or anything.

Deimos commented 7 years ago

(sorry if you already saw this, but just going to specifically ping you in case you weren't expecting anyone else to submit issues)

@JordanMilne

JordanMilne commented 7 years ago

Thanks for the ping @Deimos, I took a look at the code and looks like I made a copy/paste error. When we wrap the head method in RequestsAPIWrapper we're currently doing:

        self.options = self._default_arg_wrapper(options)
        self.head = self._default_arg_wrapper(options) # wrapping options instead of `head`!
        self.post = self._default_arg_wrapper(post)

I'm guessing you're getting Insufficient information. Origin request header needed. because S3 assumes that all OPTIONS requests are for CORS preflights and is confused when it doesn't see the Origin header.

I'll patch this and push an update to PyPI this afternoon.

Deimos commented 7 years ago

Yep, that'd definitely do it (and would explain why I was getting strange CORS-ish-sounding errors when I tried manually adding an Origin header to see what would happen).

JordanMilne commented 7 years ago

This is fixed in 0.6.2 on PyPI:

In [1]: import advocate

In [2]: advocate.__version__
Out[2]: '0.6.2'

In [3]: wrapper = advocate.RequestsAPIWrapper(advocate.AddrValidator())

In [4]: wrapper.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
Out[4]: <Response [200]>