gesomax / httplib2

Automatically exported from code.google.com/p/httplib2

httplib2 fails to fetch Google web history RSS feeds #17

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
httplib2 seems to have issues fetching Google's "web history" RSS feeds::

   >>> import httplib2
   >>> h = httplib2.Http()
   >>> h.add_credentials("jacob.kaplanmoss", MY_PASSWORD)
   >>> resp, body = h.request("https://www.google.com/history/lookup?month=1&day=3&yr=2007&output=rss")
   Traceback (most recent call last):

      ...
   httplib.BadStatusLine

(Note that this doesn't happen with fetching plain-old HTML since that uses
GoogleLogin instead of HTTP Basic.)

The problem doesn't appear to be something that Google's doing since
urllib2 works fine::

    >>> import urllib2
    >>> auth = urllib2.HTTPBasicAuthHandler()
    >>> auth.add_password("Google Search History", "www.google.com", "jacob.kaplanmoss", MY_PASSWORD)
    >>> opener = urllib2.build_opener(auth)
    >>> resp = opener.open("http://www.google.com/history/lookup?month=1&day=3&yr=2007&output=rss")
    # ^^^ works fine

This is against r264 of httplib2. Python 2.5, OS X 10.5.

Original issue reported on code.google.com by jacob.ka...@gmail.com on 7 Jan 2008 at 6:11

GoogleCodeExporter commented 8 years ago
Weird, though the exception is thrown from within httplib, not httplib2. If we
can track down the problem, we can always monkey-patch httplib to avoid it.
Trying to duplicate it now.

Original comment by joe.gregorio@gmail.com on 7 Jan 2008 at 6:17

GoogleCodeExporter commented 8 years ago
OK, I can duplicate the bug, and it's weird. Definitely a bug on the Google
servers' end, but a rather subtle one.

When you request the original URI, it generates a 401 challenge for Basic auth.
Once you resubmit the request with credentials, the response changes to a 302
with a Location header. It seems that once you are authorized, the request URI
is rewritten to include a "zx=" parameter and you are 302-redirected to that.
httplib2 sends the Authorization header along with that redirected request (as
it should according to RFC 2617, since the request *path* isn't changed), but
the service is choking on that Authorization header being present. If I
resubmit the request to the rewritten URI without authorization, it works fine.
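
For illustration only, here is a rough sketch of the choice described above:
whether the Authorization header is carried over to the redirected request.
The function name, the `forward_auth` flag, and the "zx=abc" value are
hypothetical and are not part of httplib2.

```python
from urllib.parse import urljoin


def redirect_request(original_url, location, headers, forward_auth=True):
    """Return (url, headers) for the request that follows a 302.

    forward_auth is a hypothetical switch used for illustration;
    it is not an httplib2 option.
    """
    # The Location header may be relative, so resolve it against the original URL.
    target = urljoin(original_url, location)
    new_headers = dict(headers)
    if not forward_auth:
        # Header names are case-insensitive, so compare in lower case.
        for name in list(new_headers):
            if name.lower() == "authorization":
                del new_headers[name]
    return target, new_headers


# Hypothetical redirect that only adds a "zx=" parameter to the same path.
url, hdrs = redirect_request(
    "https://www.google.com/history/lookup?output=rss",
    "/history/lookup?output=rss&zx=abc",
    {"Authorization": "Basic placeholder", "Accept": "*/*"},
    forward_auth=False,
)
```

With `forward_auth=False` the credentials are dropped before the second
request, which corresponds to the manual retry that worked.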

Not sure how to fix this, except to add an "auth_doesnt_follow_redirects"
option to httplib2.

Original comment by joe.gregorio@gmail.com on 7 Jan 2008 at 6:45

GoogleCodeExporter commented 8 years ago
Oh, that *is* weird.

Think I'll report the problem to Google (though I don't think that'll get
anywhere). httplib2 certainly shouldn't need to special-case something as silly
as this.

Original comment by jacob.ka...@gmail.com on 7 Jan 2008 at 7:21

GoogleCodeExporter commented 8 years ago
I can report it internally, but only after I get a fix in and really convince
myself I have a solution. Working on it now.

Original comment by joe.gregorio@gmail.com on 7 Jan 2008 at 7:25

GoogleCodeExporter commented 8 years ago
OK, the *real* problem was even odder than I thought. It turns out the
Location: header didn't redirect to httpS://www.google.com; it redirected to
http://www.google.com:443. Of course, httplib2 was looking at the scheme to
figure out whether to use SSL or not. So now I've added a test: if the scheme
is http, an explicit port is given, and that port is 443, then treat the
connection as an httpS connection. Not sure the fix is completely spec
compliant, but the only real place where it could break is if you were running
a non-SSL server on port 443, and I think that's just asking for trouble.
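
A minimal sketch of the check described above, assuming the decision is made
from the parsed URI alone. `use_ssl` is a hypothetical helper name; this is
not the code that was checked into httplib2.

```python
from urllib.parse import urlsplit


def use_ssl(uri):
    """Decide whether to open an SSL connection for the given URI.

    Hypothetical helper illustrating the heuristic; not httplib2's code.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme == "https":
        return True
    # Heuristic: an http URI with an explicit port of 443 is treated as https.
    # parts.port is None when the URI carries no explicit port.
    return scheme == "http" and parts.port == 443
```

Under this sketch, `use_ssl("http://www.google.com:443/history/lookup")` is
treated the same as an https URI, while a plain `http://www.google.com/` URI
is not.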

Fix is checked into trunk. 

Original comment by joe.gregorio@gmail.com on 7 Jan 2008 at 10:23

GoogleCodeExporter commented 8 years ago

Original comment by joe.gregorio@gmail.com on 6 Sep 2008 at 4:15

GoogleCodeExporter commented 8 years ago
Unfortunately, this fix completely breaks fetching URLs with the 'http' scheme
and the port set to 443, and I have no way to change this behavior without
patching the library sources...

IMO, a library should not alter the URL scheme based on the port number. Or
maybe only in exceptional cases, and only when a corresponding library option
is set.

Can I expect this httplib2 behavior to change in the near future, or will I
have to live with a frozen patched version, search for an alternative tool,
change the port, or do something else?

Thanks.

Original comment by N.Bukha...@gmail.com on 7 Oct 2009 at 9:32