Gzip errors on broken pages

eshad / httplib2

Automatically exported from code.google.com/p/httplib2


Gzip errors on broken pages #49

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Fetching the url
http://imaddicted.ca/ebooks/so-i-just-bought-the-foxit-eslick-reader/

What is the expected output? What do you see instead?
A FailedToDecompressContent exception is raised instead of the content being
returned. The site has the WP Super Cache plugin installed, which breaks the
gzipped content by appending an HTML comment to it. The attached patch works
around that by retrying the fetch without compression.

What version of the product are you using? On what operating system?
r276

Please provide any additional information below.

Original issue reported on code.google.com by zelo.z...@gmail.com on 9 Mar 2009 at 9:16

Attachments:

httplib2-gziperrors-retry.diff

GoogleCodeExporter commented 9 years ago

I think the page in question has been fixed, but the error was that the page was
gzipped and a WordPress extension added an HTML comment after the gzipped
content.
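
Editor's note: the failure described above can be reproduced offline with the
standard library alone. This is only a sketch of the symptom, not httplib2's
own decoding path; the page bytes and the appended comment are made up for
illustration.

```python
import gzip
import zlib

page = b"<html><body>hello</body></html>"
# Simulate the broken response: a valid gzip stream followed by plain text.
# The comment text is a hypothetical stand-in for what the plugin appended.
body = gzip.compress(page) + b"<!-- appended by a caching plugin -->"

# A strict decoder treats the bytes after the first gzip member as the start
# of another member and rejects them.
try:
    gzip.decompress(body)
    strict_ok = True
except (OSError, EOFError, zlib.error):
    strict_ok = False

# A streaming decompressor stops at the end of the gzip member and exposes
# whatever follows it as unused_data.
decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: gzip framing
recovered = decoder.decompress(body)
trailing = decoder.unused_data
```

Here `recovered` holds the original page and `trailing` holds the appended
comment, which is the part a strict decoder refuses to accept.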

Original comment by zelo.z...@gmail.com on 17 Mar 2009 at 12:45

GoogleCodeExporter commented 9 years ago

Yes, some WordPress extensions will add uncompressed content after the gzipped
content, which makes the entire body impossible to decompress. In your
application code you can catch the error and retry the request after setting an
Accept-Encoding: header that doesn't include gzip, such as:

   h.request(uri, headers={'accept-encoding': 'identity'})

Not sure if such functionality should be folded into the core library, but I'm
willing to listen to arguments.
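
Editor's note: a minimal sketch of the catch-and-retry approach suggested
above, written as application code rather than as a change to httplib2. The
helper name `fetch_with_identity_fallback` and its `decompress_error`
parameter are hypothetical; the only things assumed about the `http` object
are a `request(uri, headers=...)` method that returns a `(response, content)`
pair, as `httplib2.Http` does.

```python
def fetch_with_identity_fallback(http, uri, decompress_error, headers=None):
    """Request uri; on a decompression failure, retry once without compression.

    http             -- object with a request(uri, headers=...) method
    decompress_error -- exception class (or tuple of classes) to retry on
    """
    headers = dict(headers or {})  # copy so the caller's dict is not modified
    try:
        return http.request(uri, headers=headers)
    except decompress_error:
        # Ask the server for an uncompressed body and try exactly once more.
        headers["accept-encoding"] = "identity"
        return http.request(uri, headers=headers)
```

With httplib2 this would be called as
`fetch_with_identity_fallback(httplib2.Http(), uri, httplib2.FailedToDecompressContent)`.
Retrying only once keeps the behavior predictable: if the second request also
fails, the exception propagates to the caller.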

Original comment by joe.gregorio@gmail.com on 16 Jul 2009 at 4:36

Changed state: WontFix

GoogleCodeExporter commented 9 years ago


The web as we know it is definitely broken in many ways. The browsers have
adapted to that and they (apparently) automatically retry the request with gzip
disabled, since the browser knows how to display the content. I think that
including this might improve the experience, but so that callers do not lose
control, an option could be added to set retry values.

However, it seems that this is more of a political issue than a technical one,
and I may not correctly understand what the goal of httplib2 is.

Original comment by zelo.z...@gmail.com on 11 Aug 2009 at 1:39

GoogleCodeExporter commented 9 years ago

I have problems with a server that sends the wrong Content-Encoding header no
matter what I do (it claims gzip, but I get FailedToDecompressContent).
Setting a custom Accept-Encoding doesn't help here, so I think httplib2 should
handle that internally by simply not trying to unzip.
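
Editor's note: one way to picture the "just don't unzip" behavior requested
above is a lenient decoder that checks for the gzip magic bytes before
decompressing. This is a sketch under the assumption that the raw response
bytes are available to the caller; `lenient_gunzip` is a hypothetical helper,
not part of httplib2.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def lenient_gunzip(raw):
    """Return the decompressed body, or raw unchanged if it is not gzip data."""
    if not raw.startswith(GZIP_MAGIC):
        # The header claimed gzip, but the payload is not a gzip stream.
        return raw
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip framing
    try:
        # Bytes after the end of the gzip member land in decoder.unused_data
        # and are ignored here.
        return decoder.decompress(raw)
    except zlib.error:
        # Looked like gzip but was corrupt: fall back to the raw bytes.
        return raw
```

The trade-off is that a mislabeled or corrupt body is handed back silently, so
a caller that needs to know about the mismatch would have to report it
separately.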

Original comment by julian....@gmail.com on 2 Aug 2010 at 6:46