HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

HTTP redirect to HTTP 304 causes SAXParseException #322

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Call "parse()" with the etag and modified parameters on a URL that performs an
HTTP 302 redirect to a target that returns HTTP 304.

What is the expected output? What do you see instead?
This yields an XML parse exception (SAXParseException). I would expect the default 304 behavior instead.

What version of the product are you using? On what operating system?
Python 2.7.2, feedparser 5.1

Please provide any additional information below.
This seems to happen because the redirect handler of _FeedURLHandler writes the
redirect status code ("code") into result.status. I patched the corresponding line
to "result.status = result.code". This solves the issue for me, but I don't know
whether it has any side effects.

Original issue reported on code.google.com by rehn.thomas on 15 Jan 2012 at 10:22
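A simplified model of the failure described above, assuming the fetch code skips XML parsing only when the recorded status is exactly 304, and that a 304 response carries no body. process() is a hypothetical helper, not feedparser's code; it uses the standard xml.sax module, whose parser raises SAXParseException when given empty input.

```python
import xml.sax
from xml.sax import SAXParseException
from xml.sax.handler import ContentHandler


def process(status, body):
    """Hypothetical, simplified fetch-then-parse step (not feedparser's code).

    Parsing is skipped only when the recorded status is exactly 304.
    """
    if status == 304:
        return "not modified"
    # Any other status is treated as "there is a document to parse".
    xml.sax.parseString(body, ContentHandler())
    return "parsed"


# A 302 whose target answered 304: the final response has an empty body.
redirect_code, final_code, body = 302, 304, b""

# Buggy: the redirect handler overwrote the status with the redirect code,
# so the empty body is handed to the XML parser.
try:
    process(redirect_code, body)
except SAXParseException as exc:
    print("302 recorded ->", type(exc).__name__)  # 302 recorded -> SAXParseException

# Patched: the status reflects the final response, so parsing is skipped.
print("304 recorded ->", process(final_code, body))  # 304 recorded -> not modified
```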

GoogleCodeExporter commented 9 years ago
Fixed in r688. Thanks for reporting this!

(Using result.code means that only the final return code (typically 200) will
be kept, which causes intermediate redirect codes like 301 and 302 to be lost.
With the patch I just committed, the redirect status code will be kept, but the
final behavior will be the same as a 304 Not Modified, and no SAXParseException
will be thrown.)

Original comment by kurtmckee on 17 Feb 2012 at 5:04
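To make the design in the comment above concrete, here is a minimal Python 3 sketch using only the standard library; it is not feedparser's code. A hypothetical RedirectRecordingHandler stores the redirect code in a separate attribute, redirect_status, instead of overwriting the final code, and the not-modified decision is made on the final response's status. The local demo server, the /feed and /real paths, the ETag value, and the fetch() and run_demo() helpers are all assumptions for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ETAG = '"v1"'  # hypothetical ETag served by the demo server


class DemoServer(BaseHTTPRequestHandler):
    """Hypothetical feed host: /feed redirects (302) to /real, which honors If-None-Match."""

    def do_GET(self):
        if self.path == "/feed":
            self.send_response(302)
            self.send_header("Location", "/real")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.headers.get("If-None-Match") == ETAG:
            self.send_response(304)  # Not Modified: no body follows
            self.end_headers()
        else:
            body = b'<rss version="2.0"><channel></channel></rss>'
            self.send_response(200)
            self.send_header("Content-Type", "application/rss+xml")
            self.send_header("ETag", ETAG)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


class RedirectRecordingHandler(urllib.request.HTTPRedirectHandler,
                               urllib.request.HTTPDefaultErrorHandler):
    """Keeps the redirect code and the final code in separate attributes."""

    def http_error_302(self, req, fp, code, msg, headers):
        # Follow the redirect; the base class returns the response for the new URL.
        result = super().http_error_302(req, fp, code, msg, headers)
        result.redirect_status = code  # remember the redirect code separately
        return result

    http_error_301 = http_error_303 = http_error_307 = http_error_302

    def http_error_default(self, req, fp, code, msg, headers):
        # Return non-2xx responses (e.g. 304) instead of raising HTTPError.
        return fp


def fetch(url, etag=None):
    """Hypothetical helper: fetch url and report both status codes."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}),
                                         RedirectRecordingHandler())
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)  # copied onto the redirected request
    with opener.open(req, timeout=5) as resp:
        body = resp.read()
        final_status = resp.status
        redirect_status = getattr(resp, "redirect_status", None)
    return {
        "status": redirect_status or final_status,  # redirect code if there was one
        "final_status": final_status,
        "not_modified": final_status == 304,  # decided on the FINAL code
        "body": body,
    }


def run_demo(etag=None):
    """Start the demo server on a free local port, fetch /feed once, shut down."""
    server = HTTPServer(("127.0.0.1", 0), DemoServer)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch(f"http://127.0.0.1:{port}/feed", etag=etag)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo(etag=ETAG))  # status 302, final_status 304, not_modified True
```

Keeping the two codes in separate fields lets a caller still see that a 302 happened while the not-modified decision follows the final 304. Which of the two to report as the primary status is a separate design choice, and it is what the next comment questions.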

GoogleCodeExporter commented 9 years ago
Version: feedparser 5.1.2, Python 2.6.1

Here is the result of the parse function for a URL that performs an HTTP 302
redirect to a target that returns HTTP 304:

{
    'feed': {},
    'status': 302,
    'debug_message': 'The feed has not changed since you last checked, so the server sent no data.  This is a feature, not a bug!',
    'version': u'',
    'encoding': u'iso-8859-1',
    'bozo': 1,
    'headers': {
        'date': 'Thu, 20 Dec 2012 16:07:30 GMT',
        'set-cookie': 'MF2=u2pv6i414uvx; domain=.feedsportal.com; expires=Sat, 20-Dec-14 16:07:30 GMT; path=/',
        'connection': 'close',
        'server': 'FeedsPortal'
    },
    'href': u'http://rss.feedsportal.com/c/499/f/413824/index.rss',
    'entries': [],
    'bozo_exception': NonXMLContentType('no Content-type specified',)
}

In my opinion, the status should be 304 and there should be no bozo_exception,
so that the HTTP 304 code can be handled correctly.

Original comment by christophe.borsenberger on 20 Dec 2012 at 4:25