HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

Accept header causing an infinite HTTP 302 loop #319

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Parse this feed directly:

```python
import feedparser

url = 'http://feeds.soundcloud.com/users/2864591-tnwdaily/tracks'
feed = feedparser.parse(url)
```

Result: `feed` is an empty dictionary with the bozo bit flipped.

2. Download the feed to a string, then parse that string:

```python
import urllib2  # Python 2; `url` and `feedparser` come from step 1

req = urllib2.Request(url)
response = urllib2.urlopen(req)
sfeed = response.read()
feed2 = feedparser.parse(sfeed)
```

What is the expected output? What do you see instead?
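Expected output: both approaches parse the feed.
Actual output: parsing the URL directly (step 1) returns an empty dictionary 
with the bozo bit flipped, while parsing the downloaded string (step 2) works 
perfectly with no issue.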
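The "bozo bit" is the `bozo` key in the result that `feedparser.parse()` returns; when it is set, `bozo_exception` normally holds the error that was recorded. As a minimal sketch for telling the two outcomes above apart, the hypothetical helper below (the name `summarize_parse_result` is made up) reports the bozo state alongside the entry count. It assumes only that the result behaves like a mapping with those keys, so plain dicts stand in for feedparser's `FeedParserDict` here.

```python
def summarize_parse_result(result):
    """Summarize a feedparser-style result in one line.

    `result` is assumed to be a mapping with the keys feedparser uses:
    'bozo' (truthy when something went wrong), 'bozo_exception' (the
    recorded error, if any) and 'entries' (a list of parsed items).
    """
    entries = result.get("entries", [])
    if result.get("bozo"):
        exc = result.get("bozo_exception")
        reason = repr(exc) if exc is not None else "unknown error"
        return "bozo: %d entries, %s" % (len(entries), reason)
    return "ok: %d entries" % len(entries)


# Plain dicts stand in for feedparser's FeedParserDict in this sketch.
print(summarize_parse_result({"bozo": 0, "entries": [{}, {}]}))  # ok: 2 entries
```

The entry count is included because feedparser can also flip the bozo bit on a feed that is merely not well-formed while still returning entries; an empty entry list together with the bozo bit is what distinguishes the failure reported here.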

Original issue reported on code.google.com by tpay...@bt-software.net on 6 Jan 2012 at 7:34

GoogleCodeExporter commented 9 years ago
I used wget to prod the server and discovered that it's the Accept header the 
server is choking on. In particular, when it receives that header, the server 
responds with an endless loop of HTTP 302 redirects. The Accept header that 
feedparser currently sends is:

`application/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1`

I don't understand the ramifications of this header, nor do I grok its syntax, 
so I'll have to dig into this a little deeper.
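For anyone else puzzling over the syntax: the header is a comma-separated list of media ranges, and each one may carry a `q` ("quality") weight between 0 and 1 that defaults to 1 when omitted (RFC 2616, section 14.1). The sketch below is a simplified, hypothetical parser (`parse_accept` is a made-up name) that assumes a well-formed header with no quoted commas; it lists the media ranges from most to least preferred.

```python
def parse_accept(header):
    """Split an HTTP Accept header into (media_range, q) pairs, best first.

    Elements are comma-separated; each is a media range optionally followed
    by ';'-separated parameters. A missing q parameter means q=1.0.
    Assumes a well-formed header (no quoted commas, valid q values).
    """
    pairs = []
    for element in header.split(","):
        parts = [part.strip() for part in element.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue  # tolerate stray commas
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        pairs.append((media_range, q))
    # sorted() is stable, so elements with equal weight keep their listed order
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


FEEDPARSER_ACCEPT = (
    "application/atom+xml,application/rdf+xml,application/rss+xml,"
    "application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1"
)

for media_range, q in parse_accept(FEEDPARSER_ACCEPT):
    print(q, media_range)
```

Running it on the header above shows the first four types (Atom, RDF, RSS and netCDF) all carrying the implicit weight 1.0, followed by `application/xml` at 0.9, `text/xml` at 0.2 and `*/*` at 0.1. In other words, Atom and RSS tie on weight, and only their position in the list sets them apart.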

Original comment by kurtmckee on 10 Jan 2012 at 8:21

GoogleCodeExporter commented 9 years ago
After looking into the issue a little further, I believe that feedparser is 
behaving correctly and that soundcloud.com is misconfigured. Specifically, I 
found that if "application/atom+xml" is preferred over "application/rss+xml", 
their server issues an HTTP 302 redirect and enters an infinite loop. If 
"application/atom+xml" is less preferred than "application/rss+xml", or is not 
listed at all, their server responds as expected. This happens only with the 
Atom MIME type, so their server is apparently looking for it specifically.
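A client cannot fix a misconfigured server, but it can refuse to chase a redirect loop forever. The sketch below is a hypothetical illustration, not feedparser's code: `follow_redirects` and the injected `fetch(url) -> (status, location)` callable are made-up names, which keeps the example runnable without network access. It stops when a URL repeats or after a fixed number of hops. Python's own `urllib.request.HTTPRedirectHandler` applies comparable limits through its `max_repeats` and `max_redirections` attributes.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_hops=10):
    """Follow redirects from `url`, refusing to loop forever.

    `fetch` is a hypothetical callable: fetch(url) -> (status, location),
    where location is the Location header value (or None). Returns the
    final (status, url). Raises RuntimeError on a repeated URL or when
    max_hops requests have all been redirects.
    """
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            raise RuntimeError("redirect loop detected at " + url)
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES:
            return status, url
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, location)
    raise RuntimeError("gave up after %d redirects" % max_hops)


# A fake server that bounces between two URLs, mimicking an endless 302 loop.
LOOP = {
    "http://example.com/a": (302, "/b"),
    "http://example.com/b": (302, "/a"),
}

try:
    follow_redirects("http://example.com/a", LOOP.get)
except RuntimeError as err:
    print(err)  # redirect loop detected at http://example.com/a
```

The hop limit matters as much as the repeat check: a server that appends a fresh token to every Location header never repeats a URL exactly, so only the counter stops it.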

I've sent an email to SoundCloud's support team to notify them of the problem. 
To work around it on your side, I recommend customizing the Accept header by 
changing the global variable `feedparser.ACCEPT_HEADER`. If you're dealing with 
soundcloud.com, you can remove "application/atom+xml," from the beginning of 
the string.
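One way to make that edit without hard-coding the string is to drop the Atom element by name. The helper below is a hypothetical sketch (`remove_media_range` is a made-up name). It matches each comma-separated element on its media range only, ignoring parameters such as `;q=0.9`, and leaves everything else in its original order.

```python
def remove_media_range(accept_header, media_type):
    """Return `accept_header` without the element(s) for `media_type`.

    Matching ignores parameters such as ';q=0.9' and letter case; all
    other elements are kept, in their original order.
    """
    kept = []
    for element in accept_header.split(","):
        media_range = element.split(";", 1)[0].strip().lower()
        if media_range != media_type.lower():
            kept.append(element.strip())
    return ",".join(kept)


DEFAULT = (
    "application/atom+xml,application/rdf+xml,application/rss+xml,"
    "application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1"
)
print(remove_media_range(DEFAULT, "application/atom+xml"))
# application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1
```

In a script you would apply it as `feedparser.ACCEPT_HEADER = remove_media_range(feedparser.ACCEPT_HEADER, "application/atom+xml")` before calling `feedparser.parse(url)`. Because the variable is module-level, the change should affect every later request in the process, so save and restore the original value if you also parse feeds from other servers. Matching by name, rather than slicing off a fixed prefix, keeps working even if the default header's order changes.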

If you want to contact SoundCloud, the email page I found was at:

http://help.soundcloud.com/customer/portal/emails/new

Original comment by kurtmckee on 11 Jan 2012 at 9:18