Open GoogleCodeExporter opened 9 years ago
I've just run into this with a URL that redirects to a 404 error -- you can't
tell that it's an error from feedparser's result.
This is actually a regression: feedparser SVN r291 returned 'status': 404, and
current Git HEAD returns 'status': 301 (with the results otherwise being
identical). Having done a bisection, the behaviour changed in patch
"[63b82303ae173c007426f8ae4d33e94cbe3b7411] Fix HTTP redirection in Python 3"
-- before that change, the redirection handlers in _FeedURLHandler only set the
status if it wasn't already set.
(Maybe it'd be useful to return the response codes for both the initial and
final requests?)
Original comment by ats-goog...@offog.org
on 21 Jun 2013 at 4:54
Two patches attached. The first just corrects the regression; the second adds a
redirect_status attribute that also returns the (first) redirect's status.
Original comment by ats-goog...@offog.org
on 21 Jun 2013 at 5:54
After further investigation, the original behaviour in the case of a redirect
was actually to return the final request's status if it was an error code (i.e.
not 2xx), and otherwise to return the first redirect's status. My first patch
above does that, but its description is wrong.
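To make that rule concrete, here is a minimal sketch of the selection logic as described in this comment. It is not feedparser's actual code: the function name `legacy_status` and the idea of passing the statuses as a list (first response first, final response last) are assumptions made purely for illustration.

```python
def legacy_status(chain):
    """Pick the status the old behaviour would have reported.

    `chain` is a hypothetical list of HTTP status codes in the order they
    were received, e.g. [301, 404] for a redirect that ends in a 404.
    """
    first, final = chain[0], chain[-1]
    # An error at the end of the chain (anything outside 2xx) wins.
    if not 200 <= final < 300:
        return final
    # Otherwise report the first response, i.e. the first redirect's status.
    return first
```

Under this rule a 301 leading to a 404 reports 404, while a 302 leading to a 200 reports 302.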
However, even that behaviour isn't sufficient, because it's also useful for an
aggregator to be able to distinguish different non-error codes behind a
temporary redirect -- e.g. 200 vs. 206 for RFC 3229 "A-IM: feed" support. So
how about sticking with the current meaning for "status" (the initial
response's status), and adding a "final_status" attribute giving the final
response's status?
Any thoughts welcome...
Original comment by ats-goog...@offog.org
on 21 Jun 2013 at 11:13
The caller should not have to worry about intermediate statuses if it doesn't
care about them. The status returned in the status field should be the final
status. If there are redirects and you want to report them, use a separate
variable containing the chain of statuses in order, with additional information
about each redirect.
Original comment by jkam...@quantopian.com
on 21 Jun 2013 at 11:20
It turns out that it's quite straightforward to capture the complete chain of
HTTP responses using a urllib2 handler, so I'm now doing it that way and not
using 'status' at all. Example attached.
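The attachment itself is not shown in this thread. As a rough sketch of the general approach, and assuming Python 3, where urllib2's handlers live in `urllib.request`, a redirect handler can record each status as it follows the redirect. The names `RecordingRedirectHandler` and `fetch_status_chain` are hypothetical and are not part of feedparser or of the attached example.

```python
import urllib.error
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers the status of every redirect it follows."""

    def __init__(self):
        self.statuses = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler decide whether this redirect may be followed.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            # Only record redirects that are actually followed.
            self.statuses.append(code)
        return new_req


def fetch_status_chain(url):
    """Return the list of HTTP statuses seen for `url`, final status last."""
    handler = RecordingRedirectHandler()
    # Passing an HTTPRedirectHandler subclass replaces the default one.
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url) as resp:
            final = resp.status
    except urllib.error.HTTPError as err:
        # Error responses (e.g. a 404 behind a redirect) arrive as exceptions.
        final = err.code
        err.close()
    return handler.statuses + [final]
```

With a helper like this, a URL that redirects permanently to a missing page would be expected to yield a chain such as `[301, 404]`, so the caller can inspect both the first and the final status without relying on a single 'status' field.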
Original comment by ats-goog...@offog.org
on 24 Jun 2013 at 9:57
Original issue reported on code.google.com by
jikam...@gmail.com
on 13 Feb 2013 at 2:39