google-code-export / feedparser

Automatically exported from code.google.com/p/feedparser

HTTP Last-modified documentation incorrect #369

Closed: GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.  import feedparser
2.  d = feedparser.parse('http://rss.cnn.com/rss/si_topstories.rss')
3.  d.modified

What is the expected output? What do you see instead?
Expected:  RFC 822 (4-digit year) in 9-digit tuple 
Actual:  'Sun, 05 Aug 2012 02:26:53 GMT'
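The "Actual" value is the raw Last-Modified header string, and getting a 9-tuple out of it is a separate parsing step. A minimal sketch of that step, using the standard library's `email.utils.parsedate` purely as an illustration (it is not feedparser's own date parser):

```python
from email.utils import parsedate

# The raw Last-Modified string reported as the "Actual" value above.
raw = 'Sun, 05 Aug 2012 02:26:53 GMT'

# parsedate() turns an RFC 822/2822 date string into a 9-tuple.
# Positions 6-8 (weekday, day of year, DST flag) are fixed placeholders
# in its result, not computed values, and the zone is dropped.
parsed = parsedate(raw)
print(parsed[:6])  # (2012, 8, 5, 2, 26, 53)
```

Because the zone here is GMT, dropping it is harmless; for strings with other offsets, `email.utils.parsedate_tz()` keeps the offset as a tenth element.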

What version of the product are you using? On what operating system?
feedparser 5.1.2 on Ubuntu 12.04 LTS (Precise)

Please provide any additional information below.

I noticed there was a relatively recent change to parse_date_rfc822(), but I have
not checked the source code.

Docs page for Date Parsing:
http://packages.python.org/feedparser/date-parsing.html

Docs Page for Last-Modified:
http://packages.python.org/feedparser/http-etag.html

BTW, the "See Also" link to HTTP Web Services on that page is broken.

Original issue reported on code.google.com by revisi...@gmail.com on 8 Aug 2012 at 7:13

GoogleCodeExporter commented 9 years ago
What do you mean by
>> Expected:  RFC 822 (4-digit year) in 9-digit tuple 

You can read the 9-tuple from "modified_parsed". Here is a short example:

In [1]: import feedparser

In [2]: f = feedparser.parse('http://rss.cnn.com/rss/si_topstories.rss')

In [3]: f.modified
Out[3]: 'Fri, 07 Sep 2012 19:57:39 GMT'

In [4]: f.modified_parsed
Out[4]: time.struct_time(tm_year=2012, tm_mon=9, tm_mday=7, tm_hour=19, tm_min=57, tm_sec=39, tm_wday=4, tm_yday=251, tm_isdst=0)
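If a `datetime` is more convenient than a `struct_time`, the parsed value can be converted directly. A minimal sketch, assuming (as the docs state) that the 9-tuple is in UTC; `struct_time_to_utc_datetime` is a hypothetical helper name, not part of feedparser:

```python
import time
from datetime import datetime, timezone

def struct_time_to_utc_datetime(parsed):
    """Convert a UTC 9-tuple (e.g. modified_parsed) to an aware datetime."""
    # Only the first six fields (year .. second) are needed; the tuple
    # is assumed to already be in UTC, so the zone is attached directly.
    return datetime(*parsed[:6], tzinfo=timezone.utc)

# The value shown in the session above.
example = time.struct_time((2012, 9, 7, 19, 57, 39, 4, 251, 0))
print(struct_time_to_utc_datetime(example))  # 2012-09-07 19:57:39+00:00
```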

Original comment by schla...@gmail.com on 7 Sep 2012 at 8:04

GoogleCodeExporter commented 9 years ago
I am following the online documentation. Please read the linked pages; they
describe my problem exactly. You are recommending something that I could only
find documented in the release notes for v2.7.

http://packages.python.org/feedparser/date-parsing.html

Second sentence on the page:
Universal Feed Parser will attempt to auto-detect the date format used in any 
date element, and parse it into a standard Python 9-tuple, as documented in the 
Python time module.

At the bottom of the same page, in the instructions for registering a
third-party date handler:
The callback function should take a single argument, a string, and return a 
single value, a 9-tuple Python date in UTC.
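A date handler of the kind described there might look like the sketch below. The date format it accepts ("2012/08/05 02:26:53 UTC") and the name `parse_slash_date` are hypothetical, chosen only for illustration; returning None for unrecognized strings mirrors what feedparser's built-in handlers do so that the next handler gets a chance:

```python
import re
from typing import Optional, Tuple

# Hypothetical feed date format, assumed for illustration only.
_SLASH_DATE = re.compile(
    r"^(\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2}) UTC$"
)

def parse_slash_date(date_string: str) -> Optional[Tuple[int, ...]]:
    """Return a 9-tuple in UTC, or None if the string is not in this format."""
    match = _SLASH_DATE.match(date_string.strip())
    if match is None:
        return None  # not our format; let another handler try
    year, month, day, hour, minute, second = (int(g) for g in match.groups())
    # Weekday, day of year and DST flag are left as 0 placeholders,
    # matching the shape of the handler code quoted in this thread.
    return (year, month, day, hour, minute, second, 0, 0, 0)
```

Such a function would then be registered with `feedparser.registerDateHandler(parse_slash_date)`.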

Actual code:
return (int(year), int(month), int(day),
        int(hour), int(minute), int(second), 0, 0, 0)
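A tuple built this way carries 0 placeholders for weekday, day of year and DST flag. If real values are needed, one option (a sketch using only the standard library, not something feedparser is claimed to do) is to round-trip the tuple through epoch seconds:

```python
import calendar
import time

# A 9-tuple shaped like the quoted handler's return value.
raw_tuple = (2012, 8, 5, 2, 26, 53, 0, 0, 0)

# calendar.timegm() reads only the first six fields and treats them as UTC;
# time.gmtime() then rebuilds a struct_time with the weekday and day of
# year computed from the actual date.
normalized = time.gmtime(calendar.timegm(raw_tuple))
print(normalized.tm_wday, normalized.tm_yday)  # 6 218 (Sunday, 218th day)
```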

From the other link I provided, see the section on Last-Modified headers:
http://packages.python.org/feedparser/http-etag.html

It shows that f.modified should be a 9-tuple, not an unparsed RFC 822 string
with a 4-digit year.
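The point of keeping the Last-Modified value is to send it back on the next request. `feedparser.parse()` accepts it through its `modified` argument; if you build the If-Modified-Since header yourself instead, a UTC 9-tuple can be turned back into an HTTP date string. A minimal sketch; `to_http_date` is a hypothetical helper name:

```python
import calendar
import time
from email.utils import formatdate

def to_http_date(parsed) -> str:
    """Format a UTC 9-tuple as an HTTP date (e.g. for If-Modified-Since)."""
    # timegm() interprets the tuple as UTC; usegmt=True makes formatdate()
    # emit the literal "GMT" zone name that HTTP dates use.
    return formatdate(calendar.timegm(parsed), usegmt=True)

print(to_http_date(time.struct_time((2012, 8, 5, 2, 26, 53, 6, 218, 0))))
# Sun, 05 Aug 2012 02:26:53 GMT
```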

Original comment by revisi...@gmail.com on 7 Sep 2012 at 8:51

GoogleCodeExporter commented 9 years ago
I was sure I had fixed all of the documentation... sorry about that. This change
was introduced so that the HTTP Last-Modified header follows the same dictionary
key naming as the date-related keys in `f.feed` and `f.entries[i]`.

`f.updated` contains the original HTTP Last-Modified string.
`f.updated_parsed` contains the 9-tuple you're looking for.
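Since this thread shows both naming schemes (`modified`/`modified_parsed` in the 5.1.2 session, `updated`/`updated_parsed` here), code that must cope with either can look both up. A minimal sketch that works on any dict-like result; `http_last_modified` is a hypothetical helper, and which keys a given feedparser version exposes is an assumption to check against that version:

```python
from typing import Any, Mapping, Optional, Tuple

def http_last_modified(result: Mapping[str, Any]) -> Tuple[Optional[str], Any]:
    """Return (raw Last-Modified string, parsed 9-tuple), or (None, None)."""
    # Prefer the updated/updated_parsed names, then fall back to the
    # older modified/modified_parsed names.
    for raw_key, parsed_key in (("updated", "updated_parsed"),
                                ("modified", "modified_parsed")):
        if raw_key in result:
            return result.get(raw_key), result.get(parsed_key)
    return None, None
```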

Original comment by kurtmckee on 19 Nov 2012 at 4:27

GoogleCodeExporter commented 9 years ago
Okay, after looking at the source I see that this has been fixed. The online 
docs will be updated with the next release. :)

Original comment by kurtmckee on 26 Nov 2012 at 5:02