HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

handle RFC822 dates with extraneous commas #327

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

>>> import feedparser
>>> f = feedparser.parse('http://radio.nationalreview.com/radioderb/radioderb.xml')
>>> f.entries[0].updated
u'Fri, 10 Feb, 2012 09:00:00 EST'
>>> f.entries[0].updated_parsed
>>> 

What is the expected output? What do you see instead?

>>> f.entries[0].updated_parsed
time.struct_time(tm_year=2012, tm_mon=2, tm_mday=10, tm_hour=14, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=41, tm_isdst=0)
>>>
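The expected `tm_hour=14` is not a typo: the value is expressed in UTC, and EST is a fixed UTC-5 offset, so 09:00 EST becomes 14:00 UTC. A minimal sketch of that conversion with the standard library's `datetime` (this is not feedparser code, just the arithmetic behind the expected tuple):

```python
from datetime import datetime, timedelta, timezone

# EST denotes standard time, a fixed offset of UTC-5 hours.
EST = timezone(timedelta(hours=-5), "EST")

# The entry's local timestamp: Fri, 10 Feb 2012 09:00:00 EST.
local = datetime(2012, 2, 10, 9, 0, 0, tzinfo=EST)

# Convert to UTC and produce a time.struct_time like the one expected above.
utc_tuple = local.astimezone(timezone.utc).utctimetuple()
print(utc_tuple)
```

Day 41 of the year is 31 days of January plus 10, and `tm_wday=4` is Friday (Monday is 0).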

The feedparser library can't parse the date string because of the extraneous
comma after the month (the comma after the weekday is allowed by RFC 822).
Stripping all commas lets the library parse the date without problems. That
suggests a workaround, although I'm not sure whether any date parsing depends on
the presence of commas; your unit tests should be able to cover these situations:

>>> feedparser._parse_date(f.entries[0].updated.replace(',', ''))
time.struct_time(tm_year=2012, tm_mon=2, tm_mday=10, tm_hour=14, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=41, tm_isdst=0)
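To address the concern that some formats might depend on commas, one option is to strip them only when a strict parse fails. The sketch below assumes Python's standard `email.utils.parsedate_tz` as a stand-in for feedparser's private `_parse_date`; the function name `parse_rfc822_lenient` is hypothetical, and treating a missing time zone as UTC is an assumption of this sketch:

```python
import calendar
import time
from email.utils import parsedate_tz


def parse_rfc822_lenient(value):
    """Parse an RFC 822 date, tolerating stray commas such as 'Feb, 2012'.

    Returns a UTC time.struct_time, or None if the string still can't be parsed.
    (Hypothetical helper for illustration; not part of feedparser.)
    """
    parsed = parsedate_tz(value)
    if parsed is None:
        # Fall back to stripping commas only when the strict parse fails,
        # so well-formed dates are never rewritten.
        parsed = parsedate_tz(value.replace(",", ""))
    if parsed is None:
        return None
    # parsed[9] is the zone offset in seconds east of UTC; None means no zone
    # was recognized, which this sketch treats as UTC.
    offset = parsed[9] or 0
    timestamp = calendar.timegm(parsed) - offset
    return time.gmtime(timestamp)


print(parse_rfc822_lenient("Fri, 10 Feb, 2012 09:00:00 EST"))
```

feedparser documents a `registerDateHandler()` hook that accepts a function taking a date string and returning a UTC 9-tuple (or None), so a helper along these lines could be plugged in without patching the library.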

It also works when only the comma after the month is removed:

>>> feedparser._parse_date(u'Fri, 10 Feb, 2012 09:00:00 EST')
>>> feedparser._parse_date(u'Fri, 10 Feb 2012 09:00:00 EST')
time.struct_time(tm_year=2012, tm_mon=2, tm_mday=10, tm_hour=14, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=41, tm_isdst=0)
>>> feedparser._parse_date(u'Fri 10 Feb 2012 09:00:00 EST')
time.struct_time(tm_year=2012, tm_mon=2, tm_mday=10, tm_hour=14, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=41, tm_isdst=0)
>>> 
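Python's standard `email.utils.parsedate_tz` is a different, strict RFC 2822 parser, not feedparser's, but it shows the same pattern, which helps confirm that the comma after the month is the one that breaks parsing:

```python
from email.utils import parsedate_tz

samples = [
    "Fri, 10 Feb, 2012 09:00:00 EST",  # extra comma after the month
    "Fri, 10 Feb 2012 09:00:00 EST",   # standard RFC 822 form
    "Fri 10 Feb 2012 09:00:00 EST",    # no commas at all
]

# Map each sample to whether the strict parser accepted it.
results = {s: parsedate_tz(s) is not None for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {'parsed' if ok else 'rejected'}")
```

The comma after the weekday is optional in RFC 822, so only the month-comma variant is rejected.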

Related gPodder mailing list post:
http://www.freelists.org/post/gpodder/gpodder-doesnt-recognize-the-date

Original issue reported on code.google.com by th.perl@gmail.com on 12 Feb 2012 at 10:11

GoogleCodeExporter commented 9 years ago
"Comma after the year" -> "Comma after the month" (obviously)

Original comment by th.perl@gmail.com on 12 Feb 2012 at 10:12

GoogleCodeExporter commented 9 years ago
Patch and pull request:

https://github.com/kurtmckee/feedparser/pull/3

Original comment by th.perl@gmail.com on 12 Feb 2012 at 11:12

GoogleCodeExporter commented 9 years ago
Your timing is uncanny! About eight hours ago I completely ripped out the
RFC822 date parser and replaced it with the date parser from my listparser
project. While fixing issue 304, I discovered that Mark had copied code from
the rfc822 module into `_parse_date_rfc822()` unmodified (see r147). I don't
know whether feedparser was under the Python license back in 2005, but it
isn't now, and that code had to go.

I have only four unit tests left to account for with these changes, and then 
I'll fix this bug and make a new release.

Original comment by kurtmckee on 12 Feb 2012 at 6:09

GoogleCodeExporter commented 9 years ago
Fixed in r684. As discussed in issue 328, I'll hold off on a release until I've
accounted for the compatibility issue I created. *Shakes head and sighs*. Also,
I'm planning to move feedparser off SVN after this next release. So much to
do!

Original comment by kurtmckee on 13 Feb 2012 at 9:08