From david@kineticode.com on 2010-05-21 19:07:14:
Here's a patch. I extracted some shitty dates from some feeds I'm parsing, and threw in a bunch of others. The test ensures that they all work in pubDate, dc:date, dcterms:date, dcterms:modified, and atom:updated. It adds dependencies on DateTime::Format::ISO8601, DateTime::Format::Flexible, and DateTime::Format::Natural.
I didn't add a parameter, as there don't seem to be any real attributes to use. Maybe I've missed something?
I've Cc'd RT so that it doesn't get lost in the shuffle.
What do you think?
Best,
David
On May 20, 2010, at 4:05 PM, David E. Wheeler wrote:
> Hi Simon,
>
> I'm using XML::Feed for a project. It's so nice not to have to worry about all the variations in feeds. Many thanks to you and SixApart for the great module.
>
> One place where I do have to worry, though, is with dates. There are a lot of feeds out there with invalid date formats. Take http://bestwebgallery.com/feed/ for example. It has this:
>
> <pubDate>May 17, 2010</pubDate>
>
> Irritating. I fully expect to find a lot more shitty dates. Alas, with a date like this, issued() returns undef. I'd really like to make a best effort to get at dates in all formats, as I could really use it for proper(ish) sorting.
>
> I noticed this test in t/01-parse.t:
>
>     $feed = XML::Feed->parse('t/samples/rss10-invalid-date.xml')
>         or die XML::Feed->errstr;
>     $entry = ($feed->entries)[0];
>     ok(!$entry->issued);   ## Should return undef, but not die.
>     ok(!$entry->modified); ## Same.
>
> So I guess that you want to be strict by default. What I'm thinking of is adding an attribute to XML::Feed that makes date parsing looser. If it's set to true (false by default), then it would also try DateTime::Format::Natural or perhaps DateTime::Format::Flexible. Would you be interested in such a patch?
>
> If so, looking at Format::RSS, I see that it first tries {dc}{date} and then {pubDate}. Should I continue with that approach? Or maybe try both strictly first, and then try them both again more loosely?
>
> Thanks,
>
> David
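
For illustration only, the "strict first, then loose" idea from the message above could look roughly like the sketch below. It is written in Python with the standard library purely as a language-neutral sketch; it is not XML::Feed code and not the attached patch. The names parse_date, parse_strict, parse_loose, the loose flag, and the LOOSE_FORMATS list are all hypothetical, and a real implementation would presumably hand the loose step to a natural-language date parser rather than a fixed list of patterns.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Hypothetical patterns for the loose pass; assumed for this sketch only.
LOOSE_FORMATS = ["%B %d, %Y", "%b %d, %Y", "%d %B %Y", "%Y/%m/%d"]


def parse_strict(text):
    """Try RFC 822 first, then ISO 8601; return None if neither matches."""
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        pass  # not an RFC 822 date; fall through to ISO 8601
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


def parse_loose(text):
    """Try each loose pattern in order; return None if none matches."""
    for fmt in LOOSE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def parse_date(text, loose=False):
    """Strict by default; fall back to loose parsing only when asked."""
    result = parse_strict(text)
    if result is None and loose:
        result = parse_loose(text)
    return result


print(parse_date("May 17, 2010"))              # strict only: no match
print(parse_date("May 17, 2010", loose=True))  # loose fallback kicks in
```

With the flag off, an invalid date still yields no value instead of an error, which matches what the existing test in t/01-parse.t expects. Note that the strict pass in this sketch can return timezone-aware values while the loose pass returns naive ones, so anything that sorts the results would need to normalize them first.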
Migrated from rt.cpan.org#57730 (status was 'new')