davorg-cpan / xml-feed

The CPAN module XML::Feed

Re: XML::Feed Date Parsing [rt.cpan.org #57730] #42

Closed by atoomic 5 years ago

atoomic commented 5 years ago

Migrated from rt.cpan.org#57730 (status was 'new')

From david@kineticode.com on 2010-05-21 19:07:14 :

Here's a patch. I extracted some shitty dates from some feeds I'm parsing, plus threw in a bunch of others. The test ensures that they all work in pubDate, dc:date, dcterms:date, dcterms:modified, and atom:updated. It adds dependencies on DateTime::Format::ISO8601, DateTime::Format::Flexible, and DateTime::Format::Natural.

I didn't add a parameter, as there don't seem to be any real attributes to use. Maybe I've missed something?

I've Cc'd RT so that it doesn't get lost in the shuffle.

What do you think?

Best,

David

On May 20, 2010, at 4:05 PM, David E. Wheeler wrote:

> Hi Simon,
> 
> I'm using XML::Feed for a project. It's so nice not to have to worry about all the variations in feeds. Many thanks to you and SixApart for the great module.
> 
> One place where I do have to worry, though, is with dates. There are a lot of feeds out there with invalid date formats. Take http://bestwebgallery.com/feed/ for example. It has this:
> 
>       <pubDate>May 17, 2010</pubDate>
> 
> Irritating. I fully expect to find a lot more shitty dates. Alas, with a date like this, issued() returns undef. I'd really like to make a best effort to get at dates in all formats, as I could really use it for proper(ish) sorting.
> 
> I noticed this test in t/01-parse.t:
> 
>    $feed = XML::Feed->parse('t/samples/rss10-invalid-date.xml')
>        or die XML::Feed->errstr;
>    $entry = ($feed->entries)[0];
>    ok(!$entry->issued);   ## Should return undef, but not die.
>    ok(!$entry->modified); ## Same.
> 
> So I guess that you want to be strict by default. What I'm thinking of is adding an attribute to XML::Feed to be looser when parsing dates. If it's set to true (false by default), then it would also try DateTime::Format::Natural or perhaps DateTime::Format::Flexible. Would you be interested in such a patch?
> 
> If so, looking at Format::RSS, I see that it first tries {dc}{date} and then {PubDate}. Should I continue with that approach? Or maybe try both strict first, and then try them both again more loosely?
> 
> Thanks,
> 
> David
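
For readers following along, the "strict first, then loose" idea proposed above can be sketched roughly as follows. This is a minimal illustration, not the merged patch: `parse_feed_date` and its `loose` option are hypothetical names, and it uses only the core Time::Piece module with a handful of hard-coded formats as stand-ins, whereas the patch described above relies on DateTime::Format::ISO8601, DateTime::Format::Flexible, and DateTime::Format::Natural.

```perl
use strict;
use warnings;
use Time::Piece;

# Formats tried by default. These are simplified stand-ins for the
# RFC 822 and W3CDTF styles (assumption: fixed GMT/Z zone only).
my @STRICT = (
    '%a, %d %b %Y %H:%M:%S GMT',   # e.g. Mon, 17 May 2010 19:07:14 GMT
    '%Y-%m-%dT%H:%M:%SZ',          # e.g. 2010-05-17T19:07:14Z
);

# Looser formats, consulted only when the caller opts in.
my @LOOSE = (
    '%B %d, %Y',                   # e.g. May 17, 2010
    '%d %B %Y',                    # e.g. 17 May 2010
);

# Hypothetical helper, not part of XML::Feed.
# Returns a Time::Piece object, or undef (without dying) when nothing matches.
sub parse_feed_date {
    my ($string, %opt) = @_;
    return unless defined $string;

    my @formats = (@STRICT, $opt{loose} ? @LOOSE : ());
    for my $format (@formats) {
        # strptime croaks on a mismatch, so trap the error and move on.
        my $t = eval { Time::Piece->strptime($string, $format) };
        return $t if defined $t;
    }
    return;
}
```

With this sketch, `parse_feed_date('May 17, 2010')` returns undef, matching the strict default behaviour the existing test expects, while `parse_feed_date('May 17, 2010', loose => 1)` returns an object that can be used for sorting.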

atoomic commented 5 years ago

It seems a fix was submitted and merged via #40. We should be able to close this case after confirmation. cc: @davorg

atoomic commented 5 years ago

This was merged via 46ff242de3a7d9c360fcb80d1842f593fabecb80

theory commented 5 years ago

It me.