alexdebril / rss-atom-bundle

RSS and Atom Bundle for Symfony
MIT License

$item->getUpdated() returns current DateTime if not present in RSS #68

Closed AlexanderMatveev closed 7 years ago

AlexanderMatveev commented 9 years ago

$item->getUpdated() returns the current DateTime if no date is present in the RSS item. For example, see http://feeds.feedburner.com/esquire-ru?format=xml

AlexanderMatveev commented 9 years ago

Any response please?

alexdebril commented 9 years ago

Well, I guess you cannot fix the feed yourself, so we'd need to patch rss-atom-bundle to return a relevant date. Which we can't, because we have only a few clues about when the items were actually published.

It's possible to use the feed's Last-Modified HTTP header value instead, so if you think that's a good solution, could you please submit a pull request along those lines?

Regards,

Alex

AlexanderMatveev commented 9 years ago

@alexdebril Thanks for the response. But I can't fix the feeds I'm parsing, because they are third-party feeds, and many feedburner.com feeds don't have this tag at all. So after updating feeds using rss-atom-bundle, old items are received as new items. Are there any plans to solve this issue?

alexdebril commented 9 years ago

Then the only solution is to rely on the public id to filter out items you already have in your database. The Filter system is built for that; take a look at this interface: https://github.com/alexdebril/rss-atom-bundle/blob/master/Protocol/FilterInterface.php . Once you have created your Filter class (PublicIdFilter, for instance), you pass it as a parameter to getFilteredContent: https://github.com/alexdebril/rss-atom-bundle/blob/master/Protocol/FeedReader.php#L135

I hope this helps!
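A minimal sketch of what such a filter could look like, under stated assumptions. The interface below (ItemFilter) is a simplified, hypothetical stand-in so the example runs on its own; the bundle's real FilterInterface and item classes may use different method names and signatures, so check the linked file before adapting it. Items are represented as plain arrays with a hypothetical 'publicId' key.

```php
<?php
// Hypothetical stand-in for the bundle's FilterInterface; the real
// interface may differ (see the linked file).
interface ItemFilter
{
    public function isValid(array $item): bool;
}

// Accepts only items whose public id has not been stored yet.
final class PublicIdFilter implements ItemFilter
{
    /** @var array<string, true> lookup table of known public ids */
    private array $known;

    /** @param string[] $knownIds public ids already in the database */
    public function __construct(array $knownIds)
    {
        $this->known = array_fill_keys($knownIds, true);
    }

    public function isValid(array $item): bool
    {
        $id = $item['publicId'] ?? null;
        // Items without a public id cannot be deduplicated this way,
        // so this sketch lets them through.
        return $id === null || !isset($this->known[$id]);
    }
}

$filter = new PublicIdFilter(['guid-1']);
$items = [
    ['publicId' => 'guid-1', 'title' => 'old'],
    ['publicId' => 'guid-2', 'title' => 'new'],
];
// Keep only the items the filter accepts.
$fresh = array_values(array_filter($items, [$filter, 'isValid']));
```

In a real application the list of known ids would come from a database query rather than a hard-coded array.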

Alex

AlexanderMatveev commented 9 years ago

@alexdebril It's funny, but many feeds don't have a Public ID field either =D Is it possible for $item->getUpdated() to return null instead of the current DateTime if the item has no pubDate tag? What is the logic behind returning the current DateTime?
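To illustrate the behaviour being asked for, here is a minimal sketch that is not the bundle's code. It reads an RSS item with SimpleXML and returns null when pubDate is missing or unparsable. The function name itemDateOrNull and the sample feed are hypothetical.

```php
<?php
// Returns the item's pubDate, or null when the tag is missing or
// unparsable, instead of defaulting to the current time.
function itemDateOrNull(SimpleXMLElement $item): ?DateTimeImmutable
{
    $raw = trim((string) $item->pubDate);
    if ($raw === '') {
        return null;
    }
    // DATE_RSS is the RFC 822 style format used by RSS 2.0.
    $date = DateTimeImmutable::createFromFormat(DATE_RSS, $raw);
    return $date === false ? null : $date;
}

$xml = <<<XML
<rss version="2.0"><channel>
  <item><title>dated</title><pubDate>Thu, 11 Jun 2015 21:00:11 +0000</pubDate></item>
  <item><title>undated</title></item>
</channel></rss>
XML;

$feed = new SimpleXMLElement($xml);
$dates = [];
foreach ($feed->channel->item as $item) {
    $dates[(string) $item->title] = itemDateOrNull($item);
}
```

With a null result the caller can decide what to do with undated items, for example skip them or fall back to another value.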

AlexanderMatveev commented 9 years ago

@alexdebril The bundle doesn't parse a valid feed (see https://validator.w3.org/feed/check.cgi?url=www.sovsport.ru%2Fnews_rss). I've moved to https://github.com/eko/FeedBundle.

alexdebril commented 9 years ago

curl -I http://www.sovsport.ru/news_rss
HTTP/1.1 200
Server: nginx/1.4.7
Date: Thu, 11 Jun 2015 21:12:56 GMT
Content-Type: application/xml; charset=windows-1251
Connection: keep-alive

Usually the HTTP status line ends with a message such as OK:

curl -I http://php.net/feed.atom
HTTP/1.1 200 OK
Server: nginx/1.6.2
Date: Thu, 11 Jun 2015 21:15:01 GMT
Content-Type: application/atom+xml
Content-Length: 299632
Connection: keep-alive
Last-Modified: Thu, 11 Jun 2015 21:00:11 GMT
ETag: "49270-5184446f72cc0"
Accept-Ranges: bytes

(which is successfully parsed btw)

The cause is here: https://github.com/alexdebril/rss-atom-bundle/blob/master/Driver/HttpCurlDriver.php#L63

The regexp expects the status message to be present, which it is in most cases (I had never seen HTTP/1.1 200 without OK before).

Try with https://github.com/alexdebril/rss-atom-bundle/tree/issue-68: it works. The only difference is https://github.com/alexdebril/rss-atom-bundle/commit/9033ac5d728a0322b57c2d78bc848b7b800014ec
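As an illustration of the idea, here is a sketch of a status-line parser that treats the trailing message as optional. It is not the pattern used in HttpCurlDriver.php, nor the change in the commit on the issue-68 branch; the function name parseStatusLine and the regular expression are assumptions made for this example.

```php
<?php
// Parses an HTTP status line such as "HTTP/1.1 200 OK".
// The trailing message ("OK") is optional, so "HTTP/1.1 200" also matches.
function parseStatusLine(string $line): ?array
{
    $pattern = '#^HTTP/(\d+(?:\.\d+)?)\s+(\d{3})(?:\s+(.*))?$#';
    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null; // not a status line
    }
    return [
        'version' => $m[1],
        'code'    => (int) $m[2],
        // preg_match omits trailing unmatched groups, hence the fallback.
        'reason'  => $m[3] ?? '',
    ];
}

$withoutMessage = parseStatusLine('HTTP/1.1 200');
$withMessage    = parseStatusLine('HTTP/1.1 200 OK');
```

Both calls yield code 200; only the 'reason' entry differs (an empty string versus "OK").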

I'll fix that bug in the next release.

AlexanderMatveev commented 9 years ago

Thank you @alexdebril

alexdebril commented 7 years ago

Guzzle solved this, so I'm closing the issue.