danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

Handles broken RSS feeds that may not include an <rss> declaration. #179

Open kopertop opened 7 years ago

kopertop commented 7 years ago

An example feed is: http://stocknewsnow.com/feed/. The feed is a semi-valid RSS feed; it's just missing the top-level <rss> declaration.

danmactough commented 7 years ago

Wow. I'm glad you were able to figure out what was broken about that feed. But I'm not sure it's reasonable to expect a parser library to handle a feed that is so fundamentally broken. No complaints at all about how you implemented this. And thank you for the PR. I'm going to close this for now, but I'm open to discussion.

kopertop commented 7 years ago

What about an option to allow a user to define the feed type if none are found? Or an option to allow "cleaning up" a feed before it's read, which would detect this sort of error?

danmactough commented 7 years ago

What about an option to allow a user to define the feed type if none are found?

That sounds doable.

Or an option to allow "cleaning up" a feed before it's read, which would detect this sort of error?

This is probably already doable by hooking into the SAX events emitted, but I've been thinking about adding more hooks for precisely this kind of thing. Something like:

var feedparser = require("feedparser")();

stream.pipe(feedparser).pipe(outputHandler);
// Hypothetical hooks: "start-node" and "end-node" are a proposal, not an existing API.
feedparser.on("start-node", function (content) {
  if (content.match(/^<channel/)) {
    // Remember that we opened an <rss> wrapper so it can be closed later.
    this._isRequiresRssClosing = true;
    return "<rss>" + content;
  }
  else {
    return content;
  }
});
feedparser.on("end-node", function (content) {
  if (this._isRequiresRssClosing) {
    return content + " </rss>";
  }
  else {
    return content;
  }
});

That's rather painfully explicit (and I probably wouldn't send plain text but rather a parsed node), but you get the picture.
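
Until hooks like that exist, one way to get the same effect might be to repair the feed before it reaches the parser. The following is only a rough sketch under stated assumptions: `RssRootFixer` is a hypothetical name, not part of feedparser; it assumes the feed uses an ASCII-compatible encoding such as UTF-8, and that the first thing that looks like an element tag really is the root (a comment containing markup before the root would fool it). It relies only on Node's built-in `stream.Transform`.

```javascript
const { Transform } = require('node:stream');

// Hypothetical helper (not part of feedparser): if the first element in the
// stream is <channel>, wrap the whole document in an <rss> root element.
class RssRootFixer extends Transform {
  constructor() {
    super();
    this.head = Buffer.alloc(0); // bytes buffered until the first element is identified
    this.decided = false;        // true once the first element name is known
    this.wrapped = false;        // true if an opening <rss> tag was injected
  }

  _transform(chunk, encoding, callback) {
    // After the decision, everything passes through untouched.
    if (this.decided) return callback(null, chunk);

    this.head = Buffer.concat([this.head, chunk]);
    // latin1 maps bytes 1:1 to characters, so match.index is also a byte offset.
    const match = /<([A-Za-z_][\w:.-]*)[\s>\/]/.exec(this.head.toString('latin1'));
    if (!match) return callback(); // first element not complete yet, keep buffering

    this.decided = true;
    let out = this.head;
    this.head = null;
    if (match[1] === 'channel') {
      this.wrapped = true;
      out = Buffer.concat([
        out.subarray(0, match.index),       // XML declaration, whitespace, etc.
        Buffer.from('<rss version="2.0">'), // the missing root element
        out.subarray(match.index),
      ]);
    }
    callback(null, out);
  }

  _flush(callback) {
    // Input ended before any element was seen: emit what was buffered.
    if (!this.decided && this.head.length) this.push(this.head);
    // Close the wrapper only if one was opened.
    if (this.wrapped) this.push(Buffer.from('</rss>'));
    callback();
  }
}
```

Used as `stream.pipe(new RssRootFixer()).pipe(feedparser)`, a well-formed feed should pass through byte for byte, and only a feed that starts at `<channel>` gets wrapped.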

kopertop commented 7 years ago

I'd prefer to see an option to provide "defaults" such as the default parser type, but anything that works would be fine. For now I'm just using my fork of feedparser, since my goal in using this library was really to support any feed, no matter how terribly broken it may be.

I deal with almost 20k feeds, most of which are pretty fundamentally broken.
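
To make the "defaults" idea concrete, here is a minimal sketch of how a fallback feed type could behave. Everything in it is an assumption for illustration: `detectFeedType`, `ROOT_TO_TYPE`, and the `defaultType` parameter are hypothetical names, not existing feedparser options.

```javascript
// Hypothetical helper, not feedparser's API: maps a root element name to a
// feed type, falling back to a caller-supplied default when detection fails.
const ROOT_TO_TYPE = { 'rss': 'rss', 'feed': 'atom', 'rdf:rdf': 'rdf' };

function detectFeedType(rootName, defaultType = null) {
  const key = String(rootName || '').toLowerCase();
  // Use hasOwnProperty so names like "constructor" are not treated as known roots.
  return Object.prototype.hasOwnProperty.call(ROOT_TO_TYPE, key)
    ? ROOT_TO_TYPE[key]
    : defaultType;
}
```

With this shape, a feed whose root is `<channel>` would be treated as RSS when the caller passes `'rss'` as the default, and would still be reported as unknown (`null`) when no default is given.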