kopertop opened 7 years ago
Wow. I'm glad you were able to figure out what was broken about that feed. But I'm not sure it's reasonable to expect a parser library to handle a feed that is so fundamentally broken. No complaints at all about how you implemented this, and thank you for the PR. I'm going to close this for now, but I'm open to discussion.
What about an option to allow a user to define the feed type if none are found? Or an option to allow "cleaning up" a feed before it's read, which would detect this sort of error?
> What about an option to allow a user to define the feed type if none are found?
That sounds doable.
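One way such an option could behave, sketched as a standalone function. `detectFeedType` and its `defaultType` option are hypothetical names, not part of feedparser, and the sketch assumes the parser already knows the name of the document's root element.

```javascript
// Hypothetical helper, not feedparser's API: map the document's root element
// to a feed type, falling back to a caller-supplied default when none matches.
function detectFeedType(rootName, options) {
  const name = String(rootName || "").toLowerCase();
  if (name === "rss") return "rss";
  if (name === "rdf:rdf") return "rdf";
  if (name === "feed") return "atom";
  // Unrecognized root (e.g. a bare <channel>): use the default if one is set.
  return (options && options.defaultType) || null;
}
```

With a user-supplied default, `detectFeedType("channel", { defaultType: "rss" })` returns `"rss"`; without one it returns `null`, so the strict behavior stays unchanged unless the user opts in.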
> Or an option to allow "cleaning up" a feed before it's read, which would detect this sort of error?
This is probably already doable by hooking into the SAX events emitted, but I've been thinking about adding more hooks for precisely this kind of thing. Something like:
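```javascript
// Proposed API only: feedparser does not emit "start-node" or "end-node"
// events; these hooks would let a handler rewrite raw markup before parsing.
var feedparser = require("feedparser")();
// `stream` (the raw feed) and `outputHandler` (the consumer) are placeholders.
stream.pipe(feedparser).pipe(outputHandler);

feedparser.on("start-node", function (content) {
  // A document that opens with <channel> lacks its <rss> root: inject one.
  if (/^<channel/.test(content)) {
    this._isRequiresRssClosing = true;
    return "<rss>" + content;
  }
  return content;
});

feedparser.on("end-node", function (content) {
  // Close the injected <rss> root right after the closing </channel> only.
  if (this._isRequiresRssClosing && /^<\/channel/.test(content)) {
    return content + "</rss>";
  }
  return content;
});
```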
That's rather painfully explicit (and I probably wouldn't send plain text but rather a parsed node), but you get the picture.
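Until hooks like these exist, a similar repair can be sketched with the existing streaming interface by placing a Node `Transform` stream in front of feedparser. This is a minimal sketch, not a feedparser feature: `RssRootFixer` is a hypothetical name, it assumes UTF-8 input, and it only handles a document that reaches `<channel>` without first opening an `<rss>`, `<rdf:RDF>` or `<feed>` root.

```javascript
const { Transform } = require("stream");
const { StringDecoder } = require("string_decoder");

// Hypothetical pre-parse cleanup stream (not part of feedparser). If the
// document reaches <channel> before any <rss>, <rdf:RDF> or <feed> root, it
// injects an <rss version="2.0"> wrapper and closes it at end of input.
class RssRootFixer extends Transform {
  constructor(options) {
    super(options);
    this.decoder = new StringDecoder("utf8"); // safe across split UTF-8 bytes
    this.head = "";       // text buffered until we know whether to wrap
    this.decided = false; // true once wrap vs. pass-through is settled
    this.wrapped = false; // true if we injected the opening <rss>
  }

  _transform(chunk, encoding, callback) {
    const text = this.decoder.write(chunk);
    if (this.decided) {
      if (text) this.push(text);
      return callback();
    }
    this.head += text;
    const channelAt = this.head.search(/<channel[\s>]/i);
    const rootAt = this.head.search(/<(rss|rdf:rdf|feed)[\s>]/i);
    if (rootAt !== -1 && (channelAt === -1 || rootAt < channelAt)) {
      // A real root element comes first: pass everything through unchanged.
      this.decided = true;
      this.push(this.head);
    } else if (channelAt !== -1) {
      // <channel> with no root before it: wrap it in <rss>.
      this.decided = this.wrapped = true;
      this.push(this.head.slice(0, channelAt) + '<rss version="2.0">' +
        this.head.slice(channelAt));
    }
    callback();
  }

  _flush(callback) {
    let tail = this.decoder.end();
    if (!this.decided) tail = this.head + tail; // never saw either tag
    if (this.wrapped) tail += "</rss>";
    if (tail) this.push(tail);
    callback();
  }
}
```

It would sit in the pipeline as `stream.pipe(new RssRootFixer()).pipe(feedparser)`. Buffering stops at the first root or `<channel>` tag, so well-formed feeds pass through with little overhead; the tag check is a plain regex heuristic, so a `<channel` string inside an XML comment ahead of the real root would fool it.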
I'd prefer to see an option to provide "defaults" such as the default parser type, but anything that works would be fine. For now I'm just using my fork of feedparser, since my goal in using this library was to support any feed, no matter how badly broken it is.
I deal with almost 20k feeds, most of which are pretty fundamentally broken.
An example feed is http://stocknewsnow.com/feed/. It is a semi-valid RSS feed that is just missing the global declaration (the root <rss> element).