danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

re item-level link elements inferred from guids? #293

Open scripting opened 2 years ago

scripting commented 2 years ago

Dan, I'm debugging something in FeedLand that has led me to (what I think is) weird behavior in feedparser.

I'm wondering if you are patterning this after the Python feedparser package...

https://pythonhosted.org/feedparser/reference-entry-link.html

Imho they were wrong to do this, and I think I have some standing there. ;-)

It's creating a real problem for me in a legit application of RSS that's hard to work around.

BTW, the feed in question is this --

http://data.feedland.org/feeds/davewiner.xml

The items do not have link elements, deliberately.

Just want to confirm that I correctly understand what feedparser is doing.

Thanks in advance..

danmactough commented 2 years ago

@scripting Do you mean this inference? 👉🏻 https://github.com/danmactough/node-feedparser/blob/master/lib/feedparser/index.js#L1104-L1108

scripting commented 2 years ago

@danmactough -- maybe that's it -- i just went ahead and documented the bug and moved on.

Curious if your feedparser is a port of the Python feedparser.

One other thing -- there's fallout from the way FP adds the non-item channel-level elements to the items. As far as I know that's the only way to get data from the top of the feed. But what happens when a feed has no item elements? That happened a few weeks ago with hilarious results. :-)

My workaround was to watch for feeds with no item elements and then read the feed "manually" to get the channel-level elements.

Anyway as always thanks for a great package.

danmactough commented 2 years ago

> Curious if your feedparser is a port of the Python feedparser.

It is definitely not a port. I was aware of Python feedparser when I created this library, but I hadn't used it except incidentally while hacking on an old open source podcast client that used it. That is the context in which I learned the appeal of "normalizing" the different feed formats.

> One other thing -- there's fallout from the way FP adds the non-item channel-level elements to the items. As far as I know that's the only way to get data from the top of the feed. But what happens when a feed has no item elements?

You can disable that behavior with the addmeta option (https://github.com/danmactough/node-feedparser#options), for example new FeedParser({ addmeta: false }). (Channel-level data is called meta -- admittedly not a great name choice.) There are a couple of other ways to get that channel-level information, both of which should work just fine even if there are no items. Both are in the README, but you can do something like:

Listen for the meta event

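var meta, item;
// `done` is assumed to be a completion callback supplied by the surrounding code
someFeedAsAReadableStream.pipe(new FeedParser(options))
      .on('error', function (err) {
        done(err);
      })
      // *** HERE ***
      .on('meta', function (_meta) {
        meta = _meta; // channel-level data
      })
      .on('readable', function () {
        var _item;
        // drain everything currently buffered, keeping only the first item
        while ((_item = this.read())) {
          item || (item = _item);
        }
      })
      .on('end', function () {
        /***** meta WILL BE DEFINED AT THE END ****/
        done();
      });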

OR

Use this.meta

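var item;
// `done` is assumed to be a completion callback supplied by the surrounding code
someFeedAsAReadableStream.pipe(new FeedParser(options))
      .on('error', function (err) {
        done(err);
      })
      .on('readable', function () {
        var _item;
        // drain everything currently buffered, keeping only the first item
        while ((_item = this.read())) {
          item || (item = _item);
        }
      })
      .on('end', function () {
        var meta = this.meta; // channel-level data is also exposed on the parser
        done();
      });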
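
For a feed with no item elements, the two approaches above can be wrapped into one helper. The following is only a sketch: collectFeed is a hypothetical name, not part of feedparser, and it assumes nothing beyond the 'meta', 'readable', 'error', and 'end' events already used in the snippets above.

```javascript
// Hypothetical helper (not part of feedparser): gathers channel-level
// metadata and all items from a feedparser-style object stream.
function collectFeed(parser) {
  return new Promise(function (resolve, reject) {
    var meta = null;
    var items = [];

    parser.on('error', reject);

    // Channel-level data arrives separately from the items.
    parser.on('meta', function (_meta) {
      meta = _meta;
    });

    // Drain every item that is currently buffered.
    parser.on('readable', function () {
      var item;
      while ((item = this.read()) !== null) {
        items.push(item);
      }
    });

    // For a feed with no item elements, `items` is simply empty here.
    parser.on('end', function () {
      resolve({ meta: meta, items: items });
    });
  });
}

// Usage with feedparser (assumes the package is installed):
//   var FeedParser = require('feedparser');
//   collectFeed(someFeedAsAReadableStream.pipe(new FeedParser({ addmeta: false })))
//     .then(function (result) { console.log(result.meta, result.items.length); });
```

Because the metadata is kept apart from the items, a caller should not need to fall back to reading the feed "manually" when there are no items.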

scripting commented 2 years ago

So if it isn't a port, where did you get the idea that you could add a link element to an item that doesn't have a link element?

The item-level link element is optional, I think the spec is pretty clear about that.

I see that kind of (honestly) bs all over the place: developers, in the moment, without consulting anyone or any docs or prior art, decide to change the format, so it keeps degrading and supporting it gets harder.

All the work you had to do to write FP is the result of developers thinking they could rewrite the rules, instead of sticking to the format as defined, to maximize interop.

I don't know if this happens in HTTP or TCP or other protocols, or if for some reason people just have no respect for RSS.

It doesn't matter now, what's done is done. I'm gathering all the BS together in one package, reallySimple, and fixing the problems or documenting them. I asked you to be part of that process but you were too busy I guess. I'm going to close the issue now.

scripting commented 2 years ago

Dan, I just noticed that the workaround I put in FeedLand broke my linkblog feed.

I think the best approach is if we come up with a way to turn this feature off.

Is that something you’re open to?

danmactough commented 2 years ago

> Dan, I just noticed that the workaround I put in FeedLand broke my linkblog feed.
>
> I think the best approach is if we come up with a way to turn this feature off.
>
> Is that something you’re open to?

Sure. I think adding an option that defaults to the current behavior would be fine. I'm also open to releasing a major version with the new behavior as the default.
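
To make the idea concrete, here is a rough sketch of what an opt-out could look like. It is not feedparser's actual code: the function applyLinkInference, the option name inferLinkFromGuid, and the simplified item shape (guid, isPermaLink, link) are all hypothetical, chosen only to illustrate an option that defaults to the current behavior.

```javascript
// Hypothetical sketch, not feedparser's implementation.
// `inferLinkFromGuid` is an invented option name; it defaults to true so
// that existing users keep the current behavior unless they opt out.
function applyLinkInference(item, options) {
  var opts = options || {};
  var inferLink = opts.inferLinkFromGuid !== false;

  // Never overwrite a link the feed actually supplied, and do nothing
  // when the caller has turned the inference off.
  if (item.link || !inferLink) {
    return item;
  }

  // Only a guid that looks like a URL and is not flagged
  // isPermaLink="false" is treated as a candidate link.
  var guidIsUrl = typeof item.guid === 'string' && /^https?:\/\//i.test(item.guid);
  if (guidIsUrl && item.isPermaLink !== false) {
    return Object.assign({}, item, { link: item.guid });
  }
  return item;
}
```

With the option left at its default, an item that has only a permalink guid would gain a link; with inferLinkFromGuid set to false, the item would come back without a link, matching what the feed actually contains.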

scripting commented 2 years ago

First, thank you.

Make the default the current behavior, always. I'm happy just to be able to fix the bug in my software.

Your first priority is not breaking your users and this has already happened, you can't put the toothpaste back in the tube.

Thank you again, this is the right way to go.

scripting commented 2 years ago

Dan -- checking in. The sooner the option is added the better. Right now the linkblogs in FeedLand are broken. I really need to get them fixed.