FreshRSS / FreshRSS

A free, self-hostable news aggregator…
https://freshrss.org
GNU Affero General Public License v3.0
10.03k stars 855 forks source link

SimplePie avoid resolving anchors URL #4481

Open Alkarex opened 2 years ago

Alkarex commented 2 years ago

Discussed in https://github.com/FreshRSS/FreshRSS/discussions/4476

Originally posted by **vincode-io** July 29, 2022 I've noticed that NetNewsWire footnote detection code doesn't work when using a FreshRSS account. It looks like the normal anchor link in the footnote gets changed to a fully resolved link. Is this intended behavior or is something else going on?
Alkarex commented 2 years ago

More details at https://github.com/Ranchero-Software/NetNewsWire/issues/3564#issuecomment-1182073331

Original:

<content type="html" xml:base="https://timotijhof.net/posts/2022/internet-archive-crawling/">
  <p>..<sup ..><a href="#fn:sse" class="footnote" rel="footnote">1</a></sup> ..

FreshRSS proxy:

<description>
  <p>.<sup ..><a href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn:sse" rel="footnote">1</a></sup> ..

Originally posted by @Krinkle in https://github.com/FreshRSS/FreshRSS/discussions/4476#discussioncomment-3293149

edent commented 1 year ago

I'm also seeing this on WordPress / Jetpack generated footnotes.

The Atom feed has:

<sup><a href="#fn-43921-simplified" class="jetpack-footnote" title="Read footnote.">1</a></sup>

But FreshRSS links out, rather than use an internal link.

Alkarex commented 1 year ago

Related to https://github.com/FreshRSS/FreshRSS/issues/4562 and https://github.com/FreshRSS/FreshRSS/pull/4565

Alkarex commented 1 year ago

After https://github.com/FreshRSS/FreshRSS/pull/4565 , at least the reference links are correct, although linking to the original URL.

Using a local anchor is not trivial to get right, in particular due to the risk of name collisions between articles, and between an article and the anchors used by FreshRSS. For instance, the examples I am seeing use #fn1, #fn2, etc. for footnotes, and as soon as we get two or more articles of this type, the footnotes would fail.

A solution might be a rewriting of the anchors, for instance a SHA of the full URL.

https://github.com/FreshRSS/FreshRSS/blob/543fa4e76c1761154801febd08400520b75143e3/lib/SimplePie/SimplePie/Sanitize.php#L541-L549

Krinkle commented 1 year ago

@Alkarex Aha, so I hadn't realized this before but is it fair to assume then that this rewrite is intentional and mainly (or solely?) for the frontend's use case of being able to expand multiple posts on-screen at once?

If yes, perhaps it would be an option to switch this feature via a parameter that only the frontend's own API calls set, with e.g. apps like NetNewsWire that use FreshRSS as a server, receiving the original feeds unchanged. It might prevent bugs in the future if transformations and formatting changed are generally limited to frontend API calls. Having said that, I've been using FreshRSS (server-only) for a long time and may very well have gotten used to features of benefits of the current transformations that I don't realise are there thanks to FreshRSS.

This is probably orthogonal in so far that it is still a possible feature request for FreshRSS' frontend to support inline footnotes within the page. My suggestion is basically an added cost in abstraction that might make sense if we think that separating these concerns is valuable, e.g. allow other frontends to prioritise and build features even if they aren't yet available or feasible in the FreshRSS frontend. Again, I withdraw all this if there are benefits to the transformations that other frontends would notice and/or wouldn't already do themselves as needing to support any RSS.

Frenzie commented 1 year ago

mainly (or solely?) for the frontend's use case of being able to expand multiple posts on-screen at once?

Not quite the right way to phrase it, but yes, I believe so. But note that as @Alkarex wrote, https://bleep#bloop is also very suboptimal there. It should be #bluup-bloop instead (where bluup is some kind of unique identifier, possibly based on bleep).