FreshRSS / FreshRSS

A free, self-hostable news aggregator…
https://freshrss.org
GNU Affero General Public License v3.0

[Bug] [edge] RSS <title> elements with Unicode not translated to human-readable. #6979

Open gnatbandanna opened 18 hours ago

gnatbandanna commented 18 hours ago

Describe the bug

I believe this to be an issue with the edge branch (1.24.x worked fine) and Lemmy RSS feeds (and probably others). Lemmy feed titles come across as:

<title>Polling the group: what do y&amp;#x27;all know about the Orion browser from Kagi?</title>

This is then displayed as is, without being translated into something more readable, such as an apostrophe (').

I suppose that's fine, but this wasn't an issue with the 1.24.x branch. The titles were properly parsed and displayed.

To Reproduce

Presumably, add a feed like the one above and then look at the title with any client/browser to see the Unicode gibberish.

Expected behavior

I'd expect something like [...] what do y'all know [...]

FreshRSS version

1.25.0-dev

Environment information

Additional context

No response

Alkarex commented 17 hours ago

Same as https://github.com/FreshRSS/FreshRSS/issues/754 , which was fixed by https://github.com/FreshRSS/FreshRSS/pull/813 and sent upstream as https://github.com/simplepie/simplepie/pull/400 . That change was then reverted upstream in https://github.com/simplepie/simplepie/pull/433 , which caused us to lose the fix during the big SimplePie refactoring https://github.com/FreshRSS/FreshRSS/pull/4374

However, the fix was probably not correct for all cases, so it should be re-evaluated. https://validator.w3.org/feed/docs/warning/ContainsHTML.html https://www.rssboard.org/rss-profile#data-types-characterdata

In any case, the feed is not following the recommendations https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Flemmy.ml%2Ffeeds%2Fc%2Fprivacy.xml%3Fsort%3DActive

Alkarex commented 15 hours ago

Looking at this specific feed and bug, the original source is:

<title>Polling the group: what do y&amp;#x27;all know about the Orion browser from Kagi?</title>

(This means a double XML encoding)
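
To illustrate the double encoding, here is a minimal Python sketch. It uses the standard library rather than SimplePie, so it only approximates what FreshRSS does internally. A conforming XML parser removes exactly one layer of escaping, which leaves the character reference as literal text:

```python
import html
import xml.etree.ElementTree as ET

# The <title> element as served by the feed (copied from the report above).
raw = (
    "<title>Polling the group: what do y&amp;#x27;all know about "
    "the Orion browser from Kagi?</title>"
)

# An XML parser decodes exactly one layer: "&amp;" becomes "&",
# so the text now contains the literal characters "&#x27;".
parsed = ET.fromstring(raw).text
print(parsed)

# Only a second, non-standard decoding pass turns "&#x27;" into an apostrophe.
decoded_twice = html.unescape(parsed)
print(decoded_twice)
```

The first `print` shows the title as a reader currently sees it. The second shows what the feed author presumably intended.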

Frenzie commented 15 hours ago

That definitely shouldn't be supported by default. ^_^

Alkarex commented 15 hours ago

Indeed, I have been looking at what could / should be done, and I do not believe this invalid feed can be supported without breaking legitimate use cases in other valid feeds.

Valid representations should be:

<title>Polling the group: what do y'all know about the Orion browser from Kagi?</title>

<title>Polling the group: what do y&#x27;all know about the Orion browser from Kagi?</title>
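
As a rough sketch of why a blanket second decoding pass is risky, assume titles are treated as plain character data. The helper `title_text`, the shortened titles, and the "tutorial" title below are hypothetical and chosen only for illustration. Both valid forms above parse to the same text, whereas a valid title that deliberately shows an entity would be altered by decoding twice:

```python
import html
import xml.etree.ElementTree as ET


def title_text(xml_fragment: str) -> str:
    """Return the character data of a <title> element after normal XML parsing."""
    return ET.fromstring(xml_fragment).text or ""


# The two valid representations (shortened here) yield identical text.
literal = title_text("<title>what do y'all know</title>")
char_ref = title_text("<title>what do y&#x27;all know</title>")
print(literal == char_ref)

# Hypothetical valid title whose text is meant to display an entity literally.
tutorial = title_text("<title>Write &amp;#x27; to escape an apostrophe</title>")
print(tutorial)

# A second decoding pass would silently change what this title says.
over_decoded = html.unescape(tutorial)
print(over_decoded)
```

After normal parsing, the tutorial title reads as its author wrote it. The extra pass replaces the displayed entity with a bare apostrophe, which is the kind of regression a workaround for the Lemmy feed could introduce.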

Alkarex commented 15 hours ago

The bug should be reported to https://github.com/LemmyNet/lemmy/issues

gnatbandanna commented 14 hours ago

> The bug should be reported to https://github.com/LemmyNet/lemmy/issues

Will do, thanks!!