Ranchero-Software / NetNewsWire

RSS reader for macOS and iOS.
https://netnewswire.com/
MIT License

<title type="text">&lt;hello-world&gt;</title> #722

Open da2x opened 5 years ago

da2x commented 5 years ago

Example Atom feed (excerpt):

<entry>
  <title type="text">&lt;hello-world&gt;</title>
</entry>

Expected: the title is displayed as the literal text <hello-world>.

Actual: list views show the expected result, but previews do not.

I wrote up more details about this issue here: https://www.ctrl.blog/entry/xml-title-xml-entity-html.html

https://tools.ietf.org/html/rfc4287#section-3.1.1


https://web.archive.org/web/20190605024702/https://feed.ctrl.blog/latest.atom

vincode-io commented 5 years ago

We had to back this out as it had unintended side effects. See issue #288 and #743.

vincode-io commented 5 years ago

@brentsimmons To fix this we are going to have to start capturing and storing the type associated with the title (text, html, or xhtml). That way we can know if we should be HTML escaping the title when it is embedded in HTML or not.

I checked the Feedbin API and it doesn't supply this attribute, so we can only fix it for the local account.

This will touch every part of the system from the parser to the database to the article renderer. I just wanted you to be aware of the scope of the work before I did it.
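A minimal sketch of the "capture the type" idea, written in Python only for illustration (NetNewsWire itself is not written in Python). The `FeedTitle` class and its `as_html` method are hypothetical names, not anything from the NetNewsWire codebase. The assumption is that the stored value is the title as it comes out of the XML parser, and that only plain-text titles need escaping before being embedded in an HTML article view.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class FeedTitle:
    """A title plus the Atom type it arrived with (hypothetical model)."""

    value: str          # title content after XML decoding
    type: str = "text"  # "text", "html", or "xhtml"

    def as_html(self) -> str:
        """Return markup suitable for embedding in an HTML article view."""
        if self.type == "text":
            # Plain text: escape so "<hello-world>" is shown literally.
            return escape(self.value, quote=False)
        # "html" and "xhtml" values are already markup; pass them through.
        # A real renderer would likely sanitize this markup first.
        return self.value


plain = FeedTitle("<hello-world>", "text")
marked_up = FeedTitle("<code>systemd</code> firewalls", "html")

print(plain.as_html())      # escaped for HTML embedding
print(marked_up.as_html())  # passed through unchanged
```

The point of the design is that the escaping decision is made once, at render time, from data the parser already had.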

brentsimmons commented 5 years ago

I want to think about this some more. We can ship 5.0 without dealing with it. (It’s so super-rare.)

da2x commented 5 years ago

> To fix this we are going to have to start capturing and storing the type associated with the title (text, html, or xhtml).

Or you can store it in a normalized format. Read it in the supplied format, normalize it, and store that.

> It’s so super-rare.

Text is the default text mode for the Atom format. It can’t be all that rare as most will stick with the defaults.

vincode-io commented 5 years ago

I suppose we could start treating everything in the title field as HTML and HTML-escape any Atom title that comes in with the text type. We could then render the Timeline label as an attributed string, using NSAttributedString to convert the HTML.

It isn't a simple fix. It still impacts the parser and the Timeline layout and rendering code. Because we would be switching from NSTextField to NSTextView for rendering, how we calculate the number of lines in the timeline would change.

We still can only fix it for local accounts unless the syncing service supplies us the title type.

Still, that is a better solution than passing the title type through the system.

> Text is the default text mode for the Atom format. It can’t be all that rare as most will stick with the defaults.

I think he was referring to how often people use HTML tags as content in titles.

paulrobertlloyd commented 4 years ago

I'm unsure whether this is precisely related to this issue, but I just came across a feed item whose title includes HTML (It All Starts with a Humble <textarea>), and NetNewsWire (Version 5.0.3 (2618)) renders it like this:

Screenshot 2019-12-08 at 14 26 17

Feed URL: https://feeds.feedburner.com/24ways

brentsimmons commented 4 years ago

This is a result of an ambiguity in the RSS spec. Are titles to be treated as plain text or as HTML? It can lead to weird stuff like this.

If we treat it as plain text, we have one class of bugs; if we treat it as HTML, we have another. Sigh.

da2x commented 4 years ago

> This is a result of an ambiguity in the RSS spec.

There is no ambiguity in the Atom spec, though. It defaults to type=text and can optionally be set to type=html/xhtml.

Update: there’s actually no ambiguity in the RSS spec either.

Wevah commented 4 years ago

Looks like this is mostly fixed by #2016; attribute-less inline styling tags (i, strong, etc.) are still rendered, and we still don't actually treat type="html" any differently. If this still causes display issues, it'd certainly be worth looking into handling type="html".

da2x commented 3 years ago

Here’s another example from NetNewsWire 6.0:

Feed XML:

<title type="html">&lt;code translate=no&gt;systemd&lt;/code&gt; application firewalls by example</title>

Text rendered as:

<code translate=no>systemd application firewalls by example

Expected (with substring “systemd” optionally displayed with a fixed-width font where applicable):

systemd application firewalls by example
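As a rough illustration of the expected result above, here is a Python sketch that decodes the `type="html"` title with an XML parser and then reduces the resulting markup to plain text for a list view. The helper names `_TextCollector` and `html_title_to_text` are made up for this example, and this is not how NetNewsWire does it; it only shows that the tag is well-formed HTML once the XML layer has been decoded.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and ignores all tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_title_to_text(markup: str) -> str:
    """Return the text content of an HTML title fragment."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


title_xml = (
    '<title type="html">&lt;code translate=no&gt;systemd&lt;/code&gt; '
    "application firewalls by example</title>"
)

# Step 1: the XML parser undoes the XML-level escaping.
markup = ET.fromstring(title_xml).text

# Step 2: the decoded value is HTML; keep only its text for a plain label.
plain = html_title_to_text(markup)

print(markup)
print(plain)
```

A view that can render HTML could use `markup` directly and keep the fixed-width styling for "systemd".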

k-nut commented 6 months ago

I think I just observed the inverse of the original problem. In the atom feed https://rachelbythebay.com/w/atom.xml, there currently is a title element as follows:

<title>1 &lt;&lt; n vs. 1U &lt;&lt; n and a cell phone autofocus problem</title>

Note that this server uses pretty strict rate limiting, so I'm also attaching the file here for easier debugging.

NetNewsWire currently renders the text like this:

CleanShot 2024-02-26 at 15 27 54@2x

The author intended the text to be rendered as 1 << n vs. 1U << n and a cell phone autofocus problem.

Note though how in the timeline everything after the first < gets dropped and in the article view a > is added at the end which does not belong there.
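For reference, a small Python sketch of what a conforming XML parse of that title yields. The surrounding `<entry>` element and the Atom namespace declaration are added here as assumptions so the snippet parses on its own; the `<title>` line is the one quoted above. Because the element has no `type` attribute, RFC 4287 says it is treated as `text`, so the decoded string is the final title and should not be run through an HTML parser.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Assumed wrapper: only the <title> line comes from the feed quoted above.
entry_xml = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>1 &lt;&lt; n vs. 1U &lt;&lt; n and a cell phone autofocus problem</title>
</entry>"""

entry = ET.fromstring(entry_xml)
title = entry.find(ATOM_NS + "title")

# A missing type attribute means "text" in Atom.
title_type = title.get("type", "text")

# The XML parser has already turned &lt; back into "<".
title_text = title.text

print(title_type)
print(title_text)
```

Interpreting `title_text` as HTML afterwards would be consistent with the symptoms described here, since `<< n vs. 1U ...` looks like the start of a tag to an HTML parser.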

atom.xml

michaelnordmeyer commented 4 weeks ago

<entry><title type="xhtml"> is also buggy: the title isn't displayed at all in NNW 6.1.4 for macOS on Sonoma 14.5.

The feed https://idiomdrottning.org/blog currently has

<entry>
  <title type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <a href="https://idiomdrottning.org/best-world">Help! We’re stuck in the best of all possible worlds!</a>
    </div>
  </title>

It validates as Atom, and the same construct (without the link) also appears as an example in the Atom spec's section on XHTML text elements.

My expectation is that both the list view and the post view of NNW display the title as "Help! We’re stuck in the best of all possible worlds!", but no title is displayed at all.
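To make that expectation concrete, here is a Python sketch that pulls the display text out of an XHTML title. The function name `xhtml_title_to_text` is invented for the example, and the Atom namespace on `<title>` is added as an assumption so the fragment parses standalone. In Atom, an `xhtml` text construct has a single XHTML `div` child that is a wrapper rather than part of the content, so the sketch reads the text nested inside it.

```python
import xml.etree.ElementTree as ET

XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"

# The namespace on <title> is assumed; the rest matches the feed excerpt.
title_xml = """<title type="xhtml" xmlns="http://www.w3.org/2005/Atom">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <a href="https://idiomdrottning.org/best-world">Help! We’re stuck in the best of all possible worlds!</a>
  </div>
</title>"""


def xhtml_title_to_text(title: ET.Element) -> str:
    """Return the text inside the wrapper div, with whitespace collapsed."""
    div = title.find(XHTML_DIV)
    if div is None:
        return ""
    # itertext() walks nested elements such as <a>, so links keep their text.
    text = "".join(div.itertext())
    return " ".join(text.split())


title = ET.fromstring(title_xml)
plain = xhtml_title_to_text(title)
print(plain)
```

Reading only `title.text` would return just the whitespace before the `div`, which is one plausible way for a title like this to end up empty.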

If I remove the anchor tag from this one title, the title is still not displayed, but the previously missing author now appears, and not just for the entry I edited but for all entries.

@brentsimmons You removed this bug from milestones. Does this mean that NNW is in low-maintenance mode?

brentsimmons commented 4 weeks ago

NetNewsWire is in the mode of adopting Swift 6 strict structured concurrency, which overshadows everything else for now. Hopefully this will be finished over the next month or so.

michaelnordmeyer commented 2 weeks ago

I don't know how NNW handles the many problems in this issue internally, but IMHO the list view and the article's metadata display should behave the same.

I also don't know whether XML parsing and Atom parsing are separate steps in NNW, but the encoding of < and > in <title type="text">&lt;hello-world&gt;</title> is only there to avoid breaking the XML. Once the XML parser has parsed the title, its text content is <hello-world>, which by the Atom standard should be displayed as is. Other examples:

A missing type attribute sets the default text type, as was mentioned in this thread before.

I know that most feed-creating apps and feed readers get some things wrong; unfortunately, most of them get different things wrong in different ways. The most broken one currently is python-feedgen 3.0, which creates Atom feeds with <content type="cdata">.