ViennaRSS / vienna-rss

Vienna is a free and open-source RSS/Atom newsreader for macOS.
https://www.vienna-rss.com
Apache License 2.0

content:encoded preferred over description even when empty, leading to empty body #1788

Closed mhnestler closed 3 months ago

mhnestler commented 3 months ago

Describe the bug

When an RSS item has both a <description> and a <content:encoded> element, Vienna always uses content:encoded as the article body. However, in some feeds content:encoded is present but empty, while description holds the actual content (e.g. a summary). As a result, the item appears to have no body in Vienna.

In general, the existing behavior makes sense: the "RSS Best Practices" document recommends that a publisher who includes both elements put the full article in content:encoded. But RSS feeds are the wild west, and that document is not really a standard (nor does it dictate what reader applications should do in this case), so I suggest that Vienna handle this case better.

To Reproduce

Subscribe to this feed: https://feeds.a.dj.com/rss/RSSWorldNews.xml (Wall Street Journal World News) and look at the entries.

Attached is a cached version of this feed: folder55.xml.txt

Excerpt:

<item>
<title>S&amp;P 500 Futures, Bond Yields Rise After Jobless Claims</title>
<link>https://www.wsj.com/articles/nikkei-may-fall-as-volatile-trading-continues-0c56c834</link>
<description><![CDATA[Stock futures turned higher after weekly jobless claims data fell to a lower-than-expected 233,000.]]></description>
<content:encoded/>
<pubDate>Thu, 08 Aug 2024 09:01:00 -0400</pubDate>
<guid isPermaLink="false">WP-WSJ-0001963860</guid>
<category domain="AccessClassName">PAID</category>
<wsj:articletype>NewsPlus</wsj:articletype>
</item>
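
To make the failure mode concrete, here is a minimal Python sketch (not Vienna's Objective-C code) that parses a trimmed copy of the item above. It assumes the content prefix is bound to the usual RSS content module namespace, which the excerpt does not show, and it drops the wsj element because its namespace is not given. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed binds "content" to the standard RSS content module URI.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Trimmed copy of the item from the excerpt, with the namespace declared inline.
ITEM_XML = f"""
<item xmlns:content="{CONTENT_NS}">
  <title>S&amp;P 500 Futures, Bond Yields Rise After Jobless Claims</title>
  <description><![CDATA[Stock futures turned higher after weekly jobless claims data fell to a lower-than-expected 233,000.]]></description>
  <content:encoded/>
</item>
""".strip()

item = ET.fromstring(ITEM_XML)

# The description carries the summary text.
description = item.findtext("description")

# The content:encoded element exists, but it has no text at all.
encoded = item.find(f"{{{CONTENT_NS}}}encoded")
encoded_present = encoded is not None
encoded_text = encoded.text if encoded_present else None

print("description:", description)
print("content:encoded present:", encoded_present)
print("content:encoded text:", repr(encoded_text))
```

A reader that checks only whether content:encoded is present, and not whether it contains anything, would end up with an empty body for this item.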

Additional information:

Relevant source code: https://github.com/ViennaRSS/vienna-rss/blob/8dae3483fe1ae9f5d1a514d76351c3490d969f15/Vienna/Sources/Parsing/RSSFeed.m#L230-L236

Eitot commented 3 months ago

At the very least, the articleBody variable shouldn't be overwritten with an empty string when the content:encoded element is empty. I will submit a pull request.
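
The idea can be sketched in a few lines of Python. This is only an illustration of the selection rule, not the code in RSSFeed.m or the eventual pull request; the function name choose_article_body is hypothetical, and treating whitespace-only content as empty is an assumption that goes slightly beyond what is stated above.

```python
from typing import Optional


def choose_article_body(description: Optional[str],
                        content_encoded: Optional[str]) -> str:
    """Prefer content:encoded, but only when it actually contains text.

    Hypothetical helper: falls back to the description when content:encoded
    is missing, empty, or whitespace-only (the last case is an assumption).
    """
    if content_encoded is not None and content_encoded.strip():
        return content_encoded
    return description or ""


# An empty content:encoded no longer wipes out the summary.
print(choose_article_body("Stock futures turned higher.", ""))
# A populated content:encoded still wins over the description.
print(choose_article_body("Short summary.", "<p>Full article text.</p>"))
```

With this rule, feeds that follow the usual convention behave as before, and only the empty case changes.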