Ranchero-Software / NetNewsWire

RSS reader for macOS and iOS.
https://netnewswire.com/
MIT License

Unable to find feed on site generated by Hugo #4136

Open ned21 opened 1 year ago

ned21 commented 1 year ago

I see #3878 and #3534 reporting issues with relative URLs within feeds so I am wondering if there is a similar issue with locating feeds from HTML in Version 6.1.4 (6120)?

My site is generated using the Congo theme for Hugo and has this code in the header, but NNW is unable to detect the feed when given https://www.toobusyto.org.uk/ctt/ as the input.

<link rel="canonical" href="https://www.toobusyto.org.uk/ctt/">
<link rel="alternate" type="application/rss+xml" href="/ctt/index.xml" title="Too busy to...">

Error message: Feed not found. Can’t add a feed because no feed was found.

Having said that, I tried changing the second line to:

  <link rel="alternate" type="application/rss+xml" href="https://www.toobusyto.org.uk/ctt/index.xml" title="Too busy to..." />

and it still doesn't work, so my guess that it's a relative URL issue is likely wrong.
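For what it's worth, a quick Foundation sketch (not NetNewsWire's code, just `URL(string:relativeTo:)`) suggests the relative href resolves to the same absolute URL, which fits with the change making no difference:

```swift
import Foundation

// The page URL given to NetNewsWire and the href from the <link> element.
let pageURL = URL(string: "https://www.toobusyto.org.uk/ctt/")!
let relativeHref = "/ctt/index.xml"

// Resolve the href against the page URL, as an HTML client would.
let resolved = URL(string: relativeHref, relativeTo: pageURL)?.absoluteURL

print(resolved?.absoluteString ?? "could not resolve")
// prints "https://www.toobusyto.org.uk/ctt/index.xml"
```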

(Thank you so much for making and maintaining this fantastic app!)

stuartbreckenridge commented 1 year ago

@ned21 Can you (as a test) remove this line from the <head> element of your site?

<link rel="alternate" type="application/json" href="/ctt/index.json" title="Too busy to..." />

I can add https://www.toobusyto.org.uk/ctt/index.xml directly to NetNewsWire without issue. However, when adding https://www.toobusyto.org.uk/ctt/, NetNewsWire favours the index.json feed, which isn't valid. If you remove that link from your site, it should fall back to the XML feed.

(lldb) po bestFeedSpecifier.urlString
"https://www.toobusyto.org.uk/ctt/index.json"

ned21 commented 1 year ago

@stuartbreckenridge Thanks for getting back to me so quickly. I have removed the application/json line and confirm that feed detection now works correctly and I can add the feed.

When you say the JSON isn't valid, is this a spec violation in what Congo produces? I can turn off the generation of index.json, but there's a note saying the JSON is required for all theme components to work correctly, so I'd need to test whether I am using any of them. Is there a defined spec for JSON that Congo should be following? Or is this one of those ill-defined areas of web specification where clients need to be relaxed in what they accept and the only "fix" is to work around this in NNW?

stuartbreckenridge commented 1 year ago

To be clear, when I said the JSON isn't valid I meant in terms of the JSON feed spec, which is an alternative to RSS.

It appears that your Hugo theme is generating another JSON file that NNW wants to treat as a feed.

ned21 commented 1 year ago

Thanks, that's exactly the thing. I could raise a bug with Congo but I suspect that they may point to this note from August 2020 and claim that there is no other MIME type for their usage?

For version 1.1, we’re starting to move to the more specific MIME type application/feed+json. Clients that parse HTML to discover feeds should prefer that MIME type, while still falling back to accepting application/json too.

Could NNW's preferences be adjusted to prefer feed+* over plain JSON? That is, change the order of looking for a valid feed to be:

  1. application/feed+json
  2. application/feed+rss
  3. application/json

That would, I think, maintain backwards compatibility but ensure people using application/json for non-feed purposes do not encounter issues?
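A rough sketch of what such an ordering could look like, with everything here hypothetical (`LinkCandidate`, `preferredTypes`, `rank` and `orderedByPreference` are made-up names, not NNW code). RSS links are normally typed application/rss+xml, as in my site's header above, so the sketch uses that for the RSS slot and adds Atom as an assumption:

```swift
// Hypothetical model of a <link rel="alternate"> element; not a NetNewsWire type.
struct LinkCandidate {
    let type: String
    let href: String
}

// Assumed preference order: specific feed types first, plain JSON last.
let preferredTypes = [
    "application/feed+json",
    "application/rss+xml",
    "application/atom+xml",
    "application/json",
]

// Lower rank means more preferred; unknown types sort after all known ones.
func rank(_ candidate: LinkCandidate) -> Int {
    preferredTypes.firstIndex(of: candidate.type.lowercased()) ?? preferredTypes.count
}

// Sort on (rank, original position) so equal ranks keep document order.
func orderedByPreference(_ candidates: [LinkCandidate]) -> [LinkCandidate] {
    candidates.enumerated()
        .sorted { (rank($0.element), $0.offset) < (rank($1.element), $1.offset) }
        .map { $0.element }
}

let links = [
    LinkCandidate(type: "application/json", href: "/ctt/index.json"),
    LinkCandidate(type: "application/rss+xml", href: "/ctt/index.xml"),
]
let ordered = orderedByPreference(links)
print(ordered.map { $0.href })
// prints ["/ctt/index.xml", "/ctt/index.json"]
```

With this ordering a plain application/json link is still considered, just after the more specific feed types.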

ned21 commented 1 year ago

Looking at the code, I think the issue might be that while index.xml appears first, its URL doesn't contain "rss", so it doesn't get the +5 score boost. The second URL contains the string "json", so it gets +6 and leaps ahead of index.xml despite the ordering.
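To illustrate, here is a simplified sketch that models only the two bonuses mentioned above. The function name `score(forURLString:)` is hypothetical, and NNW's actual scoring may include other rules:

```swift
import Foundation

// Hypothetical stand-in for the URL-based scoring described above.
// Only the two bonuses mentioned in this thread are modelled.
func score(forURLString urlString: String) -> Int {
    let lowered = urlString.lowercased()
    var total = 0
    if lowered.contains("rss") { total += 5 }
    if lowered.contains("json") { total += 6 }
    return total
}

let candidates = [
    "https://www.toobusyto.org.uk/ctt/index.xml",
    "https://www.toobusyto.org.uk/ctt/index.json",
]

// Pick the highest-scoring candidate; with these two rules that is index.json.
let best = candidates.max { score(forURLString: $0) < score(forURLString: $1) }
print(best ?? "none")
// prints "https://www.toobusyto.org.uk/ctt/index.json"
```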

stuartbreckenridge commented 1 year ago

@ned21 We aren't looking at the MIME type, only the URL specified.

@brentsimmons The weak point in the feed finder process is that once we've extracted potential feeds and scored them, NNW only attempts to parse the feed with the best score. In the scenario above, both https://www.toobusyto.org.uk/ctt/index.xml and https://www.toobusyto.org.uk/ctt/index.json were identified as potential feeds. The index.json feed received the higher score, became the best feed, and that was the only one that NNW attempted to parse in createRSSFeed, before throwing an error.

Feeds should be scored, and NNW should iterate over them all and stop at the first one that can be parsed.

stuartbreckenridge commented 1 year ago

@brentsimmons Some quick-and-dirty, fleshed-out thinking (tested with https://jpanther.github.io/congo/, which has this issue):

func createRSSFeed(for account: Account, url: URL, editedName: String?, container: Container, completion: @escaping (Result<Feed, Error>) -> Void) {

    // We need to use a batch update here because we need to add the feed to the
    // container before the name has been downloaded. This will put it in the sidebar
    // with an Untitled name if we don't delay it being added to the sidebar.
    BatchUpdate.shared.start()
    refreshProgress.addToNumberOfTasksAndRemaining(1)
    FeedFinder.find(url: url) { result in
        switch result {
        case .success(let feedSpecifiers):
            // Candidate feeds, expected in best-score-first order.
            let scoredFeeds = FeedSpecifier.scoredFeeds(in: feedSpecifiers)
            self.logger.debug("Identified the following scored feeds: \(scoredFeeds.map { $0.urlString })")
            Task { @MainActor in
                for candidate in scoredFeeds {
                    self.logger.debug("Attempting to add feed: \(candidate.urlString)")
                    guard let feedURL = URL(string: candidate.urlString) else {
                        continue
                    }

                    // Stop early if the account is already subscribed to this candidate.
                    if account.hasFeed(withURL: candidate.urlString) {
                        self.logger.debug("Account has feed with \(candidate.urlString)")
                        self.refreshProgress.completeTask()
                        BatchUpdate.shared.end()
                        completion(.failure(AccountError.createErrorAlreadySubscribed))
                        return
                    }

                    // Move on to the next candidate if this one can't be downloaded and parsed.
                    guard let parsedFeed = try? await InitialFeedDownloader.download(feedURL) else {
                        self.logger.error("Unable to add feed: \(candidate.urlString)")
                        continue
                    }

                    // First candidate that parses wins: create the feed and stop iterating.
                    let feed = account.createFeed(with: nil, url: feedURL.absoluteString, feedID: feedURL.absoluteString, homePageURL: nil)
                    feed.editedName = editedName
                    container.addFeed(feed)

                    account.update(feed, with: parsedFeed, { _ in
                        BatchUpdate.shared.end()
                        completion(.success(feed))
                        self.logger.debug("Successfully added feed: \(feed.url)")
                    })
                    self.refreshProgress.completeTask()
                    return
                }

                // None of the candidates could be parsed.
                self.refreshProgress.completeTask()
                BatchUpdate.shared.end()
                completion(.failure(AccountError.createErrorNotFound))
            }
        case .failure:
            BatchUpdate.shared.end()
            self.refreshProgress.completeTask()
            completion(.failure(AccountError.createErrorNotFound))
        }
    }
}

Then we see output like this:

Identified the following scored feeds: ["https://jpanther.github.io/congo/index.json", "https://jpanther.github.io/congo/index.xml"]
Attempting to add feed: https://jpanther.github.io/congo/index.json
Unable to add feed: https://jpanther.github.io/congo/index.json
Attempting to add feed: https://jpanther.github.io/congo/index.xml
Successfully added feed: https://jpanther.github.io/congo/index.xml

Or with Daring Fireball (DF):

Identified the following scored feeds: ["http://daringfireball.net/feeds/json", "http://daringfireball.net/feeds/main", "http://daringfireball.net/feeds/", "http://daringfireball.net/feeds/sponsors/", "https://www.rsspod.net/john-gruber", "https://www.rsspod.net/jason-snell", "https://www.rsspod.net/stephen-hackett", "https://www.linkedin.com/feed/update/urn:li:activity:7095517841457631232/", "https://www.rsspod.net/john-siracusa", "https://www.rsspod.net/marco-arment", "https://www.rsspod.net/casey-liss"]
Attempting to add feed: http://daringfireball.net/feeds/json
Successfully added feed: http://daringfireball.net/feeds/json