Ranchero-Software / NetNewsWire

RSS reader for macOS and iOS.
https://netnewswire.com/
MIT License
8.45k stars 535 forks source link

All images broken for https://blog.wadetregaskis.com #4156

Closed wadetregaskis closed 1 year ago

wadetregaskis commented 1 year ago

From looking at the RSS feed source (via curl) everything looks fine, and I can fetch the referenced images just fine. The feed also works fine - images are shown correctly and reliably - in other RSS readers. Only NetNewsWire (that I've tested) fails to load all the images, showing the broken image symbol.

URL: https://blog.wadetregaskis.com/feed

wadetregaskis commented 1 year ago

Version 6.1.4 (6120).

Wevah commented 1 year ago

"Failed to load resource: Cancelled load to https://blog.wadetregaskis.com/wp-content/uploads/2023/11/The-Colonists-Research-screen.jpg because it violates the resource's Cross-Origin-Resource-Policy response header."

Looks like we need work around these restrictions; seems like there's some private API to do that, but of course that probably wouldn't be acceptable for the iOS app. :/

[Edit: It might be possible to do using public API, but I can't test in a custom NNW build anymore since I'm still on an older OS/Xcode version.]

Wevah commented 1 year ago

Of note: Vienna loads the images correctly, unless "Preview new browser" is enabled in the Advanced settings/preferences.

wadetregaskis commented 1 year ago

I did wonder if it were a CORP issue. How did you obtain that error message (I'd already checked the logging in the Console app, but saw nothing of use).

Is this a configuration problem on the server side? Or is there some workaround I can implement there?

Wevah commented 1 year ago

I enabled the web inspector for NNW (there's a hidden pref for it) and saw the message when I checked the resources tab/inspector console. I think you may be able to work around this issue by making sure the Cross-Origin-Resource-Policy header is set to cross-origin (or removed) for images referenced from the feed. (The origin of the rendered article is file:// šŸ˜•)

Edit: You may also need to set the Access-Control-Allow-Origin header; I'm not sure.

wadetregaskis commented 1 year ago

Thanks for looking into this for me!

I'm not able to muck with the CORP right now for weird and uninteresting reasons, but maybe next week I'll be able to. I'd prefer not to loosen the security there, though.

I'm a little fuzzy on the COR* stuff, but it sounds like it should in principle just work, but the embedded web browser in NetNewsWire isn't configured correctly? The origin should be blog.wadetregaskis.com (when loading inside NetNewsWire), since that's the origin of the feed URL. I wouldn't have thought I'd need to do anything different for the RSS feed as opposed to any other HTML on my siteā€¦?

brentsimmons commented 1 year ago

Bad news: I have a fix for this running on my machine. Itā€™s just a couple lines of code added/changed.

It sets the baseURL so that itā€™s the same as the permalink (which makes sense). So, for instance, if the article is ā€œThe Colonists,ā€ then the baseURL is https://blog.wadetregaskis.com/the-colonists/

But hereā€™s the bad news: this small change is terrible for performance. Going from article to article is usually fast in NetNewsWire ā€” almost instantaneous most of the time. With this fix in place, it takes at least 0.5 seconds, and often a couple seconds, to switch to a new article.

And thatā€™s even with very short, text-only articles. Itā€™s not an issue with loading the images.

All that I can figure is that setting the baseURL like this triggers some kind of weird WebKit behavior that just kills performance.

Soā€¦ Iā€™m not sure if thereā€™s something we can do about this.

wadetregaskis commented 1 year ago

That's a bummer.

If you set the barest domain instead - e.g. just https://blog.wadetregaskis.com - does that at least mean switching between articles in the same feed are fast again?

Presumably it's something to do with WebKit isolating different originsā€¦ is it a one-time cost or every time? If it appears to be the latter, is that because NNW is mutating the same WebKit view? What if it used (under the covers) a [cached] WebKit view per feed?

I figure there must be some way to eliminate (or hide) that delay, since of course Safari doesn't exhibit it.

If you want, you can share your patch-so-far with me and I'll see if I can find a solution.

brentsimmons commented 1 year ago

Thanks! Hereā€™s the change (in the mac-release branch):

DetailWebViewController.swift:330

Change webView.loadHTMLString(html, baseURL: ArticleRenderer.page.baseURL) to these two lines:

let baseURL = URL(string: rendering.baseURL)
webView.loadHTMLString(html, baseURL: baseURL)
brentsimmons commented 1 year ago

I think, by the way, that we shouldnā€™t be using ArticleRenderer.page.baseURL as the baseURL in the first place. If we canā€™t use the permalink, then baseURL should just be nil.

wadetregaskis commented 1 year ago

This is weirdā€¦ if the baseURL is https://blog.wadetregaskis.com, it's slow as you describe. If it's https://wadetregaskis.com, it is not.

wadetregaskis commented 1 year ago

It's trying to load three JavaScript files which don't exist (when blindly appended to the baseURL):

That's what causes the delay - these are page-load-blocking and it's going out over the network every time. The delay is variable because it depends on how long it takes to get back the 404.

brentsimmons commented 1 year ago

Good detective work! That sounds like something we could fix. (They shouldnā€™t be loaded separately.)

wadetregaskis commented 1 year ago

I'm working on a fix now. One approach is to just inline those files. I'm exploring if there's a way to keep them as local files (might be beneficial for caching inside WebKit?), which involves partially bypassing WebKit's local file access protectionsā€¦

brentsimmons commented 1 year ago

What Iā€™d like to do is provide them as WKUserScripts. This way (in theory) we could run the JavaScript code we need to run but disable JavaScript for article content.

wadetregaskis commented 1 year ago

Right, that's what I'm currently trying out.

wadetregaskis commented 1 year ago

Yep, that works nicely.

diff --git a/Mac/MainWindow/Detail/DetailWebViewController.swift b/Mac/MainWindow/Detail/DetailWebViewController.swift
index c365727c6..1011a2aba 100644
--- a/Mac/MainWindow/Detail/DetailWebViewController.swift
+++ b/Mac/MainWindow/Detail/DetailWebViewController.swift
@@ -96,6 +96,15 @@ protocol DetailWebViewControllerDelegate: AnyObject {
                userContentController.add(self, name: MessageName.windowDidScroll)
                userContentController.add(self, name: MessageName.mouseDidEnter)
                userContentController.add(self, name: MessageName.mouseDidExit)
+
+               for fileName in ["main.js", "main_mac.js", "newsfoot.js"] {
+                       userContentController.addUserScript(
+                               .init(source: try! String(contentsOf: ArticleRenderer.page.baseURL.appending(path: fileName,
+                                                                                                                                                                                        directoryHint: .notDirectory)),
+                                         injectionTime: .atDocumentStart,
+                                         forMainFrameOnly: true))
+               }
+
                configuration.userContentController = userContentController

                webView = DetailWebView(frame: NSRect.zero, configuration: configuration)
@@ -326,7 +335,7 @@ private extension DetailWebViewController {
                ]

                let html = try! MacroProcessor.renderedText(withTemplate: ArticleRenderer.page.html, substitutions: substitutions)
-               webView.loadHTMLString(html, baseURL: ArticleRenderer.page.baseURL)
+               webView.loadHTMLString(html, baseURL: URL(string: rendering.baseURL))
        }

        func scrollInfo() async -> ScrollInfo? {
diff --git a/Mac/MainWindow/Detail/page.html b/Mac/MainWindow/Detail/page.html
index ceaa13f7f..ee7c70af3 100644
--- a/Mac/MainWindow/Detail/page.html
+++ b/Mac/MainWindow/Detail/page.html
@@ -4,9 +4,6 @@
                <style>
                        [[style]]
                </style>
-               <script src="main.js"></script>
-               <script src="main_mac.js"></script>
-               <script src="newsfoot.js" async="async"></script>
                <script type="text/javascript">
                        document.addEventListener("DOMContentLoaded", function(event) {
                                processPage();

That's a bit hacky though - notably the try!. I'm not sure how best to handle that (nominally impossible) failure to load the built-in JavaScript files.

I can do a pull request if you like, or you can take that patch and refine it yourself, if that'll be easier.

wadetregaskis commented 1 year ago

Presumably this will have further ramifications, that should be considered. e.g. now feed entries will be able to load all remote resources, including CSS, JavaScript, etc. There might be some extra protections you want in place before applying this fix.

wadetregaskis commented 1 year ago

ā€¦that said, I guess they could already do that anyway, just by using absolute URLs instead of base URLs. So no worries, I guess.

brentsimmons commented 1 year ago

Thanks so much! A PR would be great.

brentsimmons commented 12 months ago

The fix for this is in a Mac beta: https://github.com/Ranchero-Software/NetNewsWire/releases/tag/mac-6.1.5b1