Munter / netlify-plugin-checklinks

Netlify build plugin check your links and asset references
BSD 3-Clause "New" or "Revised" License

`xml:base` in Atom feeds does not appear to be respected #566

Open numist opened 2 years ago

numist commented 2 years ago

I'm using jekyll-postfiles to keep content (like images) local to the post, and the local references work with a feed reader thanks to an xml:base attribute on the content tag emitted by jekyll-feed.

Unfortunately those links are failing per netlify-plugin-checklinks:

9:37:55 PM:   ✖ FAIL load _site/f001.jpg
9:37:55 PM:   | operator: load
9:37:55 PM:   | expected: 200 _site/f001.jpg
9:37:55 PM:   |   actual: ENOENT: no such file or directory, open '/opt/build/repo/_site/f001.jpg'
9:37:55 PM:   |       at: _site/feed.xml:7:14 (inlined Html) <img src="f001.jpg" alt="A photo of the circuit board with component F001 circled (near the unpopulated twin inductors)">

Is this more of a hyperlink problem, or is it HTML-only?

numist commented 2 years ago

For anyone running into this in the future, add to your netlify.toml:

[[plugins]]
  package = "netlify-plugin-checklinks"

  [plugins.inputs]
  skipPatterns = [
    "_site/feed.xml",
  ]

main.css is also giving me grief for some yet-undetermined reason, so you might want to add that too if you're using checkExternal = true like I am.

Munter commented 2 years ago

@numist

Sounds like the missing inclusion of xml:base in the link resolution belongs in Assetgraph, in either the RSS or the Atom asset type.

Could you create a reduced test case with one HTML file that links to a feed, which in turn links to an image in this manner, so we can add it to the Assetgraph test cases and base a patch on it?

The CSS issue you mention doesn't include enough information for me to act on. But you are welcome to open another issue for it so we can have a look at whether you found a bug, or whether our error messages are maybe just too cryptic :P

numist commented 2 years ago

I just added an exception for all CSS links for now; I'll open a new issue with a reduced example when I restyle my site.

The below will need some massaging to fit your test suite's architecture, but here's some reduced code from my own website. Both files validate: the HTML with https://validator.w3.org/nu/ and the XML with https://validator.w3.org/feed/.

feed.xml:

The important things to modify for testing here are xml.feed.id (self-referential URI), xml.feed.entry.content[xml:base] (baseurl, obv), and xml.feed.entry.content.p.img[src] (relative path to test image from baseurl). Probably xml.feed.entry.id should point to something valid? I would have left it out, but it's required by Atom.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" >
  <updated>2022-04-27T09:26:27+00:00</updated>
  <id>https://numi.st/feed.xml</id>
  <title type="html">my cool website title</title>
  <entry>
    <title type="html">my cool entry title</title>
    <updated>2022-04-21T00:00:00+00:00</updated>
    <id>https://numi.st/post/2022/travel-uke</id>
    <content type="html" xml:base="https://numi.st/post/2022/travel-uke/">
      <![CDATA[<p><img src="IMG_1232.jpeg" /></p>]]>
    </content>
    <author><name>Not Blank</name></author>
  </entry>
</feed>
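
For reference, here is a minimal sketch of the resolution a feed reader would be expected to perform, using the standard WHATWG `URL` class (JavaScript, since that is what Assetgraph is written in). The variable names are illustrative only; the URLs are the ones from the reduced feed.

```javascript
// URLs taken from the reduced feed; variable names are illustrative.
const feedUrl = "https://numi.st/feed.xml";
const xmlBase = "https://numi.st/post/2022/travel-uke/";
const imgSrc = "IMG_1232.jpeg";

// What a reader honoring xml:base resolves the image to.
const withXmlBase = new URL(imgSrc, xmlBase).href;

// What you get when xml:base is ignored and the feed's own URL is the base.
const withoutXmlBase = new URL(imgSrc, feedUrl).href;

console.log(withXmlBase);    // https://numi.st/post/2022/travel-uke/IMG_1232.jpeg
console.log(withoutXmlBase); // https://numi.st/IMG_1232.jpeg
```

The second result mirrors the failure in the build log, where the image was looked up next to feed.xml rather than in the post's own directory.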

index.html:

Obv html.head.link[href] needs to point at the xml file above

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>my site title</title>
    <link type="application/atom+xml" rel="alternate" href="https://numi.st/feed.xml" />
  </head>
  <body></body>
</html>

Munter commented 2 years ago

@papandreou I'm pretty sure we have the correct modeling of an inline HTML fragment in an Atom <content> block. I didn't know about the ability to set xml:base, though. Do you think we could map the xml:base attribute to what I guess would have to be a new baseUrl setter in Html, or possibly all the way up in Asset?

https://github.com/assetgraph/assetgraph/blob/master/lib/assets/Html.js#L40-L51

papandreou commented 2 years ago

Html is already wired up to get the baseUrl from the superclass' baseUrl getter (and then possibly modify it if there's a <base href=...> in the HTML itself): https://github.com/assetgraph/assetgraph/blob/815ae4b44b30d004bd5fc247cd39c10523cd448e/lib/assets/Html.js#L41-L50

The superclass (Asset) will delegate to its first "non inline ancestor" when it's an inline asset: https://github.com/assetgraph/assetgraph/blob/815ae4b44b30d004bd5fc247cd39c10523cd448e/lib/assets/Asset.js#L556-L559

The challenge seems to be that you can have an xml:base attribute for each inline HTML snippet. I guess the easiest thing is to pick it up when resolving the relation here, add it as a baseUrl property to the `to` object, and then make sure that the baseUrl getter in Html supports that case as well.
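
A rough sketch of that idea, not Assetgraph's actual code: the `Asset` and `Html` classes below are simplified stand-ins, and `url`, `parent`, and `xmlBase` are hypothetical property names chosen for illustration. The inline HTML asset resolves its own xml:base (which may itself be relative) against the base URL it would otherwise inherit.

```javascript
// Simplified stand-ins for Assetgraph's classes; all names here are hypothetical.
class Asset {
  constructor({ url = null, parent = null } = {}) {
    this.url = url;       // set for non-inline assets
    this.parent = parent; // the asset that contains an inline asset
  }

  // Inline assets (no url of their own) delegate to the nearest ancestor that has one.
  get baseUrl() {
    return this.url !== null ? this.url : this.parent.baseUrl;
  }
}

class Html extends Asset {
  constructor({ xmlBase = null, ...rest } = {}) {
    super(rest);
    this.xmlBase = xmlBase; // value of xml:base captured when the relation was resolved
  }

  get baseUrl() {
    const inherited = super.baseUrl;
    // xml:base may be relative, so resolve it against the inherited base.
    return this.xmlBase !== null ? new URL(this.xmlBase, inherited).href : inherited;
  }
}

const feed = new Asset({ url: "https://numi.st/feed.xml" });
const entryHtml = new Html({ parent: feed, xmlBase: "https://numi.st/post/2022/travel-uke/" });
const plainHtml = new Html({ parent: feed });

console.log(new URL("IMG_1232.jpeg", entryHtml.baseUrl).href);
console.log(new URL("IMG_1232.jpeg", plainHtml.baseUrl).href);
```

A real change would also have to keep the existing handling of a <base href=...> inside the HTML itself, which this sketch leaves out.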