Munter / hyperlink

A node library and command line tool to test the integrity of your internal and external hyperlinks

Follows links to canonicalRoot instead of resolving as local #158

Closed Munter closed 5 years ago

Munter commented 5 years ago

The following HTML

<a href="https://mntr.dk">waat</a>

Checked with this command line:

hyperlink -ri --canonicalroot https://mntr.dk --root . index.html

Results in this TAP output:

TAP version 13
# Crawling internal assets
ok 1 load index.html
ok 2 load https://mntr.dk
not ok 3 load static/bundle.e4f5761693.css
  ---
    operator: load
    expected:
      "200 static/bundle.e4f5761693.css"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/static/bundle.e4f5761693.css'"
    at: https://mntr.dk (1:2819) <link rel="stylesheet" href="/static/bundle.e4f5761693.css">
  ...
not ok 4 load static/bundle-1.74ee7145ce.css
  ---
    operator: load
    expected:
      "200 static/bundle-1.74ee7145ce.css"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/static/bundle-1.74ee7145ce.css'"
    at: https://mntr.dk (1:2914) <link rel="stylesheet" href="/static/bundle-1.74ee7145ce.css" integrity="sha256-RaWVNaKNpPwo3fei7Cy7ZVOJbyKdZZOze5mWdWJildU=">
  ...
not ok 5 load feed.xml
  ---
    operator: load
    expected:
      "200 feed.xml"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/feed.xml'"
    at: https://mntr.dk (1:4894) <link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml">
  ...
not ok 6 load static/logo-white.0b1467f089.svg
  ---
    operator: load
    expected:
      "200 static/logo-white.0b1467f089.svg"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/static/logo-white.0b1467f089.svg'"
    at: https://mntr.dk (1:5053) <img src="/static/logo-white.0b1467f089.svg">
  ...
not ok 7 load static/web-share.0d5ae2348f.js
  ---
    operator: load
    expected:
      "200 static/web-share.0d5ae2348f.js"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/static/web-share.0d5ae2348f.js'"
    at: https://mntr.dk (1:15127) <script src="/static/web-share.0d5ae2348f.js" async="" integrity="sha256-M+JrvP+ihAv2Lm9ojTdA2j03E34+HhQSHkHWuILaYPE=">...</script>
  ...
ok 8 load 
ok 9 load https://fonts.googleapis.com/css?family=Noto+Serif:400,700,400i|Open+Sans:700,400
# Connecting to 2 hosts (checking <link rel="preconnect" href="...">
ok 10 preconnect-check https://fonts.googleapis.com
ok 11 preconnect-check https://fonts.gstatic.com
# Looking up 0 host names (checking <link rel="dns-prefetch" href="...">

1..11
# tests 11
# pass  6
# fail  5

It looks like hyperlink follows the link because it matches the canonical root, but instead of resolving it to index.html on the local disk, it seems to load the content from https://mntr.dk and keeps running the checks from that page. Only the deployed online index page links to those hashed file names. The canonical root resolution then makes hyperlink look for the hashed files on the local disk, where they do not exist.

Any ideas on this one @papandreou ?

papandreou commented 5 years ago

It seems to go wrong when the target of the <a href="https://mntr.dk"> is added to the graph. This check comes out as false: https://github.com/assetgraph/assetgraph/blob/89ff06a882142006da8d3f4014308aba8c51b2f4/lib/AssetGraph.js#L242

That happens because the canonical root has been normalized to https://mntr.dk/, and https://mntr.dk does not start with that because of the missing trailing slash. It seems like we should normalize asset urls so that there's always a slash after the hostname (since https://mntr.dk really is the same url as https://mntr.dk/).
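
To illustrate the mismatch and the proposed normalization, here is a minimal sketch. It is not the actual assetgraph code: the helper names normalizeAssetUrl and isUnderCanonicalRoot are hypothetical, and it assumes the check is a plain string-prefix comparison as described above. It relies only on the standard WHATWG URL class, which serializes a bare origin with a trailing slash.

```javascript
// Hypothetical helpers for illustration only; not assetgraph's API.
// URL is the WHATWG URL class, available as a global in Node.js 10+.

// Serialize a url through the URL parser so that a bare origin
// such as https://mntr.dk gains its trailing slash.
function normalizeAssetUrl(url) {
  return new URL(url).href;
}

// Assumed shape of the check: a string-prefix comparison, but with
// both sides normalized first.
function isUnderCanonicalRoot(url, canonicalRoot) {
  return normalizeAssetUrl(url).startsWith(normalizeAssetUrl(canonicalRoot));
}

// Without normalization the prefix check fails on the missing slash:
console.log('https://mntr.dk'.startsWith('https://mntr.dk/')); // false

// With both sides normalized it matches:
console.log(isUnderCanonicalRoot('https://mntr.dk', 'https://mntr.dk/')); // true
```

With a check like this, the link in index.html would be recognized as pointing inside the canonical root and could be resolved against --root instead of being fetched over the network.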