gjtorikian / html-proofer

Test your rendered HTML files to make sure they're accurate.
MIT License
1.57k stars 196 forks source link

Reporting broken links when using relative internal links reported as broken #792

Closed bnorberg closed 1 year ago

bnorberg commented 1 year ago

Hi, I am running into an issue where HTML Proofer is reporting working relative links as broken.

Example link

<a href="../admin/configuring-load-balancer-endpoints.html">

Example feedback from HTML Proofer

Screenshot 2023-02-25 at 11 05 53 AM

Any suggestions on what could be causing the broken link message?

My only solution to ensure I don't get false positives on relative links is to add '../' links in the ignore-urls option or changing all links to root relative. Neither solution is ideal.

gjtorikian commented 1 year ago

Hm, I can't reproduce this issue. Are you sure configuring-load-balancer-endpoints.html exists? Could you zip up your output for me to examine it, or is there some other way I can reproduce it?

bnorberg commented 1 year ago

Thanks, for the quick response, gjtorikian.

That page does exist because the link works on the deploy site. Here's the page in the gh-pages branch of the deploy, https://github.com/NetAppDocs/storagegrid-116/blob/gh-pages/admin/configuring-load-balancer-endpoints.html and here is the page that is reference the above link through the relative https://github.com/NetAppDocs/storagegrid-116/blob/gh-pages/admin/configuring-custom-server-certificate-for-storage-node-or-clb.html. Again this is just one example. We are getting the broken link on all .. relative links. If we change <a href="../admin/configuring-load-balancer-endpoints.html"> to <a href="./admin/configuring-load-balancer-endpoints.html"> or <a href="/admin/configuring-load-balancer-endpoints.html">, the link checker will not return a broken link message.

We are using a mix of private and public GitHub build runners and different deployment environments for site builds so it will probably not be possible to replicate what I am seeing.

Is there a possibility that this could be related to a site prefix causing the .. in the relative link to not actually be in the correct directory to find the linked file?? We use a language prefix on our site.

Thanks again.

gjtorikian commented 1 year ago

Sorry, but this is probably going to be a bit of back and forth. 😓

There are already a few tests that check to make sure relative (../) links works; I've just pushed a new test up that tries to mimic your issue and it, too, passes: https://github.com/gjtorikian/html-proofer/commit/bd8911119df6568382a6b002104811913ffd1a85

But as I said there are many tests for this. One thing I noticed is that the GitHub page you link to for configuring-load-balancer-endpoints.html has 0 links for ../admin/configuring-load-balancer-endpoints.html, but it does have links for /us-en/storagegrid-116/admin/configuring-load-balancer-endpoints.html. Is this expected?

riccardoporreca commented 1 year ago

@gjtorikian, are the tests also covering the case of running on a directory instead of an individual file?

By looking at https://github.com/gjtorikian/html-proofer/blob/83a948241ff59102325566afa082b9471aea946f/lib/html_proofer/attribute/url.rb#L135-L160

It seems the logic could be different in the two cases since for a "file" check I think @runner.current_source is the same as @runner.current_filename.

Looking at the code, there is perhaps an issue with the following lines, where it would seem more reasonable to use File.dirname(@runner.current_filename) instead of @runner.current_source in File.expand_path :

https://github.com/gjtorikian/html-proofer/blob/83a948241ff59102325566afa082b9471aea946f/lib/html_proofer/attribute/url.rb#L144-L148

bnorberg commented 1 year ago

No worries, @gjtorikian. I totally understand and really appreciate @riccardoporreca's and your help.

We are running the check on a directory. And we do add a prefix during the build process to the the root url of the page for the language code (ie, us-en) and the name of the repository. That is why you are seeing root url like /us-en/storagegrid-116 and not ../admin/configuring-load-balancer-endpoints.html.

I am so sorry. I probably confused you with a bad relative link example on the previous post. This would be a relative link example for the configuring-load-balancer-endpoints.html page that is not working , <a href="../admin/web-browser-requirements.html">.

Maybe this prefix is the problem. When a writer uses a relative link to another file in that repository (ie, ../admin/web-browser-requirements.html") the prefix might be throwing off where in the directory the check is actually running. And maybe @riccardoporreca suggestion can fix this?

Anyway, thanks again for your help.

riccardoporreca commented 1 year ago

@bnorberg, I did a quick test myself and the "directory" check seem to support correctly "../" links.

Any chance you can share even a small part of the actual website you run on and see the failures, which as I understand is not the public repo you are pointing to? On the public repo, html-proofer has no issue with ../admin/web-browser-requirements.html (I ran on a local checkout and there is no error on that link). Also important is sharing the options you call html-proofer with, as well as the version you are using. It is hard to investigate without a way of reproducing the error.

bnorberg commented 1 year ago

@riccardoporreca @gjtorikian thanks again for testing and confirming all's working for you.

We have Ruby 2.6 in our build environment and so are using HTML Proofer version 4.4.3. Here is a gist of our configuration, https://gist.github.com/bnorberg/f22875beb1c5fc4cdee7f7db91f3ee72. You'll see that right now we add ../ links to the ignore-urls options to avoid getting failure reports. We wrap HTML Proofer into a homegrown application and run it as a utility in a component step of a GitHub Actions workflow.

Unfortunately, I cannot share any content from the content repository in which we are seeing the failures because it is pre-public release. But the repo I have shared already, https://github.com/NetAppDocs/storagegrid-116/tree/gh-pages, is very similar content-wise. We build our public sites using GitHub free Ubuntu build runners and GitHub pages. For our internal content, we have private build runners (built identical to Ubuntu GitHub build runners), that deploy to VM behind a firewall.

I will do some testing on our public content with relative links turned on tomorrow to see if this might be a problem limited to our internal environment setup and report back. Thanks again.

riccardoporreca commented 1 year ago

Thanks @bnorberg, going in the right direction.

Are you able to reproduce the failure on https://github.com/NetAppDocs/storagegrid-116/tree/gh-pages with a similar configuration, of course without ignoring ../?

If you can provide us with another gist, which you could use at your end to reproduce the issue on https://github.com/NetAppDocs/storagegrid-116/tree/gh-pages, we then have a starting point to reproduce at our end and investigate. Fort this exercise, you may want to specify disable_external: true and run with only checks: ['Links'] to speed things up.

While you do that, it would be great if you could check the error also occurs with the latest v5 html-proofer.

riccardoporreca commented 1 year ago

@gjtorikian, @bnorberg, I had an intuition by looking at the way the error is reported in the issue's description at the very top: html-proofer is actually looking for a link without leading ../, paired with my special interest for the html-proofer caching.

With this in mind, I could reproduce the issue even with the same test data introduced in https://github.com/gjtorikian/html-proofer/commit/bd8911119df6568382a6b002104811913ffd1a85 by just enabling the cache.

The issue is with cleaned_url, which was introduced in #686 consistently for the external URLs

https://github.com/gjtorikian/html-proofer/blob/83a948241ff59102325566afa082b9471aea946f/lib/html_proofer/cache.rb#L92-L103

however introducing an inconsistency for the internal URLs, where cleaning is not done when adding to the cache, and should not be done as it seems to strip leading ../.

I have extended the test and am ready for a simple fix

gjtorikian commented 1 year ago

Amazing sleuthing @riccardoporreca, thank you so much.

Released as 5.0.5.

bnorberg commented 1 year ago

@gjtorikian, @riccardoporreca I created a branch with the latest html proofer release and can confirm that this change does solve our relative link error issue. Thank you so much for your effort and amazing response time. You gentlemen are awesome!