athackst / htmlproofer-action

Run htmlproofer on a directory. Defaults work with GitHub Pages.
https://althack.dev/htmlproofer-action
Apache License 2.0

Links check is not run #31

Closed roman-vasylenko closed 1 year ago

roman-vasylenko commented 1 year ago

Hi,

Thanks for the action. It does not seem to check links: at least one link in the HTML files in the _site directory has a bad anchor, and it is not detected.

{:allow_missing_href=>true, :check_external_hash=>true, :checks=>["Images", "Scripts"], :ignore_empty_alt=>true, :enforce_https=>false, :hydra=>{:max_concurrency=>50}, :typhoeus=>{:connecttimeout=>30, :followlocation=>true, :headers=>{"User-Agent"=>"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.0.0 Safari/537.36"}, :ssl_verifypeer=>false, :ssl_verifyhost=>0, :timeout=>120, :cookiefile=>".cookies", :cookiejar=>".cookies"}, :ignore_urls=>["********", "********"], :swap_urls=>{/^\/*******/=>"", /^.**********=>""}}
Running 2 checks (Images, Scripts) in ["./_site"] on *.html files ...

Workflow output:

Checking 4 external links
Checking 0 internal links
Checking internal link hashes in 0 files
Ran on 53492 files!

HTML-Proofer finished successfully.

My config:

uses: athackst/htmlproofer-action@main
with:
  directory: ./_site
  allow_missing_href: true
  empty_alt_ignore: true
  enforce_https: false
  check_favicon: false
  check_html: false
  check_opengraph: false

athackst commented 1 year ago

It looks like your config disables link checking.

check_html: false disables the link checker, which is why the log above lists only two checks (Images, Scripts).
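
For reference, here is a sketch of the same step with the link checker left on. It reuses only the inputs from the config in the report and flips check_html to true. Seeing Links in the "Running ... checks" log line afterwards is an expectation based on the reply above, not something verified in this thread.

```yaml
uses: athackst/htmlproofer-action@main
with:
  directory: ./_site
  allow_missing_href: true
  empty_alt_ignore: true
  enforce_https: false
  check_favicon: false
  check_html: true        # was false; per the reply above, false disables the link checker
  check_opengraph: false
```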

roman-vasylenko commented 1 year ago

@athackst thanks for the quick reply :)

As I understand it, the check_html option is also responsible for validating the HTML markup, which in my case increases the check time dramatically. I am not sure whether this is possible, but it would be great to be able to check links while keeping HTML validation disabled.

athackst commented 1 year ago

This action runs htmlproofer. If some combination of its settings does what you want and I'm not exposing it, I could maybe add it.

The action really only exposes the functionality of that program, so I can't support something here that htmlproofer itself doesn't do.

Maybe caching is what you'd want?

roman-vasylenko commented 1 year ago

Hi, @athackst

Thanks for the reply. The problem is that one subdirectory under _site, for example _site/apidoc, contains thousands of HTML files: it is the automatically generated API documentation for our project. It produces an enormous number of errors and makes the workflow run almost endlessly.

I am looking for a way to exclude that subdirectory from the check. As far as I can see, HTMLProofer has no parameter for this. However, I found that HTMLProofer can be given an array of folders to check via check_directories: https://github.com/gjtorikian/html-proofer#checking-directories

That would let me list every subdirectory I want checked and leave out _site/apidoc.

Can this be implemented?

roman-vasylenko commented 1 year ago

@athackst I found a solution for my particular situation. I created an alternative config.yml that excludes the problematic directory from the Jekyll build during the workflow run. Thank you for your time.
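
A minimal sketch of what such a workaround might look like, assuming a standard Jekyll setup. The file name _config.ci.yml and the apidoc/ source path are hypothetical, and the actual config used here was not posted. Jekyll's exclude key and the comma-separated --config flag are standard Jekyll features.

```yaml
# _config.ci.yml (hypothetical file name), passed only in the workflow, e.g.:
#   bundle exec jekyll build --config _config.yml,_config.ci.yml
# Settings in later files override earlier ones, so repeat any entries
# that _config.yml already lists under exclude.
exclude:
  - apidoc/   # assumed source path of the generated API docs
```

With the directory left out of the build, _site/apidoc is never generated, so the htmlproofer step has nothing to scan there and the regular site config stays unchanged.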