gjtorikian / html-proofer

Test your rendered HTML files to make sure they're accurate.
MIT License

Undesirable url_swap behavior #585

Open mazzystr opened 4 years ago

mazzystr commented 4 years ago

Our team is trying to use HTMLProofer to check links between two websites running locally in containers. We do this to verify links between our website and user documentation prior to publication. I'm seeing some undesirable behavior when using the url_swap feature.

I'm using the following in the Rakefile:

    require 'html-proofer'

    desc 'Checks html files looking for dead links to userguide'
    task :test_userguide => :build do
        options = {
            :checks_to_ignore    => [ "ScriptCheck", "ImageCheck" ],
            :assume_extension    => true,
            :check_html          => true,
            :only_4xx            => true,
            :allow_hash_href     => true,
            :enforce_https       => true,
            :check_external_hash => true,
            :log_level           => :debug,
            :external_only       => true,
            # Rewrite links to the published site so they hit the local container.
            :url_swap            => {
                                    /blah.com\/myuri/ => "0.0.0.0:8000"
                                    },
            # Skip every external URL that is not on either of the two hosts.
            :url_ignore          => [
                                    /^https?:\/\/((?!(blah.com|0.0.0.0:8000)(\/myuri)?).*)/,
                                    ],
        }
        puts "Checking userguide links..."
        HTMLProofer.check_directory("./_site", options).run
    end
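
To make the intent of the `:url_ignore` pattern easier to see, here is a small standalone sketch in plain Ruby. It only exercises the regex itself and does not call HTMLProofer. The `example.org` link and the `ignored?` helper are made up for illustration.

```ruby
# Pattern copied from the :url_ignore option in the Rakefile.
URL_IGNORE = /^https?:\/\/((?!(blah.com|0.0.0.0:8000)(\/myuri)?).*)/

# Sample URLs; the example.org entry is a hypothetical third-party link.
SAMPLES = [
  "http://blah.com/myuri/appendix/blah/#commands-1",
  "http://0.0.0.0:8000/appendix/blah/#commands-1",
  "https://example.org/some/page",
].freeze

# Hypothetical helper: true when the pattern matches, i.e. the URL would be skipped.
def ignored?(url)
  URL_IGNORE.match?(url)
end

SAMPLES.each do |url|
  puts "#{ignored?(url) ? 'ignored' : 'checked'}  #{url}"
end
```

The negative lookahead rejects any URL whose host part starts with `blah.com` or `0.0.0.0:8000`, so only those two hosts fall through to the link check and everything else is skipped.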

The test file looks like the following...

- [1](http://blah.com/myuri/appendix/blah/#commands-1)
- [2](http://blah.com/myuri/appendix/blahhhhh/#commands-1)
- [3](http://blah.com/myuri/appendixxxx/blah/#commands-1)
- [4](http://blah.com/myuri/appendix/blah/#commands-666)
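
For reference, the effect of the `:url_swap` mapping on these four links can be sketched in plain Ruby. This assumes a simple `String#sub` per pattern, which may not be exactly how HTMLProofer applies the swap internally. `swap_url` is a hypothetical helper name.

```ruby
# Mapping copied from the :url_swap option in the Rakefile.
URL_SWAP = { /blah.com\/myuri/ => "0.0.0.0:8000" }.freeze

# Hypothetical helper: apply each pattern => replacement pair once to the URL.
# Using String#sub is an assumption, not necessarily HTMLProofer's own logic.
def swap_url(url, swaps = URL_SWAP)
  swaps.reduce(url) { |result, (pattern, replacement)| result.sub(pattern, replacement) }
end

LINKS = [
  "http://blah.com/myuri/appendix/blah/#commands-1",
  "http://blah.com/myuri/appendix/blahhhhh/#commands-1",
  "http://blah.com/myuri/appendixxxx/blah/#commands-1",
  "http://blah.com/myuri/appendix/blah/#commands-666",
].freeze

LINKS.each { |link| puts "#{link} -> #{swap_url(link)}" }
```

Each link keeps its path and fragment, and only the `blah.com/myuri` prefix is replaced by the local container address.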

Debug output looks like the following...

    Checking linkcheck on ./_site/index.html ...
    Checking 7 external links...
    ETHON: started MULTI
    Received a 200 for http://0.0.0.0:8000  in ./_site/index.html
    Received a 200 for http://0.0.0.0:8000/appendix/blah/#commands-1  in ./_site/index.html
    Received a 200 for http://0.0.0.0:8000/appendix/blah/#commands-666  in ./_site/index.html
    Received a 404 for http://0.0.0.0:8000/appendix/blahhhhh/#commands-1  in ./_site/index.html
    Received a 404 for http://0.0.0.0:8000/appendixxxx/blah/#commands-1  in ./_site/index.html
    ETHON: performed MULTI
    Ran on 1 file!

    - ./_site/index.html
      *  External link http://0.0.0.0:8000/appendix/blahhhhh/#commands-1 failed: 404 No error
      *  External link http://0.0.0.0:8000/appendixxxx/blah/#commands-1 failed: 404 No error
    rake aborted!
    HTML-Proofer found 2 failures!

It sure would be nice to have the real link displayed in the failure report instead of the swapped URL. At the very least, please provide a switch to enable that behavior.
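
To illustrate the request, here is a rough sketch of the idea in plain Ruby. Nothing in it is HTMLProofer's actual API: `CheckedLink`, `build_link`, `failure_message`, and the `report_original_url` switch are all hypothetical names.

```ruby
# Hypothetical record pairing the link as written with the URL actually requested.
CheckedLink = Struct.new(:original, :swapped)

# Mapping copied from the :url_swap option in the Rakefile.
URL_SWAP = { /blah.com\/myuri/ => "0.0.0.0:8000" }.freeze

# Keep the original URL next to the swapped one instead of overwriting it.
def build_link(original, swaps = URL_SWAP)
  swapped = swaps.reduce(original) { |url, (pattern, replacement)| url.sub(pattern, replacement) }
  CheckedLink.new(original, swapped)
end

# Hypothetical switch: false reports the swapped URL, as in the output above;
# true reports the link as it appears in the HTML file.
def failure_message(link, status, report_original_url: false)
  shown = report_original_url ? link.original : link.swapped
  "External link #{shown} failed: #{status}"
end

link = build_link("http://blah.com/myuri/appendix/blahhhhh/#commands-1")
puts failure_message(link, 404)
puts failure_message(link, 404, report_original_url: true)
```

The request is still made against the swapped address. Only the text of the report changes, so a failure can be traced back to the link that needs fixing in the source.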

gjtorikian commented 4 years ago

That makes sense; tagged as an enhancement.

PS: if it so happens that Red Hat is relying on this project, please consider sponsoring. Thanks. ✌️