gjtorikian / html-proofer

Test your rendered HTML files to make sure they're accurate.
MIT License
1.57k stars 198 forks source link

ignore_files no longer works with regex #743

Closed jamespohalloran closed 2 years ago

jamespohalloran commented 2 years ago

After the 4.2.0 update, ignore_files no longer seems to be working with a regex (I was previously using --file-ignore)

Usgae

Running things currently on 4.3.0

arguments: --only_4xx --checks "Links,Images,Scripts,Favicon,OpenGraph" --ignore_status_codes "400, 403, 409, 429" --allow_hash_href --ignore_files "/.+\/(blog|404)\/.+/"

Result:

You can see it's still failing on a /out/blog/... file, which should be ignored: https://github.com/tinacms/tinacms.org/runs/7541078705?check_suite_focus=true#step:7:18

gjtorikian commented 2 years ago

I was surprised to hear this broke since nothing changed around these options in 4.3.0. I even added a test case matching your regex and couldn't reproduce the error. Could you try replacing the underscores with dashes, eg --ignore-files instead of --ignore_files? And you're certain that ignore_files regex worked in < 4.3.0?

riccardoporreca commented 2 years ago

@jamespohalloran, the issue is with the sequence

--allow_hash_href --ignore_files "/.+\/(blog|404)\/.+/"

Since --allow-hash-href should be used with an actual value, e.g. --allow-hash-href=false or --allow-hash-href false, --ignore_files is interpreted as the value (ultimately interpreted as boolean true). This was indeed changed in 4.3.0 for many options via 565bdd9158b0a20153e0633c81ab860dccb405c2.

The following would fix the issue (options should be --separated, although they seem to work also with _):

--allow-hash-href=true --ignore-files "/.+\/(blog|404)\/.+/"

@gjtorikian, I just confirmed locally, and this type of issues would be captured if the option was defined with boolean type (any of TrueClass/FalseClass should do), something I had looked into in https://github.com/gjtorikian/html-proofer/issues/735#issuecomment-1193225683, e.g.

  p.option 'allow_hash_href', '--allow-hash-href=<true|false>',  'TrueClass', 'If `true`, assumes `href="#"` anchors are valid (default: `true`)'

With this, you get e.g.

bundle exec htmlproofer --allow-hash-href --ignore-files "/.+\/(blog|404)\/.+/"
# [...]
# mercenary/program.rb:33:in `go': invalid argument: --allow-hash-href --ignore-files (OptionParser::InvalidArgument)

confirming also that --ignore-files is used as value for --allow-hash-href. This also does not require the explicit conversion from string values.

gjtorikian commented 2 years ago

Argh, yeah, it might be time for me to change the CLI parser. It's been on my mind today (cc https://github.com/gjtorikian/html-proofer/issues/742#issuecomment-1197397086), but I won't be able to put work into it until next week.

riccardoporreca commented 2 years ago

@gjtorikian, yes it might be a good idea as it currently looks tricky to define and maintain the arguments defining a robust CLI.

If this might be taking a bit of time and you think it is worth to quickly make sure issues like the one above are detected (which potentially affect anyone using some of changed options in 4.3.0), I am happy to help out with a small PR introducing the boolean type. Otherwise of course just advising and making sure users are always including =true/false should be enough for the time being.

jamespohalloran commented 2 years ago

Since --allow-hash-href should be used with an actual value, e.g. --allow-hash-href=false or --allow-hash-href false, --ignore_files is interpreted as the value (ultimately interpreted as boolean true). This was indeed changed in 4.3.0 for many options via https://github.com/gjtorikian/html-proofer/commit/565bdd9158b0a20153e0633c81ab860dccb405c2.

The following would fix the issue (options should be --separated, although they seem to work also with _):

Aha! That explains why I could never disable the check-internal-hash check in this issue: https://github.com/gjtorikian/html-proofer/issues/725

Thank you! Appreciate all the help, and this project!

gjtorikian commented 2 years ago

Closing this specific issue out but still searching for a Ruby CLI parser that's more intelligent.