gjtorikian / html-proofer

Test your rendered HTML files to make sure they're accurate.

"ERROR: Invalid predicate" on ugly Maven search URL #810

Closed nchammas closed 7 months ago

nchammas commented 7 months ago

On this page is the following link:

For an up-to-date list, please refer to the Maven repository for the full list of supported sources and artifacts.

Checking this with a locally built version of that page yields the following error:

`evaluate': ERROR: Invalid predicate: 
  //*[@name="search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"]
  |/*[@name="search|ga|1|g:"org.apache.spark" AND v:"4.0.0""]
  |//*[@id="search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"]
  |//*[@id="search|ga|1|g:"org.apache.spark" AND v:"4.0.0""] (Nokogiri::XML::XPath::SyntaxError)

I assume this should be handled more gracefully somehow since the link does appear to be valid HTML and works for me in Safari.

For reference, the full trace is:

bundler: failed to load command: htmlproofer (.../spark/docs/.local_ruby_bundle/ruby/3.3.0/bin/htmlproofer)
.../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/nokogiri-1.16.0-arm64-darwin/lib/nokogiri/xml/searchable.rb:238:in `evaluate': ERROR: Invalid predicate: //*[@name="search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"]|/*[@name="search|ga|1|g:"org.apache.spark" AND v:"4.0.0""]|//*[@id="search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"]|//*[@id="search|ga|1|g:"org.apache.spark" AND v:"4.0.0""] (Nokogiri::XML::XPath::SyntaxError)
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/nokogiri-1.16.0-arm64-darwin/lib/nokogiri/xml/searchable.rb:238:in `xpath_impl'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/nokogiri-1.16.0-arm64-darwin/lib/nokogiri/xml/searchable.rb:219:in `xpath_internal'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/nokogiri-1.16.0-arm64-darwin/lib/nokogiri/xml/searchable.rb:182:in `xpath'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:159:in `check_hash_in_2xx_response'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:93:in `response_handler'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:78:in `block in queue_request'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/request/callbacks.rb:146:in `block in execute_callbacks'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/request/callbacks.rb:145:in `each'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/request/callbacks.rb:145:in `execute_callbacks'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/request/operations.rb:35:in `finish'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/easy_factory.rb:170:in `block in set_callback'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/easy/response_callbacks.rb:74:in `block in complete'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/easy/response_callbacks.rb:74:in `each'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/easy/response_callbacks.rb:74:in `complete'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/multi/operations.rb:189:in `check'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/multi/operations.rb:202:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/multi/operations.rb:50:in `perform'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/hydra/runnable.rb:15:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/hydra/memoizable.rb:51:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:69:in `run_external_link_checker'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:31:in `validate'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/runner.rb:146:in `validate_external_urls'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/runner.rb:97:in `check_files'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/runner.rb:50:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/cli.rb:22:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/exe/htmlproofer:14:in `block in <top (required)>'
        from .../.rbenv/versions/3.3.0/lib/ruby/3.3.0/benchmark.rb:313:in `realtime'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/exe/htmlproofer:14:in `<top (required)>'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/bin/htmlproofer:25:in `load'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/bin/htmlproofer:25:in `<top (required)>'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli/exec.rb:58:in `load'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli/exec.rb:58:in `kernel_load'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli/exec.rb:23:in `run'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli.rb:492:in `exec'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/vendor/thor/lib/thor/command.rb:28:in `run'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/vendor/thor/lib/thor.rb:527:in `dispatch'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli.rb:34:in `dispatch'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/vendor/thor/lib/thor/base.rb:584:in `start'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli.rb:28:in `start'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/exe/bundle:37:in `block in <top (required)>'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/friendly_errors.rb:117:in `with_friendly_errors'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/exe/bundle:29:in `<top (required)>'
        from .../.rbenv/versions/3.3.0/bin/bundle:25:in `load'
        from .../.rbenv/versions/3.3.0/bin/bundle:25:in `<main>'

riccardoporreca commented 7 months ago

@nchammas, I could reproduce the error (with HTMLProofer 5.0.8) as follows:

htmlproofer --as-links "https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22"

I'm not sure HTMLProofer can do much to make this particular URL checkable, since the error actually comes from Nokogiri.

Still, HTMLProofer lets you use --swap-urls to strip the part of the URL that causes the problem, e.g.:

htmlproofer --swap-urls "search\.maven\.org/#search.*:search.maven.org" --as-links "https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22"

You can find full documentation for this option in the README.
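
To illustrate what the regex:replacement pair above does to the URL, here is a minimal Ruby sketch. The helper `swap_url` is hypothetical and is not HTMLProofer's actual code; it assumes the pattern itself contains no colon, so the first colon separates the regex from the replacement.

```ruby
# Hypothetical helper, not HTMLProofer's actual implementation: applies one
# "regex:replacement" pair in the style of --swap-urls to a URL.
def swap_url(url, spec)
  # Simplifying assumption: the pattern contains no colon, so the first
  # colon separates the regex from the replacement string.
  pattern, replacement = spec.split(":", 2)
  url.gsub(Regexp.new(pattern), replacement)
end

url = "https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22"
spec = 'search\.maven\.org/#search.*:search.maven.org'

puts swap_url(url, spec)
# prints https://search.maven.org
```

With the fragment removed, only the base URL is left to check, so there is no hash to look up in the response.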

Hope this helps

riccardoporreca commented 7 months ago

To be precise, the error occurs because HTMLProofer uses Nokogiri to check whether #search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22 is a valid hash for https://search.maven.org/.

However, this fragment is not meant to be a content hash. It carries query parameters that are processed via JavaScript, which is not something HTMLProofer or Nokogiri can do much about.

Therefore, you need to tell HTMLProofer that this is not a hash to be checked, which is exactly what the --swap-urls option above does by stripping #search...
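
For anyone wondering why the expression in the error message is invalid: the percent-decoded fragment contains double quotes, and an XPath 1.0 string literal delimited by double quotes cannot contain one. The sketch below only reconstructs the shape of the expression from the error message. `id_xpath` is a hypothetical helper, not code taken from HTMLProofer or Nokogiri.

```ruby
require "uri"

# Fragment from the reported link, without the leading "#".
fragment = "search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"

# Percent-decoding turns every %22 into a literal double quote.
decoded = URI.decode_www_form_component(fragment)

# Hypothetical reconstruction of one branch of the expression shown in the
# error message: the value is wrapped in a double-quoted XPath literal.
def id_xpath(value)
  %(//*[@id="#{value}"])
end

puts id_xpath(fragment) # encoded form: no inner quotes, literal stays intact
puts id_xpath(decoded)  # decoded form: inner quotes end the literal early
puts decoded.include?('"')
```

The second expression is the one that appears at the end of the error message, where the string literal ends right after `g:` and the rest of the predicate no longer parses.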

Closing this issue