Closed asbjornu closed 2 years ago
@asbjornu, I was curious about this, as I fear you might be hitting performance limitations in the internal link checks due to a bottleneck I looked into some time ago.
I investigated by downloading the build-site artifact and running the check locally.
I could easily run it on index.html alone via
htmlproofer --disable-external true --log-level debug --checks Links build-site/index.html --root build-site/
which took more than ~5 minutes (on my laptop) but completed. Given the hundreds of files, it is easy to imagine you are genuinely hitting the 6-hour timeout. Based on previous investigations, I know a key bottleneck is the create_nokogiri call made when checking whether a hash exists:
https://github.com/gjtorikian/html-proofer/blob/57ebcbab8009a6a70b6df25cd19a07d0212d34d5/lib/html_proofer/url_validator/internal.rb#L92
I could confirm this by running on the entire site without checking internal hashes:
htmlproofer --disable-external true --log-level debug --checks Links build-site/ --check-internal-hash false
where the check for internal links now completes in a reasonable time of ~4 minutes.
It might be worth trying check_internal_hash: false at your end to confirm this and at least have something working.
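For a setup that runs HTMLProofer from Ruby rather than from the CLI, the same workaround could look roughly like the sketch below. The option keys mirror the CLI flags used above, as I understand the v4 options; PROOFER_OPTIONS and run_link_check are hypothetical names chosen for illustration, and the site directory is whatever your build produces.

```ruby
# Options mirroring the CLI invocation above: links only, no external
# checks, and internal hash checking switched off as a workaround.
PROOFER_OPTIONS = {
  checks: ["Links"],
  disable_external: true,
  check_internal_hash: false,
  log_level: :debug
}.freeze

# Hypothetical helper: loads the gem lazily and runs the check on site_dir.
def run_link_check(site_dir)
  require "html-proofer"
  HTMLProofer.check_directory(site_dir, PROOFER_OPTIONS.dup).run
end
```

Calling run_link_check("build-site") would then correspond to the second command above. Note that this trades correctness for speed: broken anchors within the site will go unreported while the option is off.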
The issue could be mitigated by minimizing the repeated create_nokogiri calls for the same internal target page, since the page is currently re-parsed for each different hash within the same page, and again for the same hash linked from different pages.
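To illustrate the idea, here is a minimal sketch of caching per target page. It is not the html-proofer implementation: FragmentCache, PAGES and the regex-based extractor are hypothetical stand-ins (the real code parses files with Nokogiri and also considers more than id attributes). The point is only that each target page is parsed once, however many hashes or linking pages refer to it.

```ruby
require "set"

# Caches the set of fragment ids found in each target file, so a page is
# parsed at most once regardless of how many links point at it.
class FragmentCache
  attr_reader :parse_count

  # extractor: block taking a file path and returning the ids in that file.
  def initialize(&extractor)
    @extractor = extractor
    @ids_by_path = {}
    @parse_count = 0
  end

  def hash_exists?(path, fragment)
    ids = @ids_by_path.fetch(path) do
      # Cache miss: run the (expensive) extraction once and remember it.
      @parse_count += 1
      @ids_by_path[path] = @extractor.call(path).to_set
    end
    ids.include?(fragment)
  end
end

# Stand-in for the expensive parse: in-memory pages and a naive regex.
PAGES = {
  "index.html" => '<h1 id="intro">Intro</h1><h2 id="usage">Usage</h2>',
  "about.html" => '<h1 id="team">Team</h1>'
}.freeze

cache = FragmentCache.new do |path|
  PAGES.fetch(path).scan(/id="([^"]+)"/).flatten
end

links = [
  ["index.html", "intro"],
  ["index.html", "usage"],   # different hash, same page
  ["index.html", "intro"],   # same hash, linked from another page
  ["about.html", "team"],
  ["about.html", "missing"]
]

results = links.map { |path, fragment| cache.hash_exists?(path, fragment) }
```

With five links across two target pages, the extractor runs twice instead of five times. On a site with hundreds of pages and many anchors per page, that difference is what matters.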
@gjtorikian, happy to help out here and draft something
Would it be possible to try out the repo build using this branch? https://github.com/gjtorikian/html-proofer/pull/766
What would be the easiest way to execute HTMLProofer from the command line on the native-async branch, @gjtorikian? For completeness, I ran bundle exec htmlproofer _site on the command line and it also hangs. Canceling it, I get the following stack trace:
Traceback (most recent call last):
40: from /Users/bitbear/gems/bin/bundle:23:in `<main>'
39: from /Users/bitbear/gems/bin/bundle:23:in `load'
38: from /Users/bitbear/gems/gems/bundler-2.3.22/exe/bundle:36:in `<top (required)>'
37: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/friendly_errors.rb:120:in `with_friendly_errors'
36: from /Users/bitbear/gems/gems/bundler-2.3.22/exe/bundle:48:in `block in <top (required)>'
35: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/cli.rb:25:in `start'
34: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
33: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/cli.rb:31:in `dispatch'
32: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
31: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
30: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
29: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/cli.rb:486:in `exec'
28: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/cli/exec.rb:23:in `run'
27: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/cli/exec.rb:58:in `kernel_load'
26: from /Users/bitbear/gems/gems/bundler-2.3.22/lib/bundler/cli/exec.rb:58:in `load'
25: from /Users/bitbear/gems/bin/htmlproofer:25:in `<top (required)>'
24: from /Users/bitbear/gems/bin/htmlproofer:25:in `load'
23: from /Users/bitbear/gems/gems/html-proofer-4.4.0/bin/htmlproofer:11:in `<top (required)>'
22: from /Users/bitbear/gems/gems/mercenary-0.4.0/lib/mercenary.rb:21:in `program'
21: from /Users/bitbear/gems/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in `go'
20: from /Users/bitbear/gems/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `execute'
19: from /Users/bitbear/gems/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `each'
18: from /Users/bitbear/gems/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `block in execute'
17: from /Users/bitbear/gems/gems/html-proofer-4.4.0/bin/htmlproofer:97:in `block (2 levels) in <top (required)>'
16: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/runner.rb:46:in `run'
15: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/runner.rb:95:in `check_files'
14: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/runner.rb:145:in `validate_internal_urls'
13: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/url_validator/internal.rb:19:in `validate'
12: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/url_validator/internal.rb:26:in `run_internal_link_checker'
11: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/url_validator/internal.rb:26:in `each_pair'
10: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/url_validator/internal.rb:27:in `block in run_internal_link_checker'
9: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/url_validator/internal.rb:27:in `each'
8: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/url_validator/internal.rb:40:in `block (2 levels) in run_internal_link_checker'
7: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/url_validator/internal.rb:79:in `hash_exists?'
6: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/url_validator/internal.rb:92:in `find_fragments'
5: from /Users/bitbear/gems/gems/html-proofer-4.4.0/lib/html_proofer/utils.rb:22:in `create_nokogiri'
4: from /Users/bitbear/gems/gems/nokogiri-1.13.8-x86_64-darwin/lib/nokogiri/html5.rb:31:in `HTML5'
3: from /Users/bitbear/gems/gems/nokogiri-1.13.8-x86_64-darwin/lib/nokogiri/html5/document.rb:43:in `parse'
2: from /Users/bitbear/gems/gems/nokogiri-1.13.8-x86_64-darwin/lib/nokogiri/html5/document.rb:85:in `do_parse'
1: from /Users/bitbear/gems/gems/nokogiri-1.13.8-x86_64-darwin/lib/nokogiri/html5/document.rb:85:in `parse'
/Users/bitbear/gems/gems/nokogiri-1.13.8-x86_64-darwin/lib/nokogiri/xml/document.rb:172:in `initialize': Interrupt
proofer _site(45009,0x114210e00) malloc: *** error for object 0x7f926073ef90: pointer being freed was not allocated
proofer _site(45009,0x114210e00) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6
@riccardoporreca, with check_internal_hash: false, HTMLProofer now completes the check. 🎉 Thanks! 🙏🏼
@gjtorikian, I investigated a possible approach to minimize the time-consuming create_nokogiri calls and have a working solution in mind. I need to clean it up a bit but will try to draft a PR soon.
See the open PR #770 for the proposed approach, including a link to output generated locally with the proposed solution, showing its effectiveness on the build-site artifacts from @asbjornu.
Thanks to @riccardoporreca, this has now been optimized in 4.4.1.
Thank you so much @riccardoporreca! 🙏🏼 With check_internal_hash: true, HTMLProofer now completes in 24 minutes instead of infinity. So definitely an improvement! 👏🏼 But still a far cry from the 4 minutes it takes with check_internal_hash: false. 🤔
After upgrading to HTMLProofer v4, it has stopped working for our Jekyll-built site. As can be seen in the following build, the build step is canceled before completion, after having run for 6 hours. It halts after the following lines have been logged:
I would have run HTMLProofer on the command line if before_request was available there, but as that requires HTMLProofer to be run from Ruby, I've written the following little wrapper class (abbreviated here for brevity):
If you can spot anything erroneous about the above code, I would highly appreciate any pointers. To reproduce the problem, clone developer.swedbankpay.com and then run:
If there's anything I can do to help debug this issue, please let me know!