a11ywatch / github-actions

A11yWatch Github Action
https://github.com/marketplace/actions/web-accessibility-evaluation
MIT License

Trying with Jekyll localhost and I am only seeing 2 pages being crawled #43

Closed dmundra closed 1 year ago

dmundra commented 1 year ago

Follow-up from https://github.com/CivicActions/accessibility/issues/687#issuecomment-1555339772

Here is my .github workflow in the PR https://github.com/CivicActions/accessibility/pull/697:

name: a11ywatch

on: [pull_request]

jobs:
  build:
    name: Building site and running a11ywatch
    runs-on: ubuntu-latest

    steps:
      - name: Checkout source.
        uses: actions/checkout@v2

      - name: Install jekyll site dependencies.
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: 2.6
          bundler-cache: true

      - name: Install npm dependencies.
        run: npm ci

      - name: Start up jekyll server.
        run: bundle exec jekyll serve --detach -c _config.yml,_config_local.yml

      - uses: a11ywatch/github-action@v1.14.0
        with:
          WEBSITE_URL: http://localhost:4000
          SITE_WIDE: true
          SITEMAP: true
          LIST: true

I fixed the sitemap issue mentioned in the other thread. When the action runs it only crawls 2 pages. Here is the output from https://github.com/CivicActions/accessibility/actions/runs/5050879141/jobs/9062115155?pr=697

Run a11ywatch crawl --url http://localhost:4000   --sitemap --save 

{"code":0,"data":[{"domain":"localhost","issues":[],"issuesInfo":{"accessScore":0,"errorCount":0,"issueMeta":{"skipContentIncluded":true},"noticeCount":0,"totalIssues":0,"warningCount":0},"lastScanDate":"2023-05-22T22:14:29.108Z","online":true,"pageLoadTime":{"duration":0,"durationFormated":"Cached/Extremely Fast"},"url":"http://localhost:4000"},{"domain":"localhost","issues":[],"issuesInfo":{"accessScore":0,"errorCount":0,"issueMeta":{"skipContentIncluded":true},"noticeCount":0,"totalIssues":0,"warningCount":0},"lastScanDate":"2023-05-22T22:14:29.122Z","online":true,"pageLoadTime":{"duration":0,"durationFormated":"Cached/Extremely Fast"},"url":"http://localhost:4000"}],"message":"Crawled 2 pages in 360.327889ms","success":true}

Is the port the issue? Is there another configuration option I should try?
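
One detail in the output above: both entries in data carry the same url, so the "2 pages" appear to be the start page reported twice. A minimal Python sketch for checking that, using a trimmed copy of the response (only a few fields are kept, and distinct_urls is a hypothetical helper, not part of a11ywatch):

```python
import json
from typing import List

# Trimmed copy of the crawl response above; most fields are omitted.
RESPONSE = """
{"code":0,"data":[
  {"domain":"localhost","url":"http://localhost:4000"},
  {"domain":"localhost","url":"http://localhost:4000"}
],"message":"Crawled 2 pages in 360.327889ms","success":true}
"""


def distinct_urls(raw: str) -> List[str]:
    """Return the distinct page URLs in a crawl response, in first-seen order."""
    payload = json.loads(raw)
    seen = {}
    for page in payload["data"]:
        # dict keys keep insertion order, so this de-duplicates stably.
        seen.setdefault(page["url"], None)
    return list(seen)


if __name__ == "__main__":
    urls = distinct_urls(RESPONSE)
    print(f"{len(urls)} distinct URL(s): {urls}")
```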

j-mendez commented 1 year ago

@dmundra yes, the crawler treats different ports as different domains. I also forgot to mention that the action only sends reports to GitHub when issues are found, to avoid noise. It might make sense to add a config option for this: https://github.com/a11ywatch/github-actions/blob/main/action.yml#L247.

j-mendez commented 1 year ago

The crawler may handle the ports if you use the TLD=true config, which is meant to include all localhost pages. It should work; if not, I can take a look at the bug later.
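
To illustrate why the port matters: a crawler that scopes itself by scheme, host, and port will not consider http://localhost:4000 and http://localhost to be the same site. The sketch below is not the a11ywatch crawler's code, only one common way to compare origins; origin and same_origin are hypothetical helpers.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Ports implied by the scheme when the URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname or "", port)


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme, host, and port."""
    return origin(a) == origin(b)


if __name__ == "__main__":
    print(same_origin("http://localhost:4000/guide", "http://localhost/guide"))
    print(same_origin("http://localhost:4000/", "http://localhost:4000/guide"))
```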

dmundra commented 1 year ago

I tried TLD: true and the process seems to be stuck https://github.com/CivicActions/accessibility/actions/runs/5051093166/jobs/9062553983?pr=697.

j-mendez commented 1 year ago

@dmundra appreciate the info. I can check out the project and debug the issue tomorrow. Sorry for the issues!

j-mendez commented 1 year ago

Hello @dmundra, I did some testing this morning and found that the issue lies in the Chrome browser settings in the a11ywatch standalone container used by the action.

First I wanted to see if the crawler itself had problems, and it went through the pages fine.

"http://127.0.0.1:4000/playbook/community"
"http://127.0.0.1:4000/playbook/pwd"
"http://127.0.0.1:4000/posts/website-analytics"
"http://127.0.0.1:4000/posts/at-banter-podcast"
"http://127.0.0.1:4000/posts/gcn-accessibility-compliance-as-code"
"http://127.0.0.1:4000/guide/design"
"http://127.0.0.1:4000/colophon"
"http://127.0.0.1:4000/license"
Time elapsed in website.crawl() is: 62.338917ms for total pages: 105.

After that I started the a11ywatch container locally, the same one used by the action, and saw this message about JS navigating to a different page.

Application started in SUPER mode. All restrictions removed.
Server ready at localhost:3280
GraphQL server ready at localhost:3280/graphql
Subscriptions ready at ws://localhost:3280/graphql
gRPC server running at 0.0.0.0:50053
gRPC server running at http://127.0.0.1:50052
gRPC server running at 0.0.0.0:50051
public - gRPC server running at 0.0.0.0:50050
gRPC clients connected - pagemind, crawler, and mav.
chrome launched and connected on: ws://127.0.0.1:38757/9bc702a384e5849340d59f1a5d43e329
page.evaluate: Execution context was destroyed, most likely because of a navigation
at u (/usr/src/app/node_modules/kayle/build/kayle.js:1:476)
at i (/usr/src/app/node_modules/kayle/build/kayle.js:1:294)
at g (/usr/src/app/node_modules/kayle/build/kayle.js:1:1767) {
name: 'Error'
}
page.evaluate: Execution context was destroyed, most likely because of a navigation
at u (/usr/src/app/node_modules/kayle/build/kayle.js:1:476)
at i (/usr/src/app/node_modules/kayle/build/kayle.js:1:294)
at g (/usr/src/app/node_modules/kayle/build/kayle.js:1:1767) {
name: 'Error'
}

Internally we use a config that prevents navigation by scripts. In this case we may want to proxy the navigation to the parent as another page so that crawling can continue. I am going to fix this on the Chrome side for now to prevent the context from being destroyed.

j-mendez commented 1 year ago

It looks like the crawler used by a11ywatch is stopping on the first page. It should still process all the pages regardless of the page.evaluate error. Taking a look now.

j-mendez commented 1 year ago

@dmundra The issue is that inside the Docker container, localhost targets the container itself. There used to be a note to use something like this: a11ywatch crawl --url http://host.docker.internal:4000 -d -n. That hostname resolves to the machine outside Docker. In this case we need to replace localhost with host.docker.internal to connect locally. I am going to keep this issue open, since we have a bare metal install of the system that we can use instead, which performs a lot better because it does not need to go through the Docker layer. The reason it is not the default is that the caching setup for that layer still needs to be done.

Crawl partial results: on:,"message":"Crawled 105 pages in 9.515014917s","success":true}.
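
The hostname swap described above can be sketched in Python as follows. This is only an illustration of the rewrite, not something the action does itself; to_docker_host and LOOPBACK_HOSTS are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts that point back at the container itself when used inside Docker.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def to_docker_host(url: str, docker_host: str = "host.docker.internal") -> str:
    """Point a loopback URL at the Docker host, keeping port, path, and query.

    URLs for any other host are returned unchanged. Credentials embedded in
    the URL (user:pass@) are not carried over in this sketch.
    """
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url
    netloc = docker_host if parts.port is None else f"{docker_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print(to_docker_host("http://localhost:4000"))
```

Whether host.docker.internal resolves depends on the Docker setup. On a plain Linux Docker Engine it typically has to be mapped explicitly, for example with --add-host=host.docker.internal:host-gateway.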

dmundra commented 1 year ago

Thanks @j-mendez for diving into this. Those make sense to me. I am looking forward to trying the bare metal version.

j-mendez commented 1 year ago

> Thanks @j-mendez for diving into this. Those make sense to me. I am looking forward to trying the bare metal version.

Np 🙌, localhost testing is now available using the action @v2. Here is a test PR run for the bare metal installation: https://github.com/a11ywatch/github-actions/actions/runs/5061355558/jobs/9085452770

dmundra commented 1 year ago

I tried v2 and it looks like the process timed out https://github.com/CivicActions/accessibility/actions/runs/5062293525/jobs/9087629055?pr=697. No error was provided so not sure what got stuck.

j-mendez commented 1 year ago

@dmundra taking a look now, appreciate the patience on this.

j-mendez commented 1 year ago

@dmundra the issue is caused by combining TLD: true with localhost. Removing that config allows the action to pass here: https://github.com/j-mendez/accessibility/actions/runs/5070404507/jobs/9105390504?pr=1. I am going to file the issue against the crawler.

dmundra commented 1 year ago

Ah good point. Giving it a try now.

j-mendez commented 1 year ago

> Ah good point. Giving it a try now.

I noticed the pa11y action has a bit more urls, I am not sure if it is due to the sitemap swap being done.

Results of the action locally since the action currently only sends results when errors occur. Do you think it is worth adding an option to always report the results? Amazing access on the website - took a peek at some of the content manually 🙂 .

(base) ➜  accessibility git:(a11ywatch-addition) a11ywatch --results-parsed-list               
Ran A11yWatch on 106 URLs:

 > http://localhost:4000/colophon - 0 errors
 > http://localhost:4000/accessibility - 0 errors
 > http://localhost:4000/about/people/jennifer-aube - 0 errors
 > http://localhost:4000/posts/helping-disabled-veterans-check-in-designing-an-accessibility-map - 0 errors
 > http://localhost:4000/guide/design - 0 errors
 > http://localhost:4000/guide/identity-language - 0 errors
 > http://localhost:4000/guide/resources - 0 errors
 > http://localhost:4000/about/people/mike-gifford - 0 errors
 > http://localhost:4000/guide/training - 0 errors
 > http://localhost:4000/guide/introduction - 0 errors
 > http://localhost:4000/posts/daniel-mundra-diving-into-drupal - 0 errors
 > http://localhost:4000/playbook/community - 0 errors
 > http://localhost:4000/projects/uswds-color - 0 errors
 > http://localhost:4000/guide/semantic-html - 0 errors
 > http://localhost:4000/calendar - 0 errors
 > http://localhost:4000/playbook/follow-global-initiatives - 0 errors
 > http://localhost:4000/about/people/ - 0 errors
 > http://localhost:4000/about/people/daniel-mundra - 0 errors
 > http://localhost:4000/playbook/personalization - 0 errors
 > http://localhost:4000/open - 0 errors
 > http://localhost:4000/guide/champions-program - 0 errors
 > http://localhost:4000/guide/ - 0 errors
 > http://localhost:4000/about/people/jack-haas - 0 errors
 > http://localhost:4000/projects/drupal - 0 errors
 > http://localhost:4000/guide/organizations - 0 errors
 > http://localhost:4000/playbook/checklists - 0 errors
 > http://localhost:4000/guide/documents - 0 errors
 > http://localhost:4000 - 0 errors
 > http://localhost:4000/guide/social-media - 0 errors
 > http://localhost:4000/playbook/training - 0 errors
 > http://localhost:4000/contact - 0 errors
 > http://localhost:4000/join - 0 errors
 > http://localhost:4000/guide - 0 errors
 > http://localhost:4000/playbook - 0 errors
 > http://localhost:4000/analytics - 0 errors
 > http://localhost:4000/license - 0 errors
 > http://localhost:4000/guide/plain-language - 0 errors
 > http://localhost:4000/playbook/roles - 0 errors
 > http://localhost:4000 - 0 errors
 > http://localhost:4000/news/ - 0 errors
 > http://localhost:4000/projects/uswds - 0 errors
 > http://localhost:4000/about/ - 0 errors
 > http://localhost:4000/playbook/pdf - 0 errors
 > http://localhost:4000/about/contact - 0 errors
 > http://localhost:4000/guide/events - 0 errors
 > http://localhost:4000/playbook/distributed-teams - 0 errors
 > http://localhost:4000/playbook/AT - 0 errors
 > http://localhost:4000/playbook/authoring - 0 errors
 > http://localhost:4000/playbook/automated-testing - 0 errors
 > http://localhost:4000/guide/onboarding-staff - 0 errors
 > http://localhost:4000/posts/how-drupal-helps-us-bake-accessibility-into-every-project - 0 errors
 > http://localhost:4000/posts/opensource-automated-accessibility-testing - 0 errors
 > http://localhost:4000/about/people/allison-carroll - 0 errors
 > http://localhost:4000/about/people/luke-fretwell - 0 errors
 > http://localhost:4000/posts/launching-a-community-of-practice-for-accessibility-in-government-services - 0 errors
 > http://localhost:4000/posts/Talking-Drupal-Podcast - 0 errors
 > http://localhost:4000/posts/website-analytics - 0 errors
 > http://localhost:4000/posts/Talking-Drupal-Podcast-382 - 0 errors
 > http://localhost:4000/posts/governments-accessibility-opensource - 0 errors
 > http://localhost:4000/heart - 0 errors
 > http://localhost:4000/posts/govtech-how-will-biden-transform-government-website-accessibility - 0 errors
 > http://localhost:4000/posts/mvp-playbook - 0 errors
 > http://localhost:4000/about/people/michelle-kang - 0 errors
 > http://localhost:4000/posts/heart-accessibility - 0 errors
 > http://localhost:4000/posts/social-media-accessibility-guide - 0 errors
 > http://localhost:4000/posts/hello-world - 0 errors
 > http://localhost:4000/posts/automated-accessibility-testing-leveraging-github-actions-and-pa11y-ci-with-axe - 0 errors
 > http://localhost:4000/about/people/civicactions - 0 errors
 > http://localhost:4000/posts/CivicActions-Accessibility-Pledge - 0 errors
 > http://localhost:4000/posts/scanning-over-two-million-gov-pages - 0 errors
 > http://localhost:4000/posts/CivicActions-Creates-Open-Product-Accessibility-Template - 0 errors
 > http://localhost:4000/posts/at-banter-podcast - 0 errors
 > http://localhost:4000/posts/government-accessibility-and-the-cms-problem - 0 errors
 > http://localhost:4000/posts/plain-language-accessibility-guide - 0 errors
 > http://localhost:4000/posts/gcn-accessibility-compliance-as-code - 0 errors
 > http://localhost:4000/posts/smashingmag-baking-in-accessibility-testing - 0 errors
 > http://localhost:4000/posts/qa-with-mike-gifford-accessibility-in-civic-tech - 0 errors
 > http://localhost:4000/posts/improve-government-accessibility-through-open-source - 0 errors
 > http://localhost:4000/posts/mvp-guide - 0 errors
 > http://localhost:4000/projects/ - 0 errors
 > http://localhost:4000/posts/gsa-machine-readable-acr - 0 errors
 > http://localhost:4000/guide/history - 0 errors
 > http://localhost:4000/posts/pre-GAAD-Authoring-Tools-Built-in-Accessibility - 0 errors
 > http://localhost:4000/playbook/documents - 0 errors
 > http://localhost:4000/posts/how-we-scale-inclusive-website-content-with-automated-testing-and-open-source-tools - 0 errors
 > http://localhost:4000/posts/FOSDEM-ccessibility-OpenSource - 0 errors
 > http://localhost:4000/about/join - 0 errors
 > http://localhost:4000/posts/4-ways-to-improve-government-accessibility-through-open-source - 0 errors
 > http://localhost:4000/playbook/ - 0 errors
 > http://localhost:4000/about/people/nira-datta - 0 errors
 > http://localhost:4000/playbook/ai-and-ia - 0 errors
 > http://localhost:4000/about/people - 0 errors
 > http://localhost:4000/posts/ - 0 errors
 > http://localhost:4000/playbook/manual-testing - 0 errors
 > http://localhost:4000/playbook/practice - 0 errors
 > http://localhost:4000/guide/glossary - 0 errors
 > http://localhost:4000/about/people/jonathan-bourland - 0 errors
 > http://localhost:4000/playbook/statements - 0 errors
 > http://localhost:4000/projects - 0 errors
 > http://localhost:4000/guide/tools - 0 errors
 > http://localhost:4000/conduct - 0 errors
 > http://localhost:4000/playbook/pwd - 0 errors
 > http://localhost:4000/about/people/vanessa-luxen - 0 errors
 > http://localhost:4000/people - 0 errors
 > http://localhost:4000/news - 0 errors
 > http://localhost:4000/search - 0 errors

✔ 106/106 URLs passed

dmundra commented 1 year ago

> I noticed the pa11y action has a bit more urls, I am not sure if it is due to the sitemap swap being done.

Ya, the sitemap includes URLs that are not linked anywhere in the content.
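
A rough way to list those sitemap-only URLs is to diff the sitemap against the crawled list. The Python sketch below uses made-up sample data, and the trailing-slash normalization is an assumption (the result list above contains both /guide and /guide/); only_in_sitemap is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap and crawl result, for illustration only.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://localhost:4000/</loc></url>
  <url><loc>http://localhost:4000/guide/</loc></url>
  <url><loc>http://localhost:4000/unlinked-page/</loc></url>
</urlset>
"""
CRAWLED = [
    "http://localhost:4000",
    "http://localhost:4000/guide",
    "http://localhost:4000/guide/",
]


def normalize(url: str) -> str:
    """Drop trailing slashes so /guide and /guide/ compare equal."""
    return url.strip().rstrip("/")


def sitemap_urls(xml_text: str) -> Set[str]:
    """Collect the normalized <loc> entries of a sitemap."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return {normalize(loc.text) for loc in locs if loc.text}


def only_in_sitemap(xml_text: str, crawled: Iterable[str]) -> Set[str]:
    """URLs listed in the sitemap that the crawl never reached."""
    return sitemap_urls(xml_text) - {normalize(u) for u in crawled}


if __name__ == "__main__":
    print(sorted(only_in_sitemap(SITEMAP_XML, CRAWLED)))
```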

> Results of the action locally since the action currently only sends results when errors occur. Do you think it is worth adding an option to always report the results?

Ya, I think it would be useful as an option.

> Amazing access on the website - took a peek at some of the content manually 🙂 .

Thank you! We try to keep up on it.

Looks like it ran successfully https://github.com/CivicActions/accessibility/actions/runs/5070580492/jobs/9105811040?pr=697 and no errors, nice.

dmundra commented 1 year ago

I do like that a11ywatch doesn't need a sitemap and can find a lot of links. A lot of sites don't have sitemaps so I see a11ywatch working better for those sites.

j-mendez commented 1 year ago

> I do like that a11ywatch doesn't need a sitemap and can find a lot of links. A lot of sites don't have sitemaps so I see a11ywatch working better for those sites.

That is a good point. If there is a feature that would really help, feel free to post on any of the repos. Depending on what it is, small features may be easy to get in, especially if they can help make an impact 🙂. This ticket helped a lot since it unblocked localhost testing.

dmundra commented 1 year ago

Glad to hear it. I will close this ticket as fixed!