Closed dervism closed 5 years ago
Those three tools have completely different rulesets with different priorities. axe-core focuses on accessibility support and on avoiding false positives; I can't speak to the priorities of the others. It is expected that there would be differences, as they are developed by entirely different teams.
To read more about how we make decisions on accessibility support, here's a blog post: https://www.deque.com/blog/weve-got-your-back-with-accessibility-supported-in-axe/
@dervism The simplest answer to your question, "Why do tools produce different results?", is that there is no standard for accessibility tools. All tools behave differently in different situations.
One thing I can tell you: I don't know how you produced that report, but it does not come from a correct implementation of axe-core. The frames-tested rule only returns a warning when axe-core isn't executing properly. This means none of the content in those two frames was tested for accessibility. If you're interested, you can find a list of axe-core implementations here: https://github.com/dequelabs/axe-core/blob/develop/doc/projects.md
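To make the frames-tested point concrete, here is a minimal sketch of how a report generator could detect that warning in an axe-core results object. The `results` value below is hand-written sample data in the shape axe.run() resolves with (arrays of rule results, each with an `id` and `nodes`); it is not output from your run.

```javascript
// Hand-written stand-in for the object axe.run() resolves with.
const results = {
  violations: [
    { id: 'image-alt', nodes: [{ target: ['img'] }] },
  ],
  incomplete: [
    // axe-core reports "frames-tested" as incomplete (a warning) when it
    // could not execute inside one or more iframes on the page.
    {
      id: 'frames-tested',
      nodes: [{ target: ['iframe'] }, { target: ['iframe'] }],
    },
  ],
};

// Returns how many frames axe-core failed to test, or 0 if all were tested.
function untestedFrameCount(results) {
  const rule = results.incomplete.find((r) => r.id === 'frames-tested');
  return rule ? rule.nodes.length : 0;
}

const count = untestedFrameCount(results);
if (count > 0) {
  console.log(`${count} frame(s) were not tested; results are incomplete.`);
}
```

A report that surfaces this check would make it obvious when the violation totals only cover part of the page.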
Lastly, there is work under way to standardise how automated accessibility tools work. You can find more information about this here: https://www.w3.org/community/auto-wcag/
This might just be a question, not necessarily a bug. I tested Total Validator and Pa11y and compared their output with axe-core's. I ran a test against a Norwegian news website: http://www.vg.no
Why is there such a big difference in the number of detected violations between these tools? According to the AlphaGov audit, HTML CodeSniffer (used by Pa11y) finds a lower number of errors than axe. But this test shows the opposite.
Could you please help me understand the difference in the numbers? See output from each tool below.
Total Validator:
axe:
Pa11y:
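One further reason raw totals diverge is how each tool counts: axe-core groups violations by rule, with each rule listing every offending element, while other tools may report one error per element instance. The sketch below uses hand-written sample data (not real output from vg.no) to show how the same page can yield two very different totals depending on the counting convention.

```javascript
// Hand-written sample in the shape of axe.run()'s violations array.
const results = {
  violations: [
    { id: 'image-alt', nodes: [{}, {}, {}] },  // 3 images missing alt text
    { id: 'color-contrast', nodes: [{}, {}] }, // 2 low-contrast elements
  ],
};

// Counting convention 1: one violation per rule.
const ruleCount = results.violations.length;

// Counting convention 2: one violation per offending element instance.
const nodeCount = results.violations.reduce(
  (sum, rule) => sum + rule.nodes.length,
  0,
);

console.log(`rules violated: ${ruleCount}, element instances: ${nodeCount}`);
```

Comparing a per-rule total from one tool against a per-element total from another will exaggerate the apparent difference even before rule coverage is considered.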