HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

Accessibility 2021 #2147

Closed rviscomi closed 3 years ago

rviscomi commented 3 years ago

Part II Chapter 9: Accessibility

If you're interested in contributing to the Accessibility chapter of the 2021 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor.

Content team

Lead Authors Reviewers Analysts Editors Coordinator
@alextait1 @alextait1 @scottdavis99 @oluoluoxenfree @ericwbailey @clottman @shantsis @digitala11ies @obto - @obto -
More information about each role:

- The **[content team lead](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Content-Team-Leads'-Guide)** is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
- **[Authors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Authors'-Guide)** are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
- **[Reviewers](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Reviewers'-Guide)** are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
- **[Analysts](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide)** are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
- **[Editors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Editors'-Guide)** are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
- The **[section coordinator](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Section-Leads'-Guide)** is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

_Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors._

For an overview of how the roles work together at each phase of the project, see the [Chapter Lifecycle](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Chapter-Lifecycle) doc.

Milestone checklist

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

Chapter resources

Refer to these 2021 Accessibility resources throughout the content creation process:

- 📄 Google Docs for outlining and drafting content
- 🔍 SQL files for committing the queries used during analysis
- 📊 Google Sheets for saving the results of queries
- 📝 Markdown file for publishing content and managing public metadata

foxdavidj commented 3 years ago

@scottdavis99 @ericwbailey @oluoluoxenfree @alextait1 @schachin Moved scheduling this chat to an email I just sent. Faster to go back and forth with times there.

@alextait1 can you edit the top comment and put @schachin in the proper role?

@schachin Can you shoot an email to david@davidjfox.com so I can add you to the email thread I just sent?

schachin commented 3 years ago

Sorry I missed this -- will do what you requested now :)


More Info & Publications

Contact Info

Helping You Make It Better by Making It Work. Recommendations and References available on LinkedIn or by request. Client information is generally protected by NDA and not typically available on public sites.

schachin commented 3 years ago

Hi! Question. (not sure if you are the right one to send this to) Is there a way to also be a reviewer on the SEO Section -- there is so much bad info out there right now my fear is it will get repeated in the SEO section.

To note, I have:

- 16 years of SEO experience
- 20 years of web experience (design and dev)
- Written over 125 articles on Search and Google
- Spoken at over 80 conferences in the US and internationally, including SXSWi

Our industry just suffers from a lot of people competing to be rockstars right now, so they put out info that is not well researched. I just want to review to make sure we do not hurt site owners by letting that make it into this.

Because in the end SEO done wrong can put a company out of business.

Thank you! Kristine


On Wednesday, May 12, 2021, 2:47:05 PM PDT, Rick Viscomi ***@***.***> wrote:  

@schachin oh I think if you're only reading the email thread you don't have the full context. Visit #2147 (comment) to see the whole GitHub issue; the top comment explains how to contribute in more detail. And here's the post that kicks off the project. For context I got your info from the 2021 Web Almanac interest form where you indicated that you're a SME in accessibility and open to authoring or reviewing. You also mentioned SEO, but that chapter has more than enough contributors at the moment!

foxdavidj commented 3 years ago

@schachin That'd be a question for the SEO team lead Patrick Stox. Here's their GitHub tracking issue where you can ask: https://github.com/HTTPArchive/almanac.httparchive.org/issues/2146

Looks like they may be full with reviewers (13 signed up already). But can always reach out and ask 😃

schachin commented 3 years ago

Oh cool, I know Patrick, so I will ask him directly. At least I feel better knowing he is running that.


foxdavidj commented 3 years ago

@alextait1 @scottdavis99 @oluoluoxenfree @ericwbailey @clottman @shantsis @digitala11ies

Hey everyone, looks like there's been a lot of progress on creating the chapter outline which is awesome!

We should be in great shape to complete it by June 15 so we have enough time to update our crawler with any additional metrics you need this year.

Also a reminder that the team has a channel on slack (#web-almanac-a11y), so feel free to join the discussion there as well: https://join.slack.com/t/httparchive/shared_invite/zt-45sgwmnb-eDEatOhqssqNAKxxOSLAaA

If you have any other questions don't hesitate to reach out :)

rockeynebhwani commented 3 years ago

Hi @alextait1 and team,

Just wanted to let you know that we added detection for various accessibility overlay solutions to Wappalyzer last year (https://github.com/AliasIO/wappalyzer/issues/3228). If you know of any more that you want reported, you can extend the Wappalyzer detection logic.

Cheers, Rockey

alextait1 commented 3 years ago

@rockeynebhwani thanks so much we do plan to talk about overlays so I'll check it out!

alextait1 commented 3 years ago

@alextait1 @scottdavis99 @oluoluoxenfree @ericwbailey @clottman @shantsis @digitala11ies the high-level overview is up and ready for review and additions from authors. I'm hoping to start going a level deeper towards the metrics this coming week, so take a peek if you can. Authors especially, I want to be sure you've weighed in 🙏🏼

rockeynebhwani commented 3 years ago

> @rockeynebhwani thanks so much we do plan to talk about overlays so I'll check it out!

@alextait1 - FYI.. I briefly covered this in eCommerce chapter last year - https://almanac.httparchive.org/en/2020/ecommerce#accessibility-solutions

alextait1 commented 3 years ago

> > @rockeynebhwani thanks so much we do plan to talk about overlays so I'll check it out!
>
> @alextait1 - FYI.. I briefly covered this in eCommerce chapter last year - https://almanac.httparchive.org/en/2020/ecommerce#accessibility-solutions

Yes we touched on it as well but in a more qualitative way, not with metrics. We're going to do a deeper dive this year for sure. Thank you for the support!

alextait1 commented 3 years ago

@scottdavis99 @oluoluoxenfree @ericwbailey @clottman @shantsis @digitala11ies @obto

Thanks for the comments on the doc, I've made some updates. Authors - I hope you've had a chance to take a look, didn't see any comments from you so hopefully no news is good news 😂

And with that, milestones 0 and 1 are checked off!

David - if the metrics requests need clarifying let us know

tunetheweb commented 3 years ago

I was reminded again of the claim that sites which use ARIA often have more accessibility issues than sites that don't. I've never particularly liked that stat, as correlation does not imply causation. If you are a really complex site and use a Bootstrap component with one ARIA label (even a valid use of it), but have lots of other errors because it's a complex site, then suddenly you're dragging down the stats, but more because you're a complex site than because of your use of ARIA.

I think this originally came from the WebAIM survey which does clarify it more:

> ARIA correlated to higher detectable errors. The more ARIA attributes that were present, the more detectable accessibility errors could be expected. This does not necessarily mean that ARIA introduced these errors (these pages are more complex), but pages typically had more errors when ARIA was present.

The second part of that tends to be forgotten (it doesn't help that the first part is in bold!). While "no ARIA is better than bad ARIA", I wonder should we dig into it more?
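To make the confounding argument concrete, here is a minimal Python sketch. The page counts and error counts are invented purely for illustration (they are not HTTP Archive or WebAIM data), and `GROUPS` and `mean_errors` are hypothetical names. In this toy data ARIA makes no difference to errors within a complexity level, yet pages with ARIA still show more errors overall because complex pages use ARIA more often.

```python
# Invented illustration data, NOT real measurements:
# (complexity, uses_aria, detectable_errors_per_page, page_count)
GROUPS = [
    ("simple", False, 2, 100),
    ("simple", True, 2, 20),
    ("complex", False, 10, 20),
    ("complex", True, 10, 100),
]


def mean_errors(uses_aria, complexity=None):
    """Average errors per page for ARIA / non-ARIA pages.

    If complexity is given, only pages of that complexity are counted.
    """
    total_errors = 0
    total_pages = 0
    for group_complexity, group_aria, errors, pages in GROUPS:
        if group_aria != uses_aria:
            continue
        if complexity is not None and group_complexity != complexity:
            continue
        total_errors += errors * pages
        total_pages += pages
    return total_errors / total_pages


# Aggregated over everything, ARIA pages look worse...
print("with ARIA:", round(mean_errors(True), 2))
print("without ARIA:", round(mean_errors(False), 2))

# ...but within each complexity level there is no difference at all.
for level in ("simple", "complex"):
    print(level, mean_errors(True, level), mean_errors(False, level))
```

The aggregate gap here comes entirely from the mix of simple and complex pages in each group, which is why slicing by some proxy for complexity would be needed before reading anything causal into an ARIA stat.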

Should we consider looking at slicing and dicing how different sites/technologies are accessible? I'm thinking we could use the Lighthouse Accessibility score and measure how this changes depending on:

I know Lighthouse isn't perfect and a 100 Accessibility score doesn't imply a perfectly accessible site, but a score below 100 strongly implies a less than accessible site. As a simple measure of "how accessible a website is", I think it's a decent enough metric to report on the above and give a high-level summary.

I don't think we'd need any new custom metrics for this as we have all of the above info. Just a matter of writing some more queries based on the available data (you're welcome @obto! 😁) . And if the data show nothing of interest and no correlation then we just drop it.

Interested to hear your thoughts on whether this is worthwhile and if you have any other stats to add to above?

ericwbailey commented 3 years ago

While I am personally super curious about some of your ideas about how to peer into this data further, I'm unsure if clarification would do what we intend it to. I would hate to inadvertently communicate that it's acceptable to use ARIA not as a last resort and without testing.

I oftentimes find that presenting this kind of info gives someone who has already made up their mind, or who has already committed code a way to back up their decision.

tunetheweb commented 3 years ago

I'm not talking about using this to justify the use of ARIA (though I admit it was that quote that started off my thinking here), but more drilling into what types of pages/sites are more or less accessible. ARIA use is just one measure of that (and a bad one IMHO for the reasons I gave above).

Actually I see some of the ideas have been covered by WebAIM Million report: https://webaim.org/projects/million/#technologies

Would definitely be interested in seeing different rates of accessibility in top 1000, 10k, 100k, million and all websites.

And the great thing about the Web Almanac is we don't just present the stats but have experts giving their interpretation of what that means. So can hopefully at least somewhat address your concerns with that.

tunetheweb commented 3 years ago

OK curiosity got the better of me so I ran some stats:

| category | percentile | all_sites | uses_aria | accessibe | top1k | top10k | top100k | top1m |
|---|---|---|---|---|---|---|---|---|
| num_sites | | 7,150,239 | 5,134,088 | 417 | 863 | 7,768 | 79,014 | 782,451 |
| num_sites_pct | | 100% | 72% | 0.006% | 0.012% | 0.109% | 1.105% | 10.943% |
| percentile | 10 | 0.61 | 0.7 | 0.7 | 0.6 | 0.6 | 0.61 | 0.61 |
| percentile | 25 | 0.73 | 0.79 | 0.77 | 0.72 | 0.71 | 0.72 | 0.73 |
| percentile | 50 | 0.83 | 0.86 | 0.84 | 0.82 | 0.81 | 0.82 | 0.83 |
| percentile | 75 | 0.91 | 0.92 | 0.91 | 0.91 | 0.9 | 0.9 | 0.9 |
| percentile | 90 | 0.96 | 0.96 | 0.96 | 0.97 | 0.96 | 0.96 | 0.96 |
| percentile | 95 | 0.97 | 0.98 | 0.97 | 0.98 | 0.98 | 0.98 | 0.98 |
| percentile | 99 | 1 | 1 | 0.98 | 1 | 1 | 1 | 1 |
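One way to read the table is to express each column's gap from all_sites in Lighthouse points (hundredths of the 0 to 1 score). The sketch below hard-codes the all_sites, uses_aria and accessibe columns from the table above; `point_gaps` is a hypothetical helper name and not part of the query.

```python
# Scores copied from the table above, one entry per percentile.
PERCENTILES = [10, 25, 50, 75, 90, 95, 99]
ALL_SITES = [0.61, 0.73, 0.83, 0.91, 0.96, 0.97, 1.0]
USES_ARIA = [0.70, 0.79, 0.86, 0.92, 0.96, 0.98, 1.0]
ACCESSIBE = [0.70, 0.77, 0.84, 0.91, 0.96, 0.97, 0.98]


def point_gaps(group, baseline=ALL_SITES):
    """Gap between a group and the baseline, in whole Lighthouse points."""
    return {
        percentile: round((score - base) * 100)
        for percentile, score, base in zip(PERCENTILES, group, baseline)
    }


for name, scores in [("uses_aria", USES_ARIA), ("accessibe", ACCESSIBE)]:
    print(name, point_gaps(scores))
```

For the accessibe column this gives gaps of +9, +4 and +1 points at the 10th, 25th and 50th percentiles, zero at the 75th to 95th, and -2 at the 99th.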

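Here's some takeaways I see from this:

- I don't see the same findings as WebAIM that accessibility is worse when ARIA is used. Sites using ARIA tend to be slightly more accessible, as far as the Lighthouse score measures this. And it IS used a lot: by 72% of our sites! This may be due to the limited audits that Lighthouse, and the underlying axe library, perform (I find they tend to use only audits that are less susceptible to noise).
- There is no "Accessibility Overlays" category in Wappalyzer, but looking at accessiBe as a well-known one, it does seem to improve the Accessibility score for sites that use it at the lower percentiles (the easy wins?). As well as the other criticism the accessibility community has for these, I think there's a genuine question whether these are worth it if it only increases the 50th percentile by a single Lighthouse point. It should be noted, though, that at 417 sites it's a VERY small sample size, so I'm not sure how much we can really read into that. Another interesting point is that even at the 99th percentile they don't hit the top Lighthouse Accessibility score (which I honestly think is quite an achievable score!).
- Disappointingly, there seems to be no correlation between site popularity and Lighthouse Accessibility score 😢 I had hoped that more popular sites, presumably with more resources to look after their website, would have better scores. If anything the opposite appears to be true! Personally I think that in and of itself is an interesting stat!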

Of course it should be remembered that Lighthouse Accessibility checks are limited and a high score does not indicate a site is accessible – though I do usually find the opposite is true (i.e. a low score indicates a site is usually at least partially inaccessible), so think there is still value in looking at this as a broad indicator of how accessible/inaccessible a website is when dealing at the scale we deal with.

Anyway, satisfied my own curiosity and so will bow out again now and leave the chapter team to decide if they want to include any of this type of info in the chapter.

SQL Query below. It uses 3TB at a cost of $15 and takes a good 15 mins to run! - I'm sure it can be improved but just something I knocked together to see if this was worth exploring further.

#standardSQL
CREATE TEMPORARY FUNCTION usesAriaAttributes(payload STRING)
RETURNS BOOL LANGUAGE js AS '''
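try {
  const almanac = JSON.parse(payload);
  // Keys of attributes_used_on_elements are assumed to be attribute names.
  // Note: this is a substring match, so any name containing "aria" counts,
  // not only aria-* attributes.
  const containsAria = (name) => name.includes('aria');
  return Object.keys(almanac.attributes_used_on_elements).some(containsAria);
} catch (e) {
  // Missing or malformed payloads are treated as not using ARIA.
  return false;
}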
''';

WITH lighthouse_scores AS (
  SELECT url, 
    CAST(JSON_EXTRACT(report, '$.categories.accessibility.score') AS NUMERIC) AS accessibility
  FROM
    #`httparchive.sample_data.lighthouse_mobile_10k`
    `httparchive.lighthouse.2021_05_01_mobile`
  WHERE JSON_EXTRACT(report, '$.categories.accessibility.score') IS NOT NULL
),
all_sites AS (
  SELECT
    COUNT(0) AS all_sites_num_sites,
    percentile,
    APPROX_QUANTILES(accessibility, 1000)[OFFSET(percentile * 10)] AS all_sites_score
  FROM (
    SELECT
      accessibility
    FROM
      lighthouse_scores),
    UNNEST([10, 25, 50, 75, 90, 95, 99]) AS percentile
  GROUP BY
    percentile
),
uses_aria AS (
  SELECT
    COUNT(0) AS uses_aria_num_sites,
    percentile,
    APPROX_QUANTILES(accessibility, 1000)[OFFSET(percentile * 10)] AS uses_aria_score
  FROM (
    SELECT
      accessibility
    FROM
      lighthouse_scores
    JOIN
      #`httparchive.sample_data.pages_mobile_10k`
      `httparchive.pages.2021_05_01_mobile`
    USING (url)
    WHERE
      usesAriaAttributes(JSON_EXTRACT_SCALAR(payload, '$._almanac'))
    ),
    UNNEST([10, 25, 50, 75, 90, 95, 99]) AS percentile
  GROUP BY
    percentile
),
uses_accessibe AS (
  SELECT
    COUNT(0) AS uses_accessibe_num_sites,
    percentile,
    APPROX_QUANTILES(accessibility, 1000)[OFFSET(percentile * 10)] AS uses_accessibe_score
  FROM (
    SELECT
      accessibility
    FROM
      lighthouse_scores
    JOIN
      #`httparchive.sample_data.technologies_mobile_10k`
      `httparchive.technologies.2021_05_01_mobile`
    USING (url)
    WHERE
      category = 'Accessibility' AND
      APP = 'AccessiBe'
    ),
    UNNEST([10, 25, 50, 75, 90, 95, 99]) AS percentile
  GROUP BY
    percentile
),
ranking AS (
  SELECT DISTINCT
    origin || '/' AS url,
    experimental.popularity.rank AS rank
  FROM
    `chrome-ux-report.all.202105`
),
top1k AS (
  SELECT
    COUNT(0) AS top1k_num_sites,
    percentile,
    APPROX_QUANTILES(accessibility, 1000)[OFFSET(percentile * 10)] AS top1k_score
  FROM (
    SELECT
      accessibility
    FROM
      lighthouse_scores
    JOIN
      ranking
    USING (url)
    WHERE
      rank <= 1000
    ),
    UNNEST([10, 25, 50, 75, 90, 95, 99]) AS percentile
  GROUP BY
    percentile
),
top10k AS (
  SELECT
    COUNT(0) AS top10k_num_sites,
    percentile,
    APPROX_QUANTILES(accessibility, 1000)[OFFSET(percentile * 10)] AS top10k_score
  FROM (
    SELECT
      accessibility
    FROM
      lighthouse_scores
    JOIN
      ranking
    USING (url)
    WHERE
      rank > 1000 AND
      rank <= 10000
    ),
    UNNEST([10, 25, 50, 75, 90, 95, 99]) AS percentile
  GROUP BY
    percentile
),
top100k AS (
  SELECT
    COUNT(0) AS top100k_num_sites,
    percentile,
    APPROX_QUANTILES(accessibility, 1000)[OFFSET(percentile * 10)] AS top100k_score
  FROM (
    SELECT
      accessibility
    FROM
      lighthouse_scores
    JOIN
      ranking
    USING (url)
    WHERE
      rank > 10000 AND
      rank <= 100000
    ),
    UNNEST([10, 25, 50, 75, 90, 95, 99]) AS percentile
  GROUP BY
    percentile
),
top1m AS (
  SELECT
    COUNT(0) AS top1m_num_sites,
    percentile,
    APPROX_QUANTILES(accessibility, 1000)[OFFSET(percentile * 10)] AS top1m_score
  FROM (
    SELECT
      accessibility
    FROM
      lighthouse_scores
    JOIN
      ranking
    USING (url)
    WHERE
      rank > 100000 AND
      rank <= 1000000
    ),
    UNNEST([10, 25, 50, 75, 90, 95, 99]) AS percentile
  GROUP BY
    percentile
),
results AS (
SELECT
  all_sites_num_sites,
  uses_aria_num_sites,
  uses_accessibe_num_sites,
  top1k_num_sites,
  top10k_num_sites,
  top100k_num_sites,
  top1m_num_sites,
  percentile,
  all_sites_score,
  uses_aria_score,
  uses_accessibe_score,
  top1k_score,
  top10k_score,
  top100k_score,
  top1m_score
FROM
  all_sites
JOIN
  uses_aria
USING
  (percentile)
JOIN
  uses_accessibe
USING
  (percentile)
JOIN
  top1k
USING
  (percentile)
JOIN
  top10k
USING
  (percentile)
JOIN
  top100k
USING
  (percentile)
JOIN
  top1m
USING
  (percentile)
)

SELECT
  'num_sites' as category,
  NULL as percentile,
  MAX(all_sites_num_sites) AS all_sites,
  MAX(uses_aria_num_sites) AS uses_aria,
  MAX(uses_accessibe_num_sites) AS accessibe,
  MAX(top1k_num_sites) AS top1k,
  MAX(top10k_num_sites) AS top10k,
  MAX(top100k_num_sites) AS top100k,
  MAX(top1m_num_sites) AS top1m
FROM
  results
UNION ALL
SELECT
  'num_sites_pct' as category,
  NULL as percentile,
  MAX(all_sites_num_sites)/MAX(all_sites_num_sites),
  MAX(uses_aria_num_sites)/MAX(all_sites_num_sites),
  MAX(uses_accessibe_num_sites)/MAX(all_sites_num_sites),
  MAX(top1k_num_sites)/MAX(all_sites_num_sites),
  MAX(top10k_num_sites)/MAX(all_sites_num_sites),
  MAX(top100k_num_sites)/MAX(all_sites_num_sites),
  MAX(top1m_num_sites)/MAX(all_sites_num_sites)
FROM
  results
UNION ALL
SELECT
  'percentile' as category,
  percentile,
  all_sites_score,
  uses_aria_score,
  uses_accessibe_score,
  top1k_score,
  top10k_score,
  top100k_score,
  top1m_score
FROM
  results
ORDER BY
  category,
  percentile
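
The `APPROX_QUANTILES(accessibility, 1000)[OFFSET(percentile * 10)]` idiom used throughout the query can be a little opaque. BigQuery's `APPROX_QUANTILES(x, n)` returns an array of `n + 1` boundaries (the minimum, the interior cut points, and the maximum), so with `n = 1000` the element at offset `percentile * 10` is the boundary for that percentile. The Python sketch below mimics the indexing with an exact, sort-based calculation; `quantile_boundaries` and `percentile_score` are hypothetical names, and because BigQuery's function is approximate its results may differ slightly from this.

```python
def quantile_boundaries(values, number):
    """Return number + 1 boundaries: minimum, interior cut points, maximum.

    Exact, sort-based stand-in for an approximate quantile function.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    # Integer arithmetic maps boundary i onto a position in the sorted list.
    return [ordered[(i * last) // number] for i in range(number + 1)]


def percentile_score(values, percentile, number=1000):
    """Pick the boundary for a percentile, like [OFFSET(percentile * 10)]."""
    offset = percentile * number // 100  # 1000 buckets -> percentile * 10
    return quantile_boundaries(values, number)[offset]


# Toy scores 0.00, 0.01, ..., 1.00 (invented, not Lighthouse data).
scores = [i / 100 for i in range(101)]
for p in (10, 25, 50, 75, 90, 95, 99):
    print(p, percentile_score(scores, p))
```

Using 1,000 buckets rather than 100 keeps the same query shape usable for fractional percentiles (for example, offset 995 for the 99.5th percentile).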
digitala11ies commented 3 years ago

Thanks for taking the time to do all that, Barry!

I'm definitely mega hesitant to associate ARIA usage with being good or bad. While it's only anecdotal experience, I've seen a lot of people using properties like role="presentation", role="application" and aria-hidden="true" in really damaging ways that were only discovered in manual testing, which makes me hesitant to comment on ARIA this way.

On Sat, Jun 19, 2021, 10:11 AM Barry Pollard @.***> wrote:

Here's some takeaways I see from this:

  • I don't see the same findings as WebAIM that accessibility is worse when ARIA is used. Sites using ARIA tend to be slightly more accessible, as far as the Lighthouse score measures this. And it IS used a lot: by 72% of our sites! This may be due to the limited audits that Lighthouse, and the underlying axe library, perform (I find they tend to use only audits that are less susceptible to noise).
  • There is no "Accessibility Overlays" category in Wappalyzer, but looking at accessiBe as a well-known one, it does seem to improve the Accessibility score for sites that use it at the lower percentiles (the easy wins?). As well as the other criticism the accessibility community has for these, I think there's a genuine question whether these are worth it if it only increases the 50th percentile by a single Lighthouse point. It should be noted, though, that at 417 sites it's a VERY small sample size, so I'm not sure how much we can really read into that. Another interesting point is that even at the 99th percentile they don't hit the top Lighthouse Accessibility score (which I honestly think is quite an achievable score!).
  • Disappointingly, there seems to be no correlation between site popularity and Lighthouse Accessibility score 😢 I had hoped that more popular sites, presumably with more resources to look after their website, would have better scores. If anything the opposite appears to be true! Personally I think that in and of itself is an interesting stat!

tunetheweb commented 3 years ago

Yeah, as I say, I’m not saying ARIA means it’s good. There are definitely pitfalls to it and it shouldn’t be used unless necessary. I just didn’t like that statement’s implication that it’s bad either (without delving further into why their stats showed that), and thought it worth digging into more to see if we could replicate their findings and understand why that was the case.

ARIA is necessary in a lot of cases! But it’s a tool like any other, so it needs to be used in the right way and can be used in the wrong way.

But ultimately with these stats I’m not trying to recommend anything - I’m just trying to report on the state of the web and see what it tells us.

Many websites are simple and don’t require ARIA. Many are complicated and so do. And more complicated websites are way more likely to have at least one accessibility issue than simple ones. So in many ways not surprising that usage of ARIA would lead to at least one accessibility issue more often than those sites that don’t use it.

But still I think it’s interesting that 72% of websites have at least one ARIA attribute (way more than I thought it would be!) and also that, by Lighthouse Accessibility score at least, those websites do tend to look more accessible.
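
For anyone wanting to sanity-check what "has at least one ARIA attribute" means in that 72% figure, here is a rough Python re-expression of the `usesAriaAttributes` JavaScript function from the query. The payload shape (an `attributes_used_on_elements` object keyed by attribute name) is assumed from how the query reads it, and `uses_aria_attributes` is a hypothetical name. One thing the sketch makes visible is that the check is a substring match, so an attribute such as `data-variant` would also count, because "variant" contains "aria". Whether that affects the real numbers is not something this sketch can tell us.

```python
import json


def uses_aria_attributes(payload):
    """Return True if any attribute name in the payload contains 'aria'.

    Mirrors the substring check in the query's JS function; malformed or
    missing payloads count as not using ARIA.
    """
    try:
        almanac = json.loads(payload)
        attributes = almanac["attributes_used_on_elements"]
        return any("aria" in name for name in attributes)
    except (TypeError, ValueError, KeyError):
        return False


# Hand-written example payloads (not real HTTP Archive records).
examples = {
    "aria-label": '{"attributes_used_on_elements": {"aria-label": 3, "class": 9}}',
    "no aria": '{"attributes_used_on_elements": {"class": 9, "href": 4}}',
    "data-variant": '{"attributes_used_on_elements": {"data-variant": 1}}',
    "malformed": "not json",
}
for label, payload in examples.items():
    print(label, uses_aria_attributes(payload))
```

A stricter variant could test `name.startswith("aria-")` instead, but that would be a different metric from the one reported above.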

alextait1 commented 3 years ago

I'm going to think on this more but I tend to agree with @digitala11ies. I think it's risky business equating the presence of ARIA with more or less overall accessibility, as it's so often misused, and I wouldn't want anyone to take away "use more ARIA" from our report, especially since that's in conflict with the [first rule of ARIA](https://www.w3.org/TR/using-aria/#rule1). I do think it's interesting to highlight the high rate of ARIA use; it shows that people are at least considering whether they should use it, or are leveraging libraries with ARIA incorporated.

digitala11ies commented 3 years ago

Yes - thanks for putting that more eloquently than I could manage, Alex! I think there's still value in talking about it --I just want to be careful in the associations we make.

tunetheweb commented 3 years ago

Yup that's fair enough. Understand your concerns.

But what do you think about slicing and dicing the "accessibility score" in other ways? Or do you have similar concerns about generalising there? I thought the lack of correlation in site popularity was interesting for example.

And I reran the stats for *.gov.uk/ URLs, *.gov/ URLs and any URL with (.gov. or .gov/) in it, and it certainly looks like the UK and US government sites are doing a better job than the majority of the web (yeaah!):

| category | percentile | all_sites | uk_gov | us_gov | all_gov |
|---|---|---|---|---|---|
| num_sites | | 7,150,239 | 2,569 | 13,612 | 71,744 |
| num_sites_pct | | 100% | 0.036% | 0.190% | 1.003% |
| percentile | 10 | 0.61 | 0.81 | 0.7 | 0.6 |
| percentile | 25 | 0.73 | 0.88 | 0.81 | 0.73 |
| percentile | 50 | 0.83 | 0.96 | 0.89 | 0.83 |
| percentile | 75 | 0.91 | 0.99 | 0.95 | 0.91 |
| percentile | 90 | 0.96 | 1 | 0.98 | 0.97 |
| percentile | 95 | 0.97 | 1 | 1 | 0.98 |
| percentile | 99 | 1 | 1 | 1 | 1 |

Depressingly however, more general .gov websites just mirror the whole dataset so no better than average 😞
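
The query behind these columns isn't shown, so the following is only a guess at the matching rules described above, written as plain Python string checks. It assumes URLs are home pages ending in a trailing slash (as in the `origin || '/'` join in the earlier query); `classify_gov` is a hypothetical name.

```python
def classify_gov(url):
    """Classify a home page URL into the three government groups.

    Assumed rules, based on the description in this thread:
    - uk_gov:  URL ends with ".gov.uk/"
    - us_gov:  URL ends with ".gov/"
    - all_gov: URL contains ".gov." or ".gov/" anywhere
    """
    return {
        "uk_gov": url.endswith(".gov.uk/"),
        "us_gov": url.endswith(".gov/"),
        "all_gov": ".gov." in url or ".gov/" in url,
    }


# Illustrative URLs only.
for url in (
    "https://www.gov.uk/",
    "https://www.usa.gov/",
    "https://www.example.gov.au/",
    "https://www.gov.example.com/",
    "https://example.com/",
):
    print(url, classify_gov(url))
```

Note that a loose "contains" rule like the assumed all_gov one also matches non-government hosts such as `www.gov.example.com`, so that column may include some noise if the real query worked this way.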

Anyway, if you think there's any merit in this approach, then have a think if there's any other sort of slicing and dicing you think we could do here to reveal interesting insights. And, as always, your expertise in adding colour to what any of the stats show is important here.

alextait1 commented 3 years ago

@tunetheweb ooo that's very interesting data about the government sites, I do want to think more on this! Thanks for surfacing these ideas 😎

schachin commented 3 years ago

I was part of the lead team for the GSA back in 2011-2013. The goal was to add WCAG to all GSA-controlled government sites, but they had a lot of work to do on just getting them all up to spec first. USA.gov, though, was the lead site and it met WCAG back then, BEFORE it was required at the Federal level. So the government sites have paid a lot more attention, at the Federal level at least.


foxdavidj commented 3 years ago

@alextait1

Data for all but 4 queries (the ones looking into the CSS) are completed and have their data input in our sheet: https://docs.google.com/spreadsheets/d/1WjAM5ZnHjMQt-rKyHvj2eVhU_WdzzFTjpoYWMr_I0Cw/edit#gid=150155313

Visualizations will be added next week along with comments for how to read the data.

The explanations for 90% of the queries are the same as last year's, so please refer to last year's sheet for explanations until I'm able to add them directly to our 2021 sheet.

foxdavidj commented 3 years ago

@alextait1 The spreadsheet has been updated with comments. Let me know if you have any questions!

alextait1 commented 3 years ago

@obto thanks so much, I'll be taking a look on Friday this week!

rviscomi commented 3 years ago

@alextait1 @scottdavis99 @oluoluoxenfree @ericwbailey @clottman @shantsis @digitala11ies @obto

🎉 This chapter is fully written, reviewed, edited, and ready to be launched on Wednesday! Thank you to all of the contributors who put in the time and effort to make this a great chapter.

When you get 5 minutes, I'd really appreciate if you could fill out our contributor survey to tell us (the project leads) about your experience. It's super helpful to hear what went well or what could be improved for next time. 🙏

Congratulations and thank you all again. I'm excited for this to launch soon!