GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0

Request for metrics that are inclusive to Assistive Technology #11049

Open scottjehl opened 4 years ago

scottjehl commented 4 years ago

Feature request summary

Per @addyosmani's request, I'm following up with a feature request after today's discussion (https://twitter.com/scottjehl/status/1278372113716576258), which duplicates Léonie Watson's earlier tweet.

It would be useful to know when Assistive Tech is able to interact with and communicate page content so that that timing can be factored into existing metrics that represent page "readiness". Measurements that note or approximate the time when the accessibility tree is built would be interesting to see and understand, and as Léonie's tweet mentions, the time of the first accessibility API query as well.

It'd also be interesting to know which existing metrics are not relevant to Assistive Tech (for example, is FCP happening before the accessibility tree is created due to blocking JS?). Or, in an SSR scenario, is the accessibility tree created initially one way and later "hydrated" into a much different state?

What is the motivation or use case for changing this?

To see metrics that measure when a page is usable to all users, including those using AT.

How is this beneficial to Lighthouse?

Accessibility is part of Lighthouse scoring criteria.

LJWatson commented 4 years ago

One thing that will be interesting is the way different browsers build the Accessibility Tree (AcT) and then handle Acc API calls.

Chrome is a good illustration of the difference these things can make. As I understand it, Chrome used to build the AcT in the content process, then proxy Acc API calls in from the application process. Now it builds the AcT in the content process then caches the entire thing within the application process, where it can be queried via the Acc API.

In the first incarnation there was a performance hit on every Acc API call. In the current incarnation there can be a noticeable performance hit whilst the initial AcT is being built and cached (notably on large and/or JS-heavy pages), but once the cached AcT is available everything thereafter is pretty performant.

Firefox on the other hand proxies Acc API calls from the application process to the AcT in the content process, but it uses intelligent caching to bundle related information into the information that's returned, so the overall number and frequency of API calls is reduced.

Performance can sometimes seem sluggish with an AT in Firefox, but usually only with large and/or JS heavy pages.

The AT itself is also an important part of the puzzle, though that will likely emerge as a measurable and comparable metric if we're able to measure the time to first AcT interaction/first Acc API call. NVDA, for example, is considerably more performant in Firefox than JAWS is, and that has as much to do with the screen reader itself as the browser.

tkadlec commented 4 years ago

Ultimately, I'd love to see a cross-browser metric (or a few) around the AT for the reasons @LJWatson points out: the differences between how different engines create the accessibility tree, and the implications for how we build, are anything but widely known and understood.

But, that's a long and different process. At least having some information available in Lighthouse would start to surface the issue a bit more and provide a starting point.

It's the whole chicken-and-egg problem: I suspect cross-browser metrics will be super helpful here and will highlight opportunities for improvement, but until we have something like this exposed somewhere, it's hard to know to what extent. Lighthouse feels like it could be a good place to start.

scottjehl commented 4 years ago

Just a note that there's a side thread over at webpagetest where Pat has some feedback that could be useful here. https://github.com/WPO-Foundation/webpagetest/issues/1369

patrickhulce commented 4 years ago

cc @anniesullie we'd love to hear what the Chrome Speed Metrics team is working on in this area if there are any plans to do this in the future :)

paulirish commented 4 years ago

The AcT (Accessibility Tree) is a very defined thing and we have decent observability on it. Not as good as the DOM tree, but pretty good. I like the idea of getting a Time To A11y Tree First Built metric. Though keep in mind the tree will be changing as scripts load in and content is added to the page.

Adding another possibility to the brainstorm, I can imagine a metric that considers how quickly the AcT settles into its "final" position (defining "final" TBD, much like "fully loaded"). It could be computed much like Speed Index, assuming there's a decent calculation for determining tree similarity.
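
A rough sketch of how a Speed Index style computation over AcT snapshots might look, in TypeScript. Everything here is an assumption for illustration: the `AxNode` and `AxSnapshot` shapes are hypothetical, and the `similarity` function is a crude set-overlap stand-in (it ignores tree structure entirely), not a proper tree similarity algorithm. The last snapshot is treated as the "final" tree.

```typescript
// Hypothetical shapes: a flattened accessibility-tree snapshot and its capture time.
interface AxNode { role: string; name: string; }
interface AxSnapshot { timeMs: number; nodes: AxNode[]; }

// Crude stand-in for tree similarity: Jaccard overlap of "role|name" keys.
// It ignores hierarchy and duplicate nodes, so treat it as a placeholder only.
function similarity(a: AxNode[], b: AxNode[]): number {
  const keysA = new Set(a.map((n) => `${n.role}|${n.name}`));
  const keysB = new Set(b.map((n) => `${n.role}|${n.name}`));
  if (keysA.size === 0 && keysB.size === 0) return 1;
  let shared = 0;
  keysA.forEach((k) => { if (keysB.has(k)) shared++; });
  return shared / (keysA.size + keysB.size - shared);
}

// Speed Index style score: integrate (1 - completeness) over time, where
// completeness is the similarity of each snapshot to the last one.
// The time before the first snapshot counts as 0% complete.
function actSpeedIndex(snapshots: AxSnapshot[]): number {
  if (snapshots.length === 0) return 0;
  const sorted = snapshots.slice().sort((x, y) => x.timeMs - y.timeMs);
  const finalNodes = sorted[sorted.length - 1].nodes;
  let score = sorted[0].timeMs;
  for (let i = 0; i < sorted.length - 1; i++) {
    const completeness = similarity(sorted[i].nodes, finalNodes);
    score += (1 - completeness) * (sorted[i + 1].timeMs - sorted[i].timeMs);
  }
  return score;
}
```

As with Speed Index, a lower score would mean the tree reached its final shape sooner. How well a score like this tracks real AT usability is exactly what would need validating.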


A note on the instrumentation that currently exists:

Puppeteer actually has some great work here, culminating in the accessibility.snapshot() method. Behind the scenes, it uses Accessibility.getFullAXTree from the DevTools Protocol, plus some more work to flesh out a solid picture of the AcT.

The protocol (and thus pptr) don't have events that indicate "AccessibilityTreeChanged", so right now in order to understand how it changes, it'd need to be polled. Hopefully what @LJWatson said about the perf hit indicates that polling would be decently performant. Regardless, we're in a lab scenario so no user perceivable impact anyhow. :) If this exploration works out, perhaps some "change" events could be added to the protocol so the approach could be optimized a bit.
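
To make the polling idea concrete, here is a minimal sketch of how a series of polled snapshots could be reduced to a list of "tree changed" times. The `PolledSample` shape and `changeTimes` helper are hypothetical; the samples are assumed to come from something like repeated calls to Puppeteer's `page.accessibility.snapshot()`, with `null` meaning no tree was available yet. Comparison is by serialized JSON, which is an assumption that the snapshot source emits properties in a stable order.

```typescript
// Hypothetical sample shape: when the poll ran and what the tree looked like.
// `tree` is null (or undefined) while no accessibility tree is available.
interface PolledSample { timeMs: number; tree: unknown; }

// Returns the poll times at which the serialized tree differed from the
// previous poll. Leading samples with no tree are not counted, so the first
// entry approximates "time to first AcT" at the polling resolution.
function changeTimes(samples: PolledSample[]): number[] {
  const times: number[] = [];
  let previous = "null"; // serialized form of "no tree yet"
  for (const sample of samples) {
    const serialized = JSON.stringify(sample.tree ?? null);
    if (serialized !== previous) {
      times.push(sample.timeMs);
      previous = serialized;
    }
  }
  return times;
}
```

The timestamps are only as precise as the polling interval, which is the main reason real change events in the protocol would be an improvement.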


I think some prototyping here is the next best step.

With some straightforward puppeteer scripting, someone can make a basic Time To First AcT metric and also explore the AcT Speed Index idea. Once built, there's always a good amount of metric validation necessary to understand how well the numbers we get track the intent of the metric. Testing on a variety of webpages/webapps is key here.

I'm happy to give some guidance if anyone has questions about the protocol underpinnings here.

connorjclark commented 4 years ago

> It could be computed much like Speed Index, assuming there's a decent calculation for determining tree similarity.

This is the part that gives me the most pause. This won't be nearly as simple as "sum all the color values". Some cursory googling for "tree similarity algorithms" wasn't encouraging.

anniesullie commented 4 years ago

Do you think something TTI-like would be too noisy? I'm thinking of some kind of settling metric like "time until N seconds between accessibility tree changes"?

It could be prototyped for different values of N and run across a large number of sites multiple times to assess stability.
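
A minimal sketch of such a settling metric, assuming the times of accessibility tree changes have already been collected somehow (for example by polling). The `settledTime` name, its parameters, and the choice to report the start of the quiet window are all assumptions for illustration, not an agreed definition.

```typescript
// Returns the time of the last change before the first quiet window of at
// least quietMs, or undefined if the tree never goes quiet within the trace.
// traceEndMs is when observation stopped; it bounds the final window.
function settledTime(
  changeTimesMs: number[],
  quietMs: number,
  traceEndMs: number
): number | undefined {
  const sorted = changeTimesMs.slice().sort((a, b) => a - b);
  for (let i = 0; i < sorted.length; i++) {
    const next = i + 1 < sorted.length ? sorted[i + 1] : traceEndMs;
    if (next - sorted[i] >= quietMs) return sorted[i];
  }
  return undefined;
}
```

Running this for several values of `quietMs` over repeated loads of the same pages would give a first read on how stable the number is.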

scottjehl commented 4 years ago

This is a really great reply, @paulirish . Thanks for considering.

I think any visibility in Lighthouse for accessibility tree timing would be great, since it'd help spread awareness of how architectural decisions in page delivery impact AT users' experience. I particularly like the idea of using a "settled" state to represent a sort of "Accessible-Ready" metric in Lighthouse, assuming that represents when things become reliable to use, but I'm not sure if there are parallels here to say, visual rendering, where there are actually earlier moments that are meaningful to AT than when the whole thing is ready. I'll defer to the experts on the particulars there. Excited for progress here.

connorjclark commented 4 years ago

> Do you think something TTI-like would be too noisy? I'm thinking of some kind of settling metric like "time until N seconds between accessibility tree changes"?

I think that would devolve in some common cases regarding carousels.

patrickhulce commented 4 years ago

> I think that would devolve in some common cases regarding carousels.

Speed Index has this same problem but benefits from the fact that "Visual complete idle" isn't used for the end time. I wonder if TTI itself could be used as the "AcT Complete" snapshot and then the tree similarity magic could walk back from there?

EDIT: Of course we would need to validate that TTI is actually later than AcT complete :)

scottjehl commented 3 years ago

Hi all,

We're working on a roundup including updates on how accessibility incentivization and metrics are evolving, so I was curious if you have any new thoughts on how this metric might fit into Lighthouse.

Even using existing metrics as a proxy, it would be so neat to see Lighthouse say something like, “This site’s content first becomes broadly accessible at 11 seconds, over 7 seconds after it appears accessible to sighted users,” or something along those lines.

Thanks, Scott

LJWatson commented 3 years ago

I think a simple statement like that would be a huge step in the right direction. It would surface the information, which will also surface the fact that TTI may be different for people depending on their UA stack.

anniesullie commented 3 years ago

Hi @scottjehl and @LJWatson! I work on the team that develops the Core Web Vitals metrics, and @kbae00 is looking into accessibility. We're still in very early stages, so we'd really appreciate feedback. We're currently thinking about starting with some of the audits that Lighthouse already provides, like contrast and img-alt. I'm really curious what you all think of that vs. more performance-focused metrics.

If the roundup could direct people to web-vitals-feedback@googlegroups.com for their thoughts, it would help us a lot with direction!

scottjehl commented 3 years ago

Thanks for your comment, @anniesullie.

I think those audits you mention and the other accessibility-related audits in Lighthouse sound like great factors to consider for Core Web Vitals. For this specific request though, many of us were thinking that something with close overlap to the performance metrics themselves might provide a new way to incentivize improving later (time to interactive ish) metrics in a relatable (and to many, novel) way. Through the help of Google's performance tooling, we've learned to prioritize asset delivery to promote earlier visual rendering, but those early paint metrics are often not meaningful moments at all on the page loading timeline for an assistive tech user. So, alongside the existing visual FCP/LCP metrics, an "Accessible Ready" sort of metric could communicate that discrepancy between ready for some users and ready for everyone. It could give developers another important reason to reduce their TBT in order to expose content to AT sooner.

When we write up our post, we'll be sure to add that group link, thanks!

LJWatson commented 3 years ago

Thanks @anniesullie. I agree with @scottjehl that those factors would be good for Core Web Vitals, but in this case keeping the TTI for AT users within the performance metrics makes sense.

LJWatson commented 2 years ago

Returning to this topic for the first time in a while and hoping there might be some news of progress towards an accessibility/assistive tech related performance metric?

BogdanCerovac commented 1 year ago

Inspired by @LJWatson's talk at Performance.now(), I decided to investigate a bit and stumbled upon the Accessibility.loadComplete event (still experimental).

The loadComplete event mirrors the load complete event sent by the browser to assistive technology when the web page has finished loading.

Seems like a good starting point for a basic check (?). Will do some quick tests with Puppeteer when time allows, just posting it here for reference (and to inform others about the possibility).

BogdanCerovac commented 1 year ago

Had some time on my hands and made a simple proof of concept, in Puppeteer as it was easier for me.

I am a total newbie when it comes to metrics and CDP, so just for the sake of the concept I used console.time() and console.timeEnd() (as it's running inside Node). I guess CDP may offer a better methodology for the metrics...

Please check the repository with a working proof of concept for loadComplete metrics.

BogdanCerovac commented 11 months ago

After almost a year, with Accessibility.loadComplete still experimental, I decided to leave a comment here to keep the issue open and hopefully prevent it from going stale.

In case anybody wonders: last year I made a simple proof of concept using loadComplete via Puppeteer's CDP integration, which may be a starting point for a more reliable performance metric.