Shopify / shopify-app-bridge

https://shopify.dev/docs/api/app-bridge
86 stars 9 forks source link

Layout shifts caused by App Bridge side bar navigation input events included in CLS scores #194

Closed ascherkus closed 6 months ago

ascherkus commented 1 year ago

Describe the bug

Cumulative Layout Shift (CLS) scores measure visual stability and is used to grade Shopify apps.

According to https://web.dev/cls/ the web-vitals library (which App Bridge uses) filters out layout shift events that had recent user input. The code in onCLS.ts is pretty straight forward here as well: https://github.com/GoogleChrome/web-vitals/blob/main/src/onCLS.ts

// Only count layout shifts without recent user input.
if (!entry.hadRecentInput) {
  // ...
}  

For a single page app (SPA) that navigates between different sections of the app (e.g., by using the App Bridge navigation side bar), this can result in layout shifts due to navigation, but since the user input is in the parent iframe, to web-vitals it doesn't filter it out meaning it can negatively impact the CLS score.

In a hypothetical app that (say) used tabs inside its iframe for app navigation as opposed to the App Bridge navigation side bar, the hadRecentInput flag would be true when a user clicked on a tab to navigate.

To Reproduce

Steps to reproduce the behaviour:

  1. Add the following layout shift logging code:
    new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntries()) {
    console.log(entry);
    }
    }).observe({ type: "layout-shift", buffered: true });
  2. Have a single-page app that uses App Bridge navigation with different sections that may have some layout shifts on component mount
  3. Navigate between different sections of the app using the side bar (i.e., don't click within the iframe)
  4. Notice that the layout shifts being logged have hadRecentInput set to false

When the onCLS callback from web-vitals is fired, the entries included are ones that were generated by side bar navigation.

Expected behaviour

Ideally, App Bridge's use of web-vitals could somehow indicate / filter out those layout shift events from being included in the CLS score.

Contextual information

Packages and versions

List the relevant packages you’re using, and their versions. For example:

Platform

Additional context

I'm aware this uniquely affects SPAs as an MPA with sidebar navigation would potentially be a new page load, and hence a new set of web-vitals scores.

Here are some issues / articles I found regarding SPAs and measuring CLS: https://github.com/GoogleChrome/web-vitals/issues/119 https://web.dev/vitals-spa-faq/

bratsos commented 1 year ago

Hi @ascherkus ! I'm sorry but I'm not sure I understood what the issue here is.

CLS measured by App Bridge is isolated from the rest of Shopify's Admin because your app lives in an iframe. The way PerformanceObserver works is that it starts measuring timings from the moment of the creation of its browsing context, and the browsing context of the app is different from the browsing context of the Admin.

Regarding your reproduce steps, where do you use your snippet? Is it in the app code (inside the iframe) or is it in the top frame?

Thanks!

ascherkus commented 1 year ago

Hey @bratsos! Thanks for the reply -- I'll try and do my best to describe what we're seeing here.

My understanding is the following:

I've also confirmed that if our own embedded app uses the same web-vitals library and callbacks (e.g., onCLS) that we see the exact same value that gets reported to Shopify via App Bridge.

The above makes total sense and isn't the part where we're seeing anything problematic.

The issue is with how web-vitals computes the CLS score based on events emitted by PerformanceObserver which we feel isn't "fair" for Single Page Apps that don't reload the page on sidebar navigation clicks. This is since the embedded app iframe is unable to "know" that layout shifts its PerformanceObserver detected are caused by user input that occured in the parent Shopify iframe (i.e., clicking on the navigation side bar, which sends a APP::NAVIGATION::REDIRECT::APP message to the embedded app's iframe).

If you have an embedded app that causes layout shifts as a result of APP::NAVIGATION::REDIRECT::APP messages (e.g., navigating to a different part of your application), and log the PerformanceObserver events that get emitted (i.e., the same events that web-vitals uses to compute CLS, which gets reported by App Bridge, and logged by Shopify):

new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    console.log(entry);
  }
}).observe({ type: "layout-shift", buffered: true });

... you'll notice that hadRecentInput is false, which means any layout shifts are negatively counted towards our CLS score.

If our application had used in-frame tabs instead of Shopify's navigation sidebar, hadRecentInput would be true and our CLS scores would be lower as result... however using tabs for primary navigation is discouraged by Shopify's guidelines: https://polaris.shopify.com/components/navigation/legacy-tabs https://polaris.shopify.com/components/navigation/tabs

I hope that clarifies the issue and if not I'm happy to provide more sample code, screenshots, logs, etc...!

Thanks!

ascherkus commented 1 year ago

Just to follow up here -- we ran an experiment on a subset of our users last week where they saw in-frame tab navigation instead of Shopify App Bridge navigation.

The following are the CLS metrics we observed as a result:

   nav     |  n   |  avg  |  p25  |  p50  |  p75  |  p90
-----------+------+-------+-------+-------+-------+-------
 appbridge | 1643 | 0.086 | 0.001 | 0.030 | 0.107 | 0.229
 tabs      | 1426 | 0.060 | 0.000 | 0.006 | 0.066 | 0.127

It seems pretty conclusive to us that using Shopify App Bridge navigation for single page apps is negatively impacting CLS scores due to PerformanceObserver being unaware that the layout shifts it measures are caused by out-of-frame user input events.

This is unfortunate for Shopify App Bridge developers that use React or similar Single Page App frameworks as it seems like it incentivizes developers to use a Multi Page App approach (or other approaches like intentionally breaking the Shopify style guide to use in-frame tabs) to "workaround" the CLS issue. However, doing so comes at the expense of overall worse user experience.

For example, Shopify mandates a LCP of 2.5 seconds or less 75% of the time [1]. If we went with the Multi Page App approach to "workaround" our CLS score while maintaining a ~1-2 second LCP value we believe it'd be an overall worse user experience as navigating inside our app would take 1-2 seconds as opposed to the instant navigation users have today with our Single Page App approach.

Would love to hear your thoughts and recommendations as we'd love to apply for the Built For Shopify program but we're hung up on this particular issue. Thanks!

[1] https://shopify.dev/docs/apps/best-practices/performance/admin#largest-contentful-paint

bratsos commented 1 year ago

Hey @ascherkus! Thanks for following up, and I apologize for my delayed response. I'll do my best to set aside some time this week to go through this and attempt to reproduce it locally. Meanwhile, I'd appreciate it if you could provide more information on the following:

Looking forward to hearing from you!

ascherkus commented 1 year ago

Thanks for looking into this @bratsos!

We use web-vitals' onCLS() to log all metrics emitted [1]. We don't set reportAllChanges but as per the documentation, onCLS() can potentailly fire each time the page visibility changes to hidden.

Based on our debugging of App Bridge and using web-vitals inside our app, it seems like onCLS() fires and App Bridge recieves a APP::WEB_VITALS::CUMULATIVE_LAYOUT_SHIFT message at the same time with the same value. I'm not sure how Shopify deals with multiple CLS metrics but I'll defer to you there!

Regarding "per session" -- are you referring to per unique app load (i.e., the timestamp query parameter, which is unique per app load) or per Shopify session (i.e., the session query parameter, which seems to stay constant between app loads at least until you log in again), or something else?

In any case, I re-ran our metrics by computing percentiles over all metrics, the first CLS metric per app load (based on timestamp), and the last CLS metric app load (again, based on timestamp) and the results are more or less the same:

Percentiles Based On All CLS Values
    nav    |   n   |  avg  |  p25  |  p50  |  p75  |  p90  
-----------+-------+-------+-------+-------+-------+-------
 appbridge | 15420 | 0.129 | 0.004 | 0.092 | 0.181 | 0.277
 tabs      |  1757 | 0.061 | 0.000 | 0.006 | 0.065 | 0.128

Percentiles Based On First Reported CLS Value
    nav    |   n   |  avg  |  p25  |  p50  |  p75  |  p90  
-----------+-------+-------+-------+-------+-------+-------
 appbridge | 12815 | 0.115 | 0.003 | 0.087 | 0.174 | 0.267
 tabs      |  1536 | 0.048 | 0.000 | 0.004 | 0.056 | 0.112

Percentiles Based On Last Reported CLS Value
    nav    |   n   |  avg  |  p25  |  p50  |  p75  |  p90  
-----------+-------+-------+-------+-------+-------+-------
 appbridge | 12815 | 0.142 | 0.018 | 0.106 | 0.189 | 0.283
 tabs      |  1536 | 0.064 | 0.000 | 0.010 | 0.070 | 0.141

I'll message you regarding our app name on Slack.

In terms of reproducing I find it's easiest to trigger CLS by switching tabs to hide the page, which as mentioned above onCLS() will fire based on page visibility changing to hidden.

[1] https://github.com/GoogleChrome/web-vitals/blob/main/src/onCLS.ts

dev-coderise commented 10 months ago

@ascherkus were you able to get around this issue and achieve “Built for Shopify” for your app?

We are hitting the same issue with our SPA app on Shopify. Our LCP is over 2.5s and we believe this bug is the underlying issue as we have already implemented other best practices around pre loading, compressing JS/CSS, and CDN.

TheSecurityDev commented 6 months ago

Has this been solved with the CDN app bridge?

henrytao-me commented 6 months ago

@TheSecurityDev No, it hasn't yet. We hope to solve it this week.

henrytao-me commented 6 months ago

Hi @ascherkus @TheSecurityDev we just pushed a fix in App Bridge CDN. May I have your appId to try it out? Action items:

TheSecurityDev commented 6 months ago

@henrytao-me Client ID: 871c3d46ef8cd555d2df23ede27a15ad

henrytao-me commented 6 months ago

@TheSecurityDev I turned it on for your app. Looks good so far. Let's observe in a week. 🙇

TheSecurityDev commented 6 months ago

Thanks! 🚀

henrytao-me commented 6 months ago

@TheSecurityDev from what I see, this fixes might not make much difference in your app. So, we could expect the metrics are still stable as-is.

henrytao-me commented 6 months ago

@TheSecurityDev It's because your app is not affected by CLS score issue

TheSecurityDev commented 6 months ago

@TheSecurityDev It's because your app is not affected by CLS score issue

I don't know what you're seeing on your end, but this is what I've been seeing: image

But the last one shown is from March 20, which was also when I upgraded to App Bridge v4.

henrytao-me commented 6 months ago

Your app always has CLS as 1 even with the raw PerformanceObserver. So, that's something you need to investigate.

TheSecurityDev commented 6 months ago

Your app always has CLS as 1 even with the raw PerformanceObserver. So, that's something you need to investigate.

I spent over a month last year investigating this issue before I just gave up. The app is randomly reporting that the entire iframe has shifted, which results in a CLS of 1. I'm using NextJS, and I know of one other person with the exact same issue on NextJS, so I don't think it's just me.

Also I'm not exactly sure what a raw PerformanceObserver has to do with this issue here. I was under the impression that the issue is caused by the App Bridge NavMenu and TitleBar actions being triggered from outside the iframe, and so the App Bridge script inside the frame thinks there was no input associated with the action. So if the action redirects to a new page, it will report the resulting layout shift since it thinks the redirect happened without any user input. A simple fix that comes to mind would be to have it skip reporting any layout shifts if there was a recent action on any App Bridge component that's rendered outside the iframe, like the NavMenu or TitleBar (or even Modal), since you might redirect as a result of that action.

This could also just be an issue with SPA apps like NextJS. I see there's an open issue about a similar thing on the web-vitals library. It looks like there's a draft PR aimed at trying to solve this issue, although I don't know how relevant it is to this issue.

henrytao-me commented 6 months ago

The app is randomly reporting that the entire iframe has shifted, which results in a CLS of 1. I'm using NextJS, and I know of one other person with the exact same issue on NextJS, so I don't think it's just me.

I am taking a look of NextJS setup now. I will let you know if I find anything that you can apply to your app.

Also I'm not exactly sure what a raw PerformanceObserver has to do with this issue here.

What I mean is that we are using this web-vitals library. It emits your CLS of 1, even for initial load. So, there is nothing App Bridge can do about it. It should be fixed within your NextJS setup. I can dig into it a bit as I said above.

I was under the impression that the issue is caused by the App Bridge NavMenu and TitleBar actions being triggered from outside the iframe

I don't think this is the case because other apps don't have this issue. WebVitals between top frame and iframe are separated.

This could also just be an issue with SPA apps like NextJS. I see there's https://github.com/GoogleChrome/web-vitals/issues/119 about a similar thing on the web-vitals library. It looks like there's https://github.com/GoogleChrome/web-vitals/pull/308 aimed at trying to solve this issue, although I don't know how relevant it is to this issue.

I looked into all of these too. The way the new fix works is that it will stop collecting and report WebVitals after a side nav is click (or Redirect App AB action is dispatched if you are using ABv3).

Action items:

dev-coderise commented 6 months ago

I don't think this is the case because other apps don't have this issue. WebVitals between top frame and iframe are separated.

We have been experiencing similar issues with our React/Vue.js SPA, and for this reason, we have not been able to achieve the "Built for Shopify" badge for our app. Additionally, we are unable to meet the LCP requirement of 2.5 seconds.

image (21)

cls

Our app ID is 3932641, and we have almost given up on resolving this issue. Sometimes, it feels as though preference is given to simpler apps that receive such badges, and developers who have implemented deep integrations and followed best practices, like single-page applications (SPA), are being penalized.

Thanks for looking into it.

henrytao-me commented 6 months ago

@dev-coderise please add App Bridge CDN to your app for this fix to work. Ref https://shopify.dev/docs/api/app-bridge-library#getting-started

henrytao-me commented 6 months ago

@TheSecurityDev your LCP looks good now.

We will rollout the fix for webvitals tomorrow morning at 9am EST.

Note that: only apps using App Bridge CDN haves this fix. You just need to add App Bridge CDN script tag to your app. There is no need to migrate or remove existing App Bridge from npm (although we recommend to migrate your apps over to use CDN version). See https://shopify.dev/docs/api/app-bridge-library#getting-started

I close this issue for now. 🙇