aws-observability / aws-rum-web

Amazon CloudWatch RUM Web Client
Apache License 2.0
114 stars 65 forks source link

feat: link lcp attribution to image resource and navigation page load #448

Closed williazz closed 9 months ago

williazz commented 10 months ago

Summary

LCP can be attributed to other RumEvents.

  1. If LCP resource is an image, then a site owners want to know how that image was loaded
  2. Site owners want to know if LCP is blocked by the page load sequence

Revision 1

  1. Link LCP Attribution to ResourceEvent eventId for image load, if any
  2. Link LCP Attribution to NavigationEntry eventId of page load, if any

Two callouts in revision 1

  1. Caught a bug where EventBus was instantiated twice
  2. Populate resourceEvent with startTime

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

ps863 commented 9 months ago

comment: I think the tests here need an update based on these changes

williazz commented 9 months ago

comment: I think the tests here need an update based on these changes

It was still in draft state, but made ready for review after adding unit tests.

williazz commented 9 months ago

Problem

There is something about TestCafe that prevents the ResourcePlugin from recording the LCP resource's image. This prevents me from testing the event linking for LCPAttribution.lcpResourceEntry. However, this works every time during manual testing with npm run server

My setup:

// fs
    app/
       ...
       web_vital_event.html
       assets/
           lcp.jpg
// web_vital_event.html
... body {
    ...
    <img src="assets/lcp.jpg" />
}...
// loader-web-vital.js
...
loader('cwr', 'abc123', '1.0', 'us-west-2', './rum_javascript_telemetry.js', {
    dispatchInterval: 0,
    metaDataPluginsToLoad: [],
    eventPluginsToLoad: [
        new ResourcePlugin(),
        new NavigationPlugin(),
        new WebVitalsPlugin()
    ],
    telemetries: [],
    clientBuilder: showRequestClientBuilder
});
...
// WebVitalsPlugin.test.ts (integ)
test('when lcp image resource is recorded then it is attributed to lcp', async (t: TestController) => {
    const browser = t.browser.name;
    if (browser === 'Safari' || browser === 'Firefox') {
        return 'Test is skipped';
    }
    await t.wait(300);

    await t
        // Interact with page to trigger lcp event
        .click(testButton)
        .click(makePageHidden)
        .expect(RESPONSE_STATUS.textContent)
        .eql(STATUS_202.toString())
        .expect(REQUEST_BODY.textContent)
        .contains('BatchId');

    const events = JSON.parse(await REQUEST_BODY.textContent).RumEvents;
    // console.log(events);
    const lcp = events.filter(
        (x: { type: string }) => x.type === LCP_EVENT_TYPE
    )[0];
    const resource = events.filter(
        (x: { details: string; type: string }) =>
            x.type === PERFORMANCE_RESOURCE_EVENT_TYPE &&
            x.details.includes('lcp.jpg')
    )[0];
    await t.expect(lcp.details).contains(`"lcpResourceEntry":"${resource.id}"`);
    // NOTE: lcp.details does include the correct targetUrl
    // Therefore, the PerformanceAPI is working under the hood, but not the ResourcePlugin
});
williazz commented 9 months ago

There is something about TestCafe

This was fixed by upgrading test cafe XD

williazz commented 9 months ago

Need to investigate these failing integ tests on windows. They seem very unrelated

  1. When custom attribute set at init, custom attribute recorded in event metadata
  2. When a browser is not supported then the command function is a no-op
  3. When cookie is disabled, sessionManager records events using member variables
williazz commented 9 months ago

For commit "fix: switch order", the problem is that there are no resource events for resources that are already cached. So the resource event test had to be put first.

In the future,

  1. TestCafe tests might be able to to disable caching https://testcafe.io/documentation/402736/reference/test-api/test/disablepagecaching
  2. ResourcePlugin deserves a discussion on whether events should be made for cached resources, since it might be useful for site owners to know that assets are being cached
williazz commented 9 months ago

This is an anti-pattern, but I have included #451 in this PR because it is required to pass the integ tests. Please merge #451 before merging #448

Edit: #451 has been merged

williazz commented 9 months ago

issue: WebVitals plugin needs to know if onLCP is supported before caching image resources to avoid memory leak

williazz commented 9 months ago

issue: WebVitals plugin needs to know if onLCP is supported before caching image resources to avoid memory leak

There are two solutions to this:

  1. Check browser support at runtime.
  2. After 10 seconds, cleanup cache and stop listening for LCP resource candidates.

(1) is explicitly not recommended by by the owners of web-vitals https://github.com/GoogleChrome/web-vitals/issues/37#issuecomment-628916043.

If we implemented (1) anyways, we would check support for PerformanceObserver + PerformancePaintTiming. Some additional analysis is needed to see what other dependencies the web-vitals requires. In the worst case, we never cleanup resourceEventIds and cause a small memory leak. If web-vitals package implements a polyfill for LCP in the future, this would have to be updated to avoid data loss.

(2) looks janky but is the simplest solution and my recommendation. The benefit is that we do not have to worry about many edge cases or do any maintenance work. The risk is that we are setting a hard limit--if it takes longer than ~10 seconds to report LCP, then we do not attribute lcpResourceEntry

ps863 commented 9 months ago

issue: WebVitals plugin needs to know if onLCP is supported before caching image resources to avoid memory leak

There are two solutions to this:

  1. Check browser support at runtime.
  2. After 10 seconds, cleanup cache and stop listening for LCP resource candidates.

(1) is explicitly not recommended by by the owners of web-vitals GoogleChrome/web-vitals#37 (comment).

If we implemented (1) anyways, we would check support for PerformanceObserver + PerformancePaintTiming. Some additional analysis is needed to see what other dependencies the web-vitals requires. In the worst case, we never cleanup resourceEventIds and cause a small memory leak. If web-vitals package implements a polyfill for LCP in the future, this would have to be updated to avoid data loss.

(2) looks janky but is the simplest solution and my recommendation. The benefit is that we do not have to worry about many edge cases or do any maintenance work. The risk is that we are setting a hard limit--if it takes longer than ~10 seconds to report LCP, then we do not attribute lcpResourceEntry

I think approach 1 has other issues as well:

Question on approach 2:

williazz commented 9 months ago

Follow up on offline conversation...

Why 10 seconds? Is this based on a recommendation? What if the time window were smaller?

10 sec is an arbitrary duration where LCP has probably been recorded already, if supported. We could do another time.

We ultimately opted for (1), and should keep an eye on updates to LargestContentfulPaint browser support and web-vitals.