aws-observability / aws-rum-web

Amazon CloudWatch RUM Web Client
Apache License 2.0
114 stars 65 forks source link

fix: Ignore URL construction error from invalid performance resource event #527

Closed shubhsheth closed 3 months ago

shubhsheth commented 3 months ago

This PR fixes an error originating from recordResourceEvent() method when it tries to process an event that has an invalid name prop. While trying to construct a URL from the string, it fails with Failed to construct 'URL': Invalid URL. This causes the failure from RUM library to be logged as a RUM error event.

Adding a try/catch around the URL construction to silently ignore errors originating from unintended string formats.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

williazz commented 3 months ago

Hey thanks for the contribution. What sort of URL are you failing with?

shubhsheth commented 3 months ago

Hi @williazz, although I have noticed 50-100+ errors being logged for this every day, I have not been able to successfully reproduce the exact steps that lead to this. It's particularly why I was hesitant to create an issue for this.

Based on the error message (Failed to construct 'URL': Invalid URL) the URL string is either falsy or in an invalid format. I am not quite sure which one is the culprit. But in either scenario the URL creation would be causing the RUM error to be logged as an error event.

williazz commented 3 months ago

Based on documentation, name should be a valid URL. But maybe the URL is sometimes a relative URL and missing a base, which can cause that exact error.

When the PerformanceEntry is a PerformanceResourceTiming object, the name property refers to the resolved URL of the requested resource. https://developer.mozilla.org/en-US/docs/Web/API/PerformanceEntry/name

More deep dive is needed to resolve the data loss, but maybe we can release this as a patch. Will discuss offline with team.

williazz commented 3 months ago

@shubhsheth please share the URL patterns that are causing validation errors to help with prioritization, if you are able to identify them.

shubhsheth commented 3 months ago

Patterns such as: null, undefined, "" (empty string), or "/someUrlPath" would fail in this step. I understand that the type checks does not allow non-string values for name. But it is possible that this could be occurring during runtime.

It'd be great if we can merge this patch to mitigate the issue from triggering false events. And then prioritizing a fix for the data loss.

qhanam commented 3 months ago

While suppressing errors is a pattern we use elsewhere, I am hesitant do the same here without first understanding its root cause.

Note that it is possible to configure the web client to ignore these errors.

shubhsheth commented 3 months ago

Can you expand upon why it's not good to suppress this error? - Errors that are being thrown from RUM aren't actually valuable for the product team because it's not surfacing any product issue and there's also nothing they can fix to resolve it.

I looked into ignoring this error, however I couldn't find a good method. The only pattern that was reliable was matching the error message. Other patterns such as filename, line no etc change frequently after new changes are built. The issue with using error message is that it risks ignoring an actual issue that could be occurring from the product code base.

Therefore I was proposing to suppress this from RUM instead of ignoring. I understand that data loss is important to you and you'd want to find the root cause for it. However, the data loss seems to already happen and that wouldn't change with this patch.

qhanam commented 3 months ago

Can you expand upon why it's not good to suppress this error?

How do we know that suppressing the error is the correct behavior? It is difficult to have confidence in this decision without knowing something about the root cause.

Regarding relative URLs, the PerformanceNavigationTiming documentation explicitly states it uses the document.URL property, however the PerformanceResourceTiming documentation is less clear. A brief search suggests that relative resource URLs are expanded.

If we can't identify the root cause, we could:

  1. Suppress the error. The resource will not be recorded when the name is not a valid URL. If users do in fact need this information, we will not know unless someone identifies the problem and creates an issue.
  2. Modify the catch block to throw an error, and include the name (URL) property in the error message. Once we have a better understanding, decide on action.

I think (1) is a reasonable option, but I would like first like to see if we can't find the root cause. Would you be able to make the changes for (2) on your branch and deploy them?

shubhsheth commented 3 months ago

@qhanam Thanks for the explanation, that sounds reasonable. I've updated the try/catch to log a custom error with the name property. Can you please review the changes?

williazz commented 3 months ago

Hey I added a unit test to your PR but accidentally closed this. Could you reopen this PR with the unit tests?

https://github.com/shubhsheth/aws-rum-web/pull/1

shubhsheth commented 3 months ago

@williazz I couldn't re-open this PR, I created a new one: https://github.com/aws-observability/aws-rum-web/pull/532