GoogleChrome / web-vitals

Essential metrics for a healthy site.
https://web.dev/vitals
Apache License 2.0
7.59k stars 415 forks source link

Consider escaping selectors #337

Open brendankenny opened 1 year ago

brendankenny commented 1 year ago

web.dev has a md:hidden-yes CSS class it likes to use these days, which won't work in a selector when you go back to debug using attribution data (e.g. document.querySelector('button.md:hidden-yes') -> Uncaught DOMException: Failed to execute 'querySelector' on 'Document': 'button.md:hidden-yes' is not a valid selector). Illegal characters have to be escaped to function in a selector (so e.g. 'button.md\\:hidden-yes' or button.md\\3Ahidden-yes).

Ideally these characters would be automatically escaped to ease the debugging process. This could be done on the report side, but not every viewer will remember to do this.

In the browser, you can use CSS.escape(), so if getSelector.ts is only ever used in the browser (I believe so since no jsdom?), el.className.trim().replace(/\s+/g, '.') could become [...el.classList].map(CSS.escape).join('.') and call it a day (though probably worth escaping the id/name as well).

tunetheweb commented 1 year ago

Alternatively couldn't we just escape it when you run your querySelector?

document.querySelector(CSS.escape('button.md:hidden-yes'))

Just worried about that escaped sequence causing more confusion...

brendankenny commented 1 year ago

You have to escape the individual parts, otherwise it'll escape the #s and .s as well (CSS.escape('button.md:hidden-yes') gives button\\.md\\:hidden-yes).

It's possible the escaped sequence could cause more confusion, looking for something with class md\\:hidden-yes vs what you authored (md:hidden-yes), but really the selectors you get are more like div.cluster.gutter-base>button.icon-button.tooltip.color-core-text.md:hidden-yes>svg. The most natural thing to do with that is select for the whole thing in the page, not look for individual parts of the selector, and if you do go looking for the individual parts, the \\ is still a pretty good indicator that something has been escaped and may have been altered.

Also, this is just to handle the "illegal" characters, mostly characters used to indicate other things in selectors, like : is usually accessing a pseudo class/element, `,>,#`, etc. Most of the time these characters won't end up in the middle of an id or class name so most will never be affected.

philipwalton commented 1 year ago

Yeah, I remember encountering this exact problem when trying to debug INP issues on web.dev myself, and it stumped me for a bit until I realized the issue with invalid selector. So, at minimum, I think it warrants a mention in the attribution docs because it could potentially be stumping other people as well.

However, I'm hesitant to change the default behavior of the library for a few reasons:

  1. The primary use case of these selectors is to help people visually identify potentially-problematic elements from glancing at a report (grouped based on page URL + selector), and escaped selectors are harder to visually identify (and potentially more confusing) than non-escaped ones. The primary use case was not to make it possible to build automated tooling that would identify these elements on a particular page for site developers. The reason the latter was not a primary use case is (AFAIK) It's not possible to produce a selector that is guaranteed to be unique or match the element in question, so there will always be cases where tooling will fail.

  2. If we were to make this change, I think it would have to be a breaking change because it could lead to a situation where a site owner's analytics data changes without them making any changes to their site. It could also lead to issues with a report's GROUP BY logic because the identifiers would be different before vs. after the update.

My sense is we should do what we can to help people (and tooling authors) avoid this problem by making it clear in the docs, but I'm hesitant change the library due to how it could potentially affect people's analytics data.

brendankenny commented 1 year ago

OK, but if it separately stumped you, me, and @mmocny for a bit, clearly searching in the page is a use case and it's going to cause issues when it happens :)

FWIW I didn't think to search in the web-vitals.js docs to see what was going on—I could see the : in the selector and the : in the class name in the DOM, so I thought the fact that it wasn't matching meant I was somehow holding querySelector wrong and went searching based on that, not how the selector was generated—so even if it had been documented here I wouldn't have found it.

Also remember this isn't all selectors class names, this is just class names with weird whitespace or characters like { in them, so in theory it won't be a common occurrence. The HTTP Archive query to figure out how often that occurs was going to be like $80 so I didn't run it, but maybe @tunetheweb has a spare set of CSS selectors around that we could check.

The point about analytics grouping over time is a good one, so I think it's just a question of weighing the downsides.

My biased viewpoint:

Breaking changes are bad and we should feel bad, but our advice is to go from RUM attribution -> recreate in the lab, so to me at least it feels like we should just rip off the bandaid (for a library steadily growing in usage, the best time to make a breaking change was before you shipped, the second best time is as soon as possible :)

brendankenny commented 1 year ago

Happy to discuss more. Don't worry about it somehow shipping before you're back @philipwalton :)

We could pair the change with making the CSS selector unique by using :nth-child() when it can't disambiguate between siblings using className/id, similar to how the DevTools Elements panel Copy->Copy selector works, though that doesn't eliminate the problem of a selector that only works when the page is in a particular state.