Add ability to specify selectorMaxLen for attribution build

tunetheweb commented 1 year ago

Fixes #312

tunetheweb commented 1 year ago

Since you commented on that other issue what's your thoughts on this @philipwalton ?

Like that issue, I don't see this as something that a lot of people will use, but it's semi-configurable at present (getSelector takes an optional maxLen parameter but it's not ever sent that as not plumbed all the way through), and this only affects the attribution build so the non-attribution build still stays as small as possible.

philipwalton commented 1 year ago

One concern with adding a selectorMaxLen option is that it doesn't fully address the use case, i.e. there could be cases where it would make more sense for a particular site to filter out particular class names or even use attributes or something other than classes to create the selector, and if so then that person would still need to roll their own getSelector() function (which it seems like Tony was already doing).

Arguably the 100-character limit was arbitrary and we could have kept it unbounded, but I hesitate to change it now since it could break folks currently sending this data to GA.

Another concern I have with selectorMaxLen is it changes the API between the standard and attribution builds. I don't think that will actually cause problems for anyone, but it does add documentation (and potentially typing) complexity.

tunetheweb commented 1 year ago

Yeah I’m not totally convinced myself. The trim() issues seems more compelling (even if it does point to poor HTML authoring) hence why I merged that separately to this.

But it did strike me as odd that it was partially implemented which kinda persuaded me to plumb it through fully. And especially since it only affected the attribution build.

I don’t think it really changes the API for the non attribution and attribution builds though since it’s just an extra optional ReportOpt rather than an additional parameter?

philipwalton commented 1 year ago

Yeah I’m not totally convinced myself. The trim() issues seems more compelling (even if it does point to poor HTML authoring) hence why I merged that separately to this.

Agree that was important to fix.

I don’t think it really changes the API for the non attribution and attribution builds though since it’s just an extra optional ReportOpt rather than an additional parameter?

It changes the API in the sense that the ReportOpts type would need to be updated to have a standard and attribution version. You could keep it the same for both builds, but then you'd have to document that the selectorMaxLen option only works in the attribution build, and it might confuse people using the standard version if their IDE shows an option that doesn't do anything for them.

I'm leaning toward thinking we don't need to add this option, but I'm happy to have you make the final call.

tunetheweb commented 1 year ago

I think I agree. Especially there's a simple workaround of using the entries anyway for those that want to extend it.

I wonder if we should remove the maxLen argument from getSelector then since it's not used? https://github.com/GoogleChrome/web-vitals/blob/9664b8b345862222ffaa920e5f1cb3c05e6bc0b6/src/lib/getSelector.ts#L24

@Tiggerito any further info you wanna give to persuade us to allow this to be configurable?

philipwalton commented 1 year ago

I wonder if we should remove the maxLen argument from getSelector then since it's not used?

Sure, or make it a module-level constant (so it's not a "magic" number).

IIRC that's only there because it was there in the original getSelector() function that was documented on here: https://web.dev/debug-performance-in-the-field/

Tiggerito commented 1 year ago

Something I noticed with the current getSelector function is that it seems to be quite smart. Smarter than the one previously suggested. If it reaches the limit, it will choose to include the deepest parts first and not chop a part up. This means even with truncation, it has a good chance of still working.

With the previously suggested code, it truncated down to the left 100 characters and would typically break the selector if it did that.

So my doubling up to 200 characters is probably less significant now.

I've basically copied the current getSelector and hard-coded the limit to 200, which is the limit in my case (two GA4 properties). I then have a bit of manual code, so I can add my own selector result to a metric:

// find the node in all cases
entry=attribution.largestShiftSource||attribution.lcpEntry||attribution.eventEntry;
webVitalsMetric.debugNode=entry&&(entry.node||entry.element||entry.target);
// calculate the selector
webVitalsMetric.debugTarget=myGetSelector(webVitalsMetric.debugNode);

Having it as an option when setting up an onEvent like proposed would be less code and make the functions options usable.

Another option could be to support as getSelector function in the metric object returned. That way, a consumer could get the selector based on its own limits.

I'm also experimenting with trying to override the chosen node. e.g. pick a different CLS node out of the candidates. This may mean I would need to recalculate debugTarget anyhow. Or find a way to override how CLS chooses its node. This is early playing, and I may find it is of no value!

philipwalton commented 1 year ago

Something I noticed with the current getSelector function is that it seems to be quite smart. Smarter than the one previously suggested. If it reaches the limit, it will choose to include the deepest parts first and not chop a part up. This means even with truncation, it has a good chance of still working.

With the previously suggested code, it truncated down to the left 100 characters and would typically break the selector if it did that.

The getSelector() function always went from right to left, starting from the element and working up the DOM tree to the document. You can refer to the very first implementation here: https://github.com/GoogleChrome/web.dev/blob/7198d64a1b9b91fb44a116e1409dc8a76f1aa0b3/src/site/content/en/blog/debug-web-vitals-in-the-field/index.md

Note that there was never any guarantee that the selector could uniquely identify the element (that would be impossible), but it should always match the element if passed to document.querySelectorAll().

So in that sense it won't necessarily be the case that allowing longer selectors will necessarily result in a more targeted match. In many cases it won't (e.g. consider a case where the body element has tons of classes).

I've basically copied the current getSelector and hard-coded the limit to 200, which is the limit in my case (two GA4 properties). I then have a bit of manual code, so I can add my own selector result to a metric:

I think this is fine, and the approach I'd recommend people do if the default selector doesn't fit their needs. The only way we could make the API flexible enough to meet everyone's needs is to allow passing a custom getSelector() function, but at that point the library isn't doing anything for you.

Tiggerito commented 1 year ago

The getSelector() function always went from right to left, starting from the element and working up the DOM tree to the document. You can refer to the very first implementation here: https://github.com/GoogleChrome/web.dev/blob/7198d64a1b9b91fb44a116e1409dc8a76f1aa0b3/src/site/content/en/blog/debug-web-vitals-in-the-field/index.md

Your right. I was basing my code on that. The issue was that the limit was not passed to the function so the full selector was generated, and GA4 only sent the first 100 characters.

I suspect if I had changed it to limit to 100, I would have never noticed an issue and would not have implemented my hack to allow 200 characters.

Now it defaults to 100, I think most will already have a working solution. This is because I did not notice an issue when I upgraded and introduced the 100 limit. 👍

I may simplify my code and go back to the 100 limit.

philipwalton commented 1 year ago

Your right. I was basing my code on that. The issue was that the limit was not passed to the function so the full selector was generated, and GA4 only sent the first 100 characters.

Ahh, that makes sense.

I suspect if I had changed it to limit to 100, I would have never noticed an issue and would not have implemented my hack to allow 200 characters.

Now it defaults to 100, I think most will already have a working solution. This is because I did not notice an issue when I upgraded and introduced the 100 limit. 👍

I may simplify my code and go back to the 100 limit.

👍 Glad to hear the new version is "just working" better :)

tunetheweb commented 1 year ago

OK closing this and the related issue.

GoogleChrome / web-vitals

Add ability to specify selectorMaxLen for attribution build #329