Automattic / wp-calypso

The JavaScript and API powered WordPress.com

https://developer.wordpress.com

GNU General Public License v2.0

12.43k stars 1.99k forks source link

Stepper: `calypso_signup_start` event can be triggered multiple times, despite tests saying otherwise #94175

Closed chriskmnds closed 1 month ago

chriskmnds commented 2 months ago

Details

In Stepper, the calypso_signup_start event is supposed to be recorded once per signup flow. The logic for handling the tracking lives in use-sign-up-start-tracking, with accompanying tests.

One of the tests specifically tests for "does not trigger the event on rerender", which is kinda odd seeing that useSignupStartEventProps returns a new reference { extra: 'props' } on every rerender, and there's no logic for blocking or ignoring that dependency in useSignUpStartTracking.

it( 'does not trigger the event on rerender', () => {
    const { rerender } = render( {
        flow: { ...signUpFlow, useSignupStartEventProps: () => ( { extra: 'props' } ) }, // HERE
        currentStepRoute: 'step-1',
        queryParams: { ref: 'another-flow-or-cta' },
    } );

    rerender();

    expect( recordTracksEvent ).toHaveBeenNthCalledWith( 1, 'calypso_signup_start', {
        flow: 'sign-up-flow',
        ref: 'another-flow-or-cta',
        extra: 'props',
    } );
} );

The test is misleading/wrong. toHaveBeenNthCalledWith checks that we have that set of arguments on the "Nth" call. We have more than one calls here though.

Here's a PR where I've updated the tests to show the true number of times the event gets recorded (it's 2, and 3 if we also pass signup param): https://github.com/Automattic/wp-calypso/pull/94174

cc @Automattic/dotcom-stepper

Checklist

[ ] Ensure we record calypso_signup_start once

No response

chriskmnds commented 2 months ago

cc @daledupreez @gabrielcaires can you please confirm? If I'm misreading things or you see anything that I don't. Is this on your plate to fix otherwise?

gabrielcaires commented 2 months ago

@chriskmnds It is an issue we should fix. As far as I know, we will only have one calypso_signup_start event. We also have an issue here: https://github.com/Automattic/wp-calypso/issues/93950 to fix it.

Let me know if you want to fix it, or I can fix it. I think you found the root cause.

chriskmnds commented 2 months ago

@gabrielcaires

Let me know if you want to fix it, or I can fix it. I think you found the root cause.

Coming up with a right/wholistic solution seems rather difficult. I've outlined the same issue elsewhere here: https://github.com/Automattic/wp-calypso/pull/93983#discussion_r1740626570

Can you reply there or here with your thoughts? There may other causes, but these are indicative of at least one issue that exists (some state being updated in the hooks being consumed)

gabrielcaires commented 2 months ago

Coming up with the right/holistic solution seems rather difficult.

I totally agree; we have conflicting ideas.

Idea 1: Expectation of having events triggered once by flow Idea 2: React hooks on every render to get customized parameters.

I think the "ideal" solution really involves the concept of a session and enabling a flow to add parameters without necessarily triggering the event.

Something like:

Abandon the hooks focused on add custom props to our tracks.

Adopt a session where the developer can set new props without trigger events and use these new props only when the events are planned to be triggered.


const {session} = useFlowAnalyctisSession();
session.set('anyCustomParams', 'anyValue') //  New Prop will be used by all subsequent track events; 
// or

session.set('anyCustomParams, 'anyValue', 'calypso_signup_start') // New Prop will be used only when stepper trigger thecalypso_signup_start` .



It is something similar Google Analytics does with the user-scoped dimensions.
`
gtag('set', 'user_properties', {
  'user_id': 'USER_ID_VALUE',
  'membership_status': 'MEMBER_STATUS_VALUE'
});
`

https://www.analyticsmania.com/post/user-scoped-custom-dimensions-in-google-analytics-4/

It is just my first thought about the problem. 

What do you think?

chriskmnds commented 1 month ago

@gabrielcaires I like where you are going with this. I think the difficulty lies in the details.

I think the problem is not how we get the data, but when and from where. If the data/props originate from state, then one way or another changes in that state would need to be reflected at the point where we trigger the event (if this is what we want - so, unless we purposefully ignore). This is the main problem (as well as whether we do indeed want the latest state to be recorded always).

add parameters without necessarily triggering the event

I'm not sure if "session" will be any different in this case, but I may not have understood your idea in full. Unless we intend to delay the triggering of recordTracksEvent, then I think the situation is the same. What I mean: the useSignupStartEventProps hook will grab the latest state at the point of calling it in useSignUpStartTracking, in the same way that "session" will be queried for the latest state at that same point. It's what happens after that concerns us: state changing and triggering a rerender through a dependency on useSignupStartEventProps or "session". I don't see much difference.

without trigger events and use these new props only when the events are planned to be triggered.

Ok. If I understand correctly, you may be looking at some form of queuing service? Not sure how we'd solve this. We could develop something like that, but I'm not sure I see where "session" comes in. For example, instead of calling recordTracksEvent, we could timestamp and push the event object in the queue, then on flow/step unmount (or at the right time) pop the last object from the queue and record. Is this more or less how you'd envision this? I think there can be several problems with something this e.g.

This won't fire at the point where we actually want to trigger the event (beginning of step render). We'd need to rely on timestamps recorded with the event - not sure how analytics will work over this.
We'd need to do this for every event that Stepper tracks, otherwise things may turn out to be a mess.

Feel free to clarify with more details. I may have misunderstood something :-)

p.s. I mentioned above:

as well as whether we do indeed want the latest state to be recorded always

I think this is also critical, but may be "overthinking". For example, site may be recorded as undefined at the beginning of step-1 (before the site is created), and get a concrete value at the end of it (after it was created) - and looks fine to my eyes, because "site does not exist at beginning of step-1" (and we'd not want to record it or retrigger the event with it).

gabrielcaires commented 1 month ago

@chriskmnds I think I didn't expose the idea well; let me try to prepare a POC to see how it works.

Before that, I prepared a PR fixing the issue only for this event

94248

chriskmnds commented 1 month ago

@chriskmnds I think I didn't expose the idea well; let me try to prepare a POC to see how it works.

@gabrielcaires Can you not explain with a bit of pseudo code? Just as long as we don't start investing into some implementation that we'll have a hard time backing out of. As you like though - happy to discuss. :-)

Before that, I prepared a PR fixing the issue only for this event #94248

I replied in a review there. I might be missing something, but see no difference from just ognoring the dependencies? I can take a closer look tomorrow, but pls feel free to clarify.

MicBosi commented 1 month ago

Hello @gabrielcaires @chriskmnds 👋

I apologise for the random interjection, I stumbled on this PR while chatting with Travis and Seah about the overall Stepper/better Tracks initiative.

I was struck by:

I totally agree; we have conflicting ideas. Idea 1: Expectation of having events triggered once by flow Idea 2: React hooks on every render to get customized parameters.

I was wondering whether the expectation of having events triggered only once is a hard one or can be relaxed for the next version of Stepper to allow the dust to settle on the various other aspects of the rollout across the various steps that will be replaced?

As a (bit rusty) developer it feels like it might get quite complicated to handle all kinds of corner cases where events should or shouldn't fire twice, which might also depend on the semantics of the specific step/flow, user actions, etc. so I thought I might bring this up in case given all you learnt so far it makes sense to step back and reassess the pros and cons of this particular requirement.

chriskmnds commented 1 month ago

Hello @MicBosi

I was wondering whether the expectation of having events triggered only once is a hard one or can be relaxed for the next version of Stepper to allow the dust to settle on the various other aspects of the rollout across the various steps that will be replaced?

As a (bit rusty) developer it feels like it might get quite complicated to handle all kinds of corner cases where events should or shouldn't fire twice, which might also depend on the semantics of the specific step/flow, user actions, etc. so I thought I might bring this up in case given all you learnt so far it makes sense to step back and reassess the pros and cons of this particular requirement.

This is quite a solid observation and I'd be thinking the same. You summarised it pretty well. I too find it unlikely to be able to control everything and all edge cases.

I'd leave it on the data folks to tell us why the limitation needs to exist. It sounds like something we should be able to control on the analysis side. If we intentionally let events be logged multiple times so that whatever semantics are in place (e.g. async queries or whatever) are allowed to settle on their final values.

@travisw what's your take on the above, what @MicBosi mentioned?

travisw commented 1 month ago

I was wondering whether the expectation of having events triggered only once is a hard one or can be relaxed for the next version of Stepper to allow the dust to settle on the various other aspects of the rollout across the various steps that will be replaced?

@travisw what's your take on the above, what @MicBosi mentioned?

I think it depends on exactly what we're trying to control and when, which I'm not entirely clear about.

If we're trying to ensure a particular event is only fired once at all costs, like even if the page is refreshed or the user clicks the back button, then I don't personally think that's necessary. It could even make the data less intuitive and the flow events more difficult to test.

If we're talking about not firing an event more than once when it seems like it shouldn't actually fire, like on some random step of the flow (like firing a start event after the flow has started), or multiple times for a single step, then I think it's worth trying to minimize that.

For example, imagine this was a simple old-skool, non-JS application, that ran everything synchronously and every page required a full page load. Intuitively, I'd expect it to work similar to this. If you load a page the event fires. When you load the next page the event for that page fires. If you click back the first event fires again. If you refresh the page the event also fires again. Each step just fires the events they're supposed to at the appropriate time. Anything more, like trying to keep track of which events have fired in the context of the entire flow and then not fire them in some cases seems overly complex and not necessary.

chriskmnds commented 1 month ago

@travisw Thanks for the follow-up above. It makes sense.

Stepper is a full React-based framework, and almost always any scenario is simpler when it stays close to the regular life cycles enclosed.

I'll comment with a question on one point from above to help orient myself better:

Each step just fires the events they're supposed to at the appropriate time

If we have a Stepper step firing off the same calypso_signup_step_start event multiple times for the entirety of its lifecycle (meaning, while it is rendered and not unmounted), would that be a problem? You still get the initial event logged (so you know when it was first seen), it may contain incomplete prop values, but you also get a more final record that may contain "more complete" props. So you can ask "has the user seen this step with the site_intent being 'Newsletter'?" (with site intent something you'd get async).

The guarantee is that the step will record everything that exists within its life cycle, and not beyond. The same would hold for calypso_signup_start, calypso_page_view and calypso_signup_flow_start.

I'm just asking - I know it may look weird for analysis if you have calypso_signup_start being recorded again and again while the flow has progressed beyond the starting point.

However, if we relax the condition, then the flow.useEventProps API we've been building will work more naturally/deterministically. Otherwise, things can lead to anti-patterns e.g. a hook that cannot pass or make async calls, or relying on the order of calls in the stack, etc. e.g. this will be ambiguous:

useEventProps() {
    const { data } = useSomeReactQueryHook();
    return {
        'tracks_event': { foo: data?.foo }
    };
}

As anything else based on the internal state in the hook. It's a dreadful design if this hook is ignored from dependencies internally.

travisw commented 1 month ago

Hey @chriskmnds - thanks for the added details. I'm curious what @MicBosi and @skim1220 think about it.

One thing that's on my mind is that in order to get unique instances of events per user, we'll often limit queries to the first occurrence that's fired for a user within some time windows. If that first event has incomplete props then we might not be getting all the details we need. Off the top of my head I'm not sure how we'd determine that a particular record is the final version of the event that might be more complete. Especially if the user refreshes or goes through another flow.

Since we're talking in somewhat abstract terms it's a little hard to tell if this all matters. Maybe the multiple events won't actually be a problem. Do you have a more concrete example of what prop values these multiple events might contain?

Say we're expecting the event and props: calypso_signup_start {flow=onboarding, ref=logged-out-homepage-lp} Let's also say we have a brand new user who visits the LOHP and enter the signup flow from there.

If we get first occurrence of the event for this user, do you think there's a good chance that that record won't have the values for those props that we expect?

At least for the core signup events, we're not so concerned with all the extra props that flows might choose to include in an event. We'd mostly want the common ones like those mentioned above and probably a couple others.

chriskmnds commented 1 month ago

@travisw I think my perspective is roughly summarised in a couple of comments I made elsewhere: https://github.com/Automattic/wp-calypso/pull/94248#discussion_r1760837673

Since we're talking in somewhat abstract terms it's a little hard to tell if this all matters. Maybe the multiple events won't actually be a problem. Do you have a more concrete example of what prop values these multiple events might contain?

I used siteIntent as an example above (in the linked discussion). It could be anything that's in some way "changeable" (either through user progression/interaction, or async data fetching).

useEventProps() {
  const { data } = useSiteIntent( 123 );
  return { 
    calypso_signup_start: {
      site_intent: data?.site_intent,
    } ,
  };
}

The core of the idea though is to "allow" flows to be "silly" - meaning, let flows attach any variable value as a prop to events that are really meant to be momentary (like calypso_signup_start). If we relax the condition that these events should only fire once, then we'll have a pretty decent API e.g. to extend without any cryptic/hidden restrictions (ignoring dependencies).

Say we're expecting the event and props: calypso_signup_start {flow=onboarding, ref=logged-out-homepage-lp} Let's also say we have a brand new user who visits the LOHP and enter the signup flow from there.

If we get first occurrence of the event for this user, do you think there's a good chance that that record won't have the values for those props that we expect?

These will log correctly irrespective of relaxing the restrictions or not. They are mostly either static or immediately compiled when called (no state-changing anywhere on ref or flowName).

I will start a couple of PRs where these will be recorded multiple times:

calypso_signup_start
calypso_signup_step_start, and likely calypso_page_view

The order won't matter / stay the same. So if you target the first instance given a step or flow (however you do that - as you mentioned), then you get exactly same results as now.

cc @Automattic/vertex @gabrielcaires @MicBosi

chriskmnds commented 1 month ago

Closing this as agreed to allow events to be recorded multiple times if that's what the flows define. The direction is summarised in the comment here: https://github.com/Automattic/wp-calypso/issues/94175#issuecomment-2362724556

Automattic / wp-calypso

Stepper: `calypso_signup_start` event can be triggered multiple times, despite tests saying otherwise #94175

Details

Checklist

Related

94248