Data gathered and investigation results will be added in this ticket.
Scenario
No response
Design
No response
Technical Details
No response
Threat Modeling Framework
No response
Acceptance Criteria
No response
Stakeholder review needed before the work gets merged
[ ] Engineering (needed in most cases)
[ ] Design
[ ] Product
[ ] QA (automation tests are required to pass before merging PRs, but not all changes are covered by automation tests; please review whether QA is needed beyond the automation tests)
Write a short explanation of the technical cause of the bug
Script loading was delayed by 10 milliseconds, which meant browser.runtime.onInstalled.addListener was called at least 10 milliseconds after installation. Because the listener was registered after the install event had already fired, it never received the event and its callback was never run.
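The timing problem can be sketched with a minimal simulation (the OneShotEvent class below is a stand-in for the browser's onInstalled event, not the actual extension code; names are illustrative):

```javascript
// Simulated one-shot event, standing in for browser.runtime.onInstalled.
// The real browser fires onInstalled exactly once, right after installation.
class OneShotEvent {
  constructor() {
    this.listeners = [];
  }
  addListener(fn) {
    // As with the real API, a listener added after the event has
    // already fired never receives it.
    this.listeners.push(fn);
  }
  fire(detail) {
    for (const fn of this.listeners) fn(detail);
  }
}

const onInstalled = new OneShotEvent();

// The browser fires the install event immediately at installation time.
onInstalled.fire({ reason: 'install' });

// The bug: because script loading was delayed, the background script only
// registered its listener after the event had already fired (the 10 ms
// delay is represented here by simply registering too late).
let callbackRan = false;
onInstalled.addListener(() => { callbackRan = true; });

console.log(callbackRan); // false: the install event was missed
```

This is why extension event listeners generally need to be registered synchronously at the top level of the background script, before any deferred work.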
Were there any missing unit tests that could have preempted this issue?
I don’t think so. The bug depended on the onInstalled event not being emitted, which is browser behavior that we would not cover in a unit test.
Were there any absent end-to-end (e2e) tests that could have averted this issue?
Yes, and those tests were added after the fix in commit 048f606b344
Were there any omitted manual tests that could have intercepted this issue?
This was caught by a manual test during RC testing, but we do not have a documented manual scenario for it. We no longer need one for this specific case, since we now have automated coverage.
However, we should document some general guidance for manual testers regarding metrics. For example, we could make it easy for manual testers to set up a test Mixpanel project and verify the metrics reported for a subset of the manual scenarios they run through.
Were there any other elements lacking, such as typed code, comprehensive documentation, well-architected APIs, etc., that might have prevented this issue?
If the author or reviewer of the PR had used the solution proposed in 6c, they could have caught this, but that would only happen if they performed comprehensive QA before merging, which we can’t expect for every PR.
Better documentation of the code in background.js could also have helped.
Are there any additional factors that you believe contributed to the occurrence of this bug?
We could benefit from a more comprehensive strategy for automated testing of metrics. For example, we could have a test that runs through multiple scenarios, starting with installation, and at the end checks whether all the correct metrics events were fired.
Alternatively, we could add the capability to define expected metrics events for any e2e test.
That might not be a good idea, though, because each test would then have more than one concern.
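As a rough sketch of the per-test expected-metrics idea (the MetricsRecorder helper and the event names below are hypothetical, not existing test infrastructure):

```javascript
// Hypothetical helper: records metrics events emitted during a test run
// and reports which expected events were never seen.
class MetricsRecorder {
  constructor() {
    this.events = [];
  }
  // The test harness would route metrics calls here instead of to Mixpanel.
  record(eventName, properties = {}) {
    this.events.push({ eventName, properties });
  }
  // Returns the expected event names that were never recorded.
  missing(expectedEventNames) {
    const seen = new Set(this.events.map((e) => e.eventName));
    return expectedEventNames.filter((name) => !seen.has(name));
  }
}

// Example: an e2e test declares which metrics it expects to see fired.
const recorder = new MetricsRecorder();
recorder.record('Extension Installed');
recorder.record('Onboarding Started');

const missing = recorder.missing([
  'Extension Installed',
  'Onboarding Started',
  'Wallet Created',
]);
console.log(missing); // ['Wallet Created'] — this test would fail
```

A scenario-based variant of this (starting with installation, ending with a check of all recorded events) would have caught the missing install metric in this bug.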
What is this about?
Performing a root-cause analysis of this bug as part of this epic - https://github.com/MetaMask/MetaMask-planning/issues/2103.
References
No response