Data gathered and investigation results will be added in this ticket.
Scenario
No response
Design
No response
Technical Details
No response
Threat Modeling Framework
No response
Acceptance Criteria
No response
Stakeholder review needed before the work gets merged
[ ] Engineering (needed in most cases)
[ ] Design
[ ] Product
[ ] QA (automation tests are required to pass before merging PRs, but not all changes are covered by automation tests; please review whether QA is needed beyond the automation tests)
Write a short explanation of the technical cause of the bug
Script loading was delayed by 10 milliseconds, which meant browser.runtime.onInstalled.addListener was called at least 10 milliseconds after installation. Because the listener was registered after the install event had already fired, it never received the event and its callback was never run.
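The timing problem can be sketched with a minimal simulation (the OneShotEvent class below is a stand-in for the browser's onInstalled event, not the actual extension code; names are illustrative):

```javascript
// Simulated one-shot event, standing in for browser.runtime.onInstalled.
// The real browser fires onInstalled exactly once, right after installation.
class OneShotEvent {
  constructor() {
    this.listeners = [];
  }
  addListener(fn) {
    // As with the real API, a listener added after the event has
    // already fired never receives it.
    this.listeners.push(fn);
  }
  fire(detail) {
    for (const fn of this.listeners) fn(detail);
  }
}

const onInstalled = new OneShotEvent();

// The browser fires the install event immediately at installation time.
onInstalled.fire({ reason: 'install' });

// The bug: because script loading was delayed, the background script only
// registered its listener after the event had already fired (the 10 ms
// delay is represented here by simply registering too late).
let callbackRan = false;
onInstalled.addListener(() => { callbackRan = true; });

console.log(callbackRan); // false: the install event was missed
```

This is why extension event listeners generally need to be registered synchronously at the top level of the background script, before any deferred work.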
Were there any missing unit tests that could have preempted this issue?
I don’t think so. The bug depended on the onInstalled event not being emitted, which is browser behavior that we would not cover in a unit test.
Were there any absent end-to-end (e2e) tests that could have averted this issue?
Yes, and those tests were added after the fix in commit 048f606b344
Were there any omitted manual tests that could have intercepted this issue?
This was caught by a manual test during RC testing, but we do not have a documented manual scenario for it. We no longer need one for this specific case, since we now have automated coverage.
However, we should document some general guidance for manual testers regarding metrics. For example, we could make it easy for manual testers to set up a test Mixpanel project and verify the metrics reported for a subset of the manual scenarios they run through.
Were there any other elements lacking, such as typed code, comprehensive documentation, well-architected APIs, etc., that might have prevented this issue?
If the author or reviewer of the PR had used the solution proposed in 6c, they could have caught this, but that would only happen if they performed comprehensive QA before merging, which we can’t expect for every PR.
Better documentation of the code in background.js could also have helped.
Are there any additional factors that you believe contributed to the occurrence of this bug?
We could benefit from a more comprehensive strategy for automated testing of metrics. For example, we could have a test that runs through multiple scenarios, starting with installation, and at the end checks whether all the correct metrics events were fired.
Alternatively, we could add the capability to define expected metrics events for any e2e test.
That might not be a good idea, though, because each test would then have more than one concern.
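As a rough sketch of the per-test expected-metrics idea (the MetricsRecorder helper and the event names below are hypothetical, not existing test infrastructure):

```javascript
// Hypothetical helper: records metrics events emitted during a test run
// and reports which expected events were never seen.
class MetricsRecorder {
  constructor() {
    this.events = [];
  }
  // The test harness would route metrics calls here instead of to Mixpanel.
  record(eventName, properties = {}) {
    this.events.push({ eventName, properties });
  }
  // Returns the expected event names that were never recorded.
  missing(expectedEventNames) {
    const seen = new Set(this.events.map((e) => e.eventName));
    return expectedEventNames.filter((name) => !seen.has(name));
  }
}

// Example: an e2e test declares which metrics it expects to see fired.
const recorder = new MetricsRecorder();
recorder.record('Extension Installed');
recorder.record('Onboarding Started');

const missing = recorder.missing([
  'Extension Installed',
  'Onboarding Started',
  'Wallet Created',
]);
console.log(missing); // ['Wallet Created'] — this test would fail
```

A scenario-based variant of this (starting with installation, ending with a check of all recorded events) would have caught the missing install metric in this bug.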
What is this about?
Performing a root-cause analysis of this bug as part of this epic - https://github.com/MetaMask/MetaMask-planning/issues/2103.
References
No response