In order to instantiate a functioning communication between the offscreen iframe for Ledger and the LedgerKeyring (through LedgerOffscreenBridge), we need to make sure that the iframe is loaded before sending any message to it.
We currently wait for the offscreen page to load, but the iframe load is completely async and it will most likely be ready after the rest of the offscreen page, leaving messages proxied to the iframe hanging forever.
On a higher level, this is dangerous because every time we try to send a message to the Ledger iframe the KeyringController controller-level mutex is locked, and any other operation will wait for its release to proceed - this creates a deadlock situation in the case the iframe does not respond to a message.
The issue becomes apparent when using the latest withKeyring method from KeyringController - and for this reason the linked PR was reverted.
By re-applying those changes, the issue can be seen when the unlock action happens fast enough to be done before the offscreen document is fully loaded. This can sometimes happen when running e2e tests locally:
Describe the bug
In order to instantiate a functioning communication between the offscreen iframe for Ledger and the LedgerKeyring (through LedgerOffscreenBridge), we need to make sure that the iframe is loaded before sending any message to it.
We currently wait for the offscreen page to load, but the iframe load is completely async and it will most likely be ready after the rest of the offscreen page, leaving messages proxied to the iframe hanging forever.
On a higher level, this is dangerous because every time we try to send a message to the Ledger iframe the KeyringController controller-level mutex is locked, and any other operation will wait for its release to proceed - this creates a deadlock situation in the case the iframe does not respond to a message.
Expected behavior
Setting the Ledger transport should never result in a
KeyringController
deadlock, to the point that we can reapply https://github.com/MetaMask/metamask-extension/pull/25435 safelyScreenshots/Recordings
No response
Steps to reproduce
The issue becomes apparent when using the latest
withKeyring
method from KeyringController - and for this reason the linked PR was reverted.By re-applying those changes, the issue can be seen when the unlock action happens fast enough to be done before the offscreen document is fully loaded. This can sometimes happen when running e2e tests locally:
yarn start:test
yarn user-actions-benchmark:chrome
Error messages or log output
No response
Detection stage
On the development branch
Version
The PR that made the issue evident was reverted, but the goal is to be able to reapply it to
develop
(https://github.com/MetaMask/metamask-extension/pull/25435)Build type
None
Browser
Chrome, Firefox
Operating system
Windows, MacOS, Linux
Hardware wallet
Ledger
Additional context
No response
Severity
No response