MetaMask / metamask-extension

:globe_with_meridians: :electric_plug: The MetaMask browser extension enables browsing Ethereum blockchain enabled websites
https://metamask.io
Other
11.81k stars 4.82k forks source link

[Bug]: Notifications - Some accounts are loading and get the error `Failed to check accounts presence Error: Exceeds maximum number of requests waiting to be resolved, please try again.` #25749

Closed seaona closed 1 month ago

seaona commented 1 month ago

Describe the bug

Whenever I enable the notifications, when I go to the notifications settings, I see how some of the accounts remain with the loading spinner forever, and I can see this error in the background console Failed to check accounts presence Error: Exceeds maximum number of requests waiting to be resolved, please try again.

Expected behavior

No response

Screenshots/Recordings

Screenshot from 2024-07-10 18-12-01

https://github.com/MetaMask/metamask-extension/assets/54408225/3706e6b4-ac8a-4d13-bc69-6cbba77a1110

Steps to reproduce

  1. Create several accounts
  2. Enable notifications
  3. Go to notification settings
  4. See last accounts are forever loading and there is the background error

Error messages or log output

No response

Version

12.0.0

Build type

None

Browser

Chrome

Operating system

Linux

Hardware wallet

No response

Additional context

No response

Severity

No response

sleepytanya commented 1 month ago

Confirming the same behavior on v12.1.0 although it doesn't happen every time, sometimes accounts are loaded without any issues:

https://github.com/MetaMask/metamask-extension/assets/104780023/9f2a2edf-5358-4ca8-b62e-f9ea6c051db8

Screenshot 2024-07-10 at 18 15 55
Prithpal-Sooriya commented 1 month ago

TY. Jira Ticket created here. Our team will investigate.

Prithpal-Sooriya commented 1 month ago

This is interesting. Seems to come from queued requests to the snap controller. Our team will better manage/cache responses back from the pre-installed snap we are using (we don't need to realistically call this every time)

sleepytanya commented 1 month ago

Present in v12.0.0 beta:

Screenshot 2024-07-15 at 21 40 49 Screenshot 2024-07-15 at 21 36 39
sleepytanya commented 1 month ago

I think it's the same bug because I see the same console messages when Notifications list appears empty:

https://github.com/user-attachments/assets/9659cbb4-bdc5-41a3-ba11-7b84376876c8

Screenshot 2024-07-16 at 23 52 56 Screenshot 2024-07-16 at 23 54 20
Prithpal-Sooriya commented 1 month ago

For visibility, it seems that we make quite a few calls to the preinstalled snap we are using, leading to a large request queue (and errors being thrown).

A fix is made in the core library (we are in process of migrating extension to using the core library) https://github.com/MetaMask/core/pull/4532

I can also backport PR to the main extension

benjisclowder commented 2 weeks ago

Collaborative Effort Required for Root Cause Analysis (RCA) on Critical Issues

We are quickly approaching the end of the quarter and we encourage you once again to take some moments and perform RCA on this critical issue. You may do so by answering the questions below:

**1. What PR fixed the issue?

  1. Can you pinpoint the commit from which the issue originated?
  2. Write a short explanation of the technical cause of the bug
  3. How could we have avoided merging this bug? What would have had to be different about our code, tests or processes?** 4.1. Were there any missing unit, e2e or manual tests that could have preempted this issue? 4.2. Were there any other elements lacking, such as typed code, comprehensive documentation, well-architected APIs, etc., that might have prevented this issue? 4.3. If your answer to a and b is no, then is there anything at all that you can think of that, if it had been different before this bug was introduced, would have prevented it from being merged?

Please provide your answers as a reply to this comment and tag me as well.

You can read more about the initiative here. Thank you!

Tagging eng. who added the fix: @Prithpal-Sooriya

Prithpal-Sooriya commented 2 weeks ago

@benjisclowder

  1. What PR fixed the issue?

https://github.com/MetaMask/metamask-extension/pull/25946

  1. Can you pinpoint the commit from which the issue originated?

This is hard to pinpoint, as this was a new feature, not an existing feature.

  1. Write a short explanation of the technical cause of the bug

Commit for the fix: https://github.com/MetaMask/metamask-extension/pull/25946/commits/5f76720a162031755ae1e8921f5c366f804b6574

When a user has many accounts, we were making many API calls. Also we were making many Snap requests. The snap requests especially were being queued and then failing. Reducing this to only make 1 call at the page level and caching resolved this issue.

  1. How could we have avoided merging this bug? What would have had to be different about our code, tests or processes?

We should have tested with multiple accounts (10+ addresses). We never hit this snap request issue during development.

4.1. Were there any missing unit, e2e or manual tests that could have preempted this issue?

Unit tests were covered, but the Snap Environment was mocked.

Integration tests (using the snap environment), or E2E may have spotted this - however it was a flakey issue.

More manual testing (at least during development) may have raised this early too.

4.2. Were there any other elements lacking, such as typed code, comprehensive documentation, well-architected APIs, etc., that might have prevented this issue?

Not particularly, most of the knowledge base around notifications were known. The only unknown was the request queue the Snap Controller contains.

4.3. If your answer to a and b is no, then is there anything at all that you can think of that, if it had been different before this bug was introduced, would have prevented it from being merged?

N/A