brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0

Spotify OOM crash #40886

Open ShivanKaul opened 3 weeks ago

ShivanKaul commented 3 weeks ago

Description

See the following reports:

  1. https://github.com/brave/brave-browser/issues/36356
  2. https://github.com/brave/brave-browser/issues/40311
  3. https://community.brave.com/t/spotify-keeps-crashing-in-brave/532078/101

@atuchin-m suggested that this might be happening because of several requestIdleCallback calls triggered by the reCAPTCHA JS on the Spotify site: https://github.com/brave/brave-browser/issues/36356#issuecomment-2200193670

Based on the reports, this doesn't look related to Shields or Playlist (disabling either doesn't help).

Steps to reproduce

  1. Open the Web version of Spotify
  2. Play music
  3. Go to a different tab and let Spotify run

Actual result

The tab complains of high memory usage, and eventually crashes.

Expected result

Tab should not crash.

Reproduces how often

Easily reproduced

Brave version (brave://version info)

All

Channel information

Reproducibility

Miscellaneous information

No response

kjozwiak commented 1 week ago

So the above is still happening. I was listening to some music while working and it crashed after about 2 hours. It was running in a tab, and when I switched to it to pause the music, the WebView crashed. Using the following on Win 11 x64:

Brave | 1.72.12 Chromium: 129.0.6668.42 (Official Build) nightly (64-bit)
-- | --
Revision | 96e71bd286af7b7f34dd2fc749810eaf231c55a5
OS | Windows 11 Version 23H2 (Build 22631.4169)

Crashes:

Screenshot 2024-09-17 164819

[ 00 ] RaiseException
[ 01 ] partition_alloc::internal::OnNoMemoryInternal(unsigned __int64) ( oom.cc:41 )
[ 02 ] partition_alloc::TerminateBecauseOutOfMemory(unsigned __int64) ( oom.cc:64 )
[ 03 ] partition_alloc::internal::OnNoMemory(unsigned __int64) ( oom.cc:74 )
[ 04 ] partition_alloc::internal::PartitionExcessiveAllocationSize(unsigned __int64) ( partition_oom.cc:19 )
[ 05 ] partition_alloc::internal::`anonymous namespace'::PartitionDirectMap(partition_alloc::PartitionRoot *,partition_alloc::internal::AllocFlags,unsigned __int64,unsigned __int64) ( partition_bucket.cc:275 )
[ 06 ] partition_alloc::internal::PartitionBucket::AllocNewSlotSpan(partition_alloc::PartitionRoot *,partition_alloc::internal::AllocFlags,unsigned __int64) ( partition_bucket.cc:641 )
[ 07 ] partition_alloc::internal::PartitionBucket::SlowPathAlloc(partition_alloc::PartitionRoot *,partition_alloc::internal::AllocFlags,unsigned __int64,unsigned __int64,partition_alloc::internal::SlotSpanMetadata * *,bool *) ( partition_bucket.cc:1363 )
[ 08 ] partition_alloc::PartitionRoot::AllocFromBucket(partition_alloc::internal::PartitionBucket *,unsigned __int64,unsigned __int64,unsigned __int64 *,unsigned __int64 *,bool *) ( partition_root.h:1282 )
[ 09 ] partition_alloc::PartitionRoot::AllocInternalNoHooks(unsigned __int64,unsigned __int64) ( partition_root.h:2158 )
[ 10 ] allocator_shim::internal::PartitionMalloc(unsigned __int64,void *) ( allocator_shim_default_dispatch_to_partition_alloc.cc:204 )
[ 11 ] base::allocator::dispatcher::internal::DispatcherImpl<base::PoissonAllocationSampler>::AllocFn(unsigned __int64,void *) ( dispatcher_internal.h:129 )
[ 12 ] ShimMalloc(unsigned __int64,void *) ( shim_alloc_functions.h:112 )
[ 13 ] malloc(unsigned __int64) ( allocator_shim_override_ucrt_symbols_win.h:86 )
[ 14 ] _malloc_base(unsigned __int64) ( internal.cc:98 )
[ 15 ] operator new(unsigned __int64) ( new_scalar.cpp:36 )
[ 16 ] url::Origin::GetURL() ( origin.cc:159 )
[ 17 ] content_settings::`anonymous namespace'::GetOriginOrURL(blink::WebFrame const *) ( brave_content_settings_agent_impl.cc:58 )
[ 18 ] RtlUnwind
[ 19 ] RtlUnwind
[ 20 ] RtlUnwind
[ 21 ] blink::ScriptedIdleTaskController::ScheduleCallback(int,unsigned int) ( scripted_idle_task_controller.cc:123 )
[ 22 ] 0xaaaaaaaaaaaaaaaa
atuchin-m commented 1 week ago

We identified a memory hog and upstreamed a fix to Chromium. However, the primary issue is in the site's JS, and fixing the C++ hog doesn't help enough (though it probably extends the tab's lifetime a little). The issue is hard to debug and can't be reproduced on a local build because of the DRM protection on Spotify.

The real issue is reCAPTCHA. Why?

  1. An idle Spotify tab registers a lot of idle callbacks. It starts with a few, but after 10-20 minutes we get >1000. That is the reason why it crashes the renderer. All of the requestIdleCallback calls are reCAPTCHA-related. (screenshot 1 here)

  2. Here is the list of the origins with ScriptedIdleTaskController crashes during the last week. All of them use reCAPTCHA.

image
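
One way to watch this kind of growth from inside the page is to wrap requestIdleCallback and count the callbacks that are registered but have not run yet. The sketch below is a hypothetical diagnostic helper: `instrumentIdleCallbacks` is not a browser or Brave API, and it is not the tooling behind the screenshots in this thread. It only assumes that the object passed in exposes the standard `requestIdleCallback` and `cancelIdleCallback` functions.

```javascript
// Hypothetical helper: wraps requestIdleCallback / cancelIdleCallback on a
// window-like object and tracks callbacks that are registered but not yet run.
function instrumentIdleCallbacks(target) {
  const pending = new Set();
  const originalRequest = target.requestIdleCallback.bind(target);
  const originalCancel = target.cancelIdleCallback.bind(target);

  target.requestIdleCallback = (callback, options) => {
    // The wrapper removes the handle from the set right before the real
    // callback runs, so the set only holds callbacks still waiting.
    const handle = originalRequest((deadline) => {
      pending.delete(handle);
      callback(deadline);
    }, options);
    pending.add(handle);
    return handle;
  };

  target.cancelIdleCallback = (handle) => {
    pending.delete(handle);
    originalCancel(handle);
  };

  return { pendingCount: () => pending.size };
}
```

In a browser, calling `instrumentIdleCallbacks(window)` before the reCAPTCHA script loads and logging `pendingCount()` every minute or so should show whether the number keeps climbing while the tab sits in the background.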
atuchin-m commented 1 week ago

I'm trying to make a simplified example to reproduce this issue. The steps:

  1. Start a local HTTPS server.
  2. Make index.html with the following content. <api_key> should match the test domain (e.g. open.spotify.com):
    <!doctype html>
    <html lang="en">
    <head>
        <script src="https://www.google.com/recaptcha/enterprise.js?render=<api_key>" async="" defer=""></script>
    </head>
    <body>
    </body>
    </html>
  3. Redirect the test domain to 127.0.0.1 via OS hosts file.
  4. Launch the browser with a clean profile (to bypass HSTS) and visit https://<test-domain>/index.html. Ignore SSL warnings.
  5. Start recording a JS performance trace via DevTools, then leave the page in the background for 2 minutes.

Actual result: the devtools trace shows a bunch of idle callbacks in a row.

image

Expected result (from Chrome): only a few idle callbacks in a row.

image
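
Steps 1-3 above can be sketched in Node.js roughly as follows. This is a minimal sketch under assumptions, not the setup actually used in this thread: the function names, the `keyPath`/`certPath` parameters (pointing at a self-signed certificate you generate yourself), and the default port 443 are all illustrative. Binding to port 443 usually requires elevated privileges.

```javascript
const fs = require('fs');
const https = require('https');

// Builds the minimal page from step 2. `siteKey` stands in for <api_key>,
// the reCAPTCHA key that matches the test domain.
function buildIndexHtml(siteKey) {
  const src =
    'https://www.google.com/recaptcha/enterprise.js?render=' +
    encodeURIComponent(siteKey);
  return [
    '<!doctype html>',
    '<html lang="en">',
    '<head>',
    `    <script src="${src}" async defer></script>`,
    '</head>',
    '<body>',
    '</body>',
    '</html>',
  ].join('\n');
}

// Returns the line to add to the OS hosts file for step 3.
function buildHostsEntry(testDomain) {
  return `127.0.0.1 ${testDomain}`;
}

// Serves the page over HTTPS for every request path (step 1).
function startServer({ keyPath, certPath, siteKey, port = 443 }) {
  const html = buildIndexHtml(siteKey);
  const tlsOptions = {
    key: fs.readFileSync(keyPath),
    cert: fs.readFileSync(certPath),
  };
  return https
    .createServer(tlsOptions, (req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
      res.end(html);
    })
    .listen(port);
}
```

Because the certificate is self-signed, the browser will still show the SSL warning mentioned in step 4.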
atuchin-m commented 1 week ago

Here is the screencast of the issue (Brave): https://github.com/user-attachments/assets/a00d4bb1-52d6-463f-af27-6f8c54b0ceeb

atuchin-m commented 6 days ago

https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php also reproduces the issue.

atuchin-m commented 6 days ago

--disable-features=BraveRoundTimeStamps resolves the issue. The reason is that the feature reduces the timer resolution. This breaks something in the script's logic: it starts to reschedule the callback again and again, eating all the memory.

The feature was implemented here: https://github.com/brave/brave-core/pull/15309/
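
To illustrate how a coarser clock alone can turn into repeated rescheduling, here is a toy simulation. Everything in it is an assumption for illustration: the retry rule (reschedule whenever a measured duration is zero) is a hypothetical pattern, not the actual reCAPTCHA logic, and the clock resolutions are made-up values, not the ones Brave or Chromium use.

```javascript
// Fake clock: time is kept in whole microseconds, and readings are rounded
// down to a multiple of `resolutionUs`.
function makeClock(resolutionUs) {
  let trueTimeUs = 0;
  return {
    advance(us) {
      trueTimeUs += us;
    },
    now() {
      return Math.floor(trueTimeUs / resolutionUs) * resolutionUs;
    },
  };
}

// Hypothetical callback logic (NOT taken from reCAPTCHA): time a short piece
// of work and count a reschedule whenever the measured duration is zero.
function countReschedules(clock, workUs, attempts) {
  let reschedules = 0;
  for (let i = 0; i < attempts; i++) {
    const start = clock.now();
    clock.advance(workUs); // simulate the work taking `workUs` microseconds
    if (clock.now() - start === 0) {
      reschedules++;
    }
  }
  return reschedules;
}

// Illustrative resolutions: 5 us (fine) vs 1000 us (coarse), 250 us of work.
const fine = countReschedules(makeClock(5), 250, 1000);
const coarse = countReschedules(makeClock(1000), 250, 1000);
console.log({ fine, coarse }); // { fine: 0, coarse: 750 }
```

With the fine clock every measurement is non-zero, so nothing is rescheduled. With the coarse clock three out of four measurements round to zero, so the same logic reschedules far more often.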

atuchin-m commented 6 days ago

The steps to verify:

Option 1

  1. Open https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php in tab 1.
  2. Switch to another tab; tab 1 should stay inactive during all remaining steps.
  3. Wait 30 seconds.
  4. Measure the tab's memory using the built-in task manager.
  5. Wait 10 minutes.
  6. Measure the memory again.

Expected result: <= 100 MB memory usage

Actual result: > 100 MB memory usage, and the usage is slowly increasing.

Option 2

  1. Play music on spotify.com
  2. Switch the tab, putting it to the background
  3. Wait 20 minutes.
  4. Measure the tab's memory using the built-in task manager.

Expected result: <= 300 MB memory usage

Actual result: > 500 MB memory usage, and the usage is slowly increasing.
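
The before/after comparison in both options can be written down as a small check. This is a hypothetical helper for recording results, not part of any Brave tooling; the readings in the example are made up and would be copied by hand from the task manager.

```javascript
// samples: chronological readings, each { minutes, memoryMb }.
// limitMb: the expected upper bound from the steps (100 or 300).
function evaluateMemoryRun(samples, limitMb) {
  if (samples.length < 2) {
    throw new Error('need at least two samples');
  }
  const first = samples[0];
  const last = samples[samples.length - 1];
  const growthMbPerMinute =
    (last.memoryMb - first.memoryMb) / (last.minutes - first.minutes);
  return { withinLimit: last.memoryMb <= limitMb, growthMbPerMinute };
}

// Made-up readings for Option 1: one at 30 seconds, one 10 minutes later.
const result = evaluateMemoryRun(
  [
    { minutes: 0.5, memoryMb: 80 },
    { minutes: 10.5, memoryMb: 180 },
  ],
  100
);
console.log(result); // { withinLimit: false, growthMbPerMinute: 10 }
```

A run passes when `withinLimit` is true; a clearly positive growth rate over a longer wait points to the slow increase described above.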