brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0

macOS Catalina users crashing during serverTrust check #39101

Open bsclifton opened 4 months ago

bsclifton commented 4 months ago

Description

User reports:

After the latest update (Version 1.66.110 Chromium: 125.0.6422.60 (Official Build) (x86_64)) I’ve been experiencing random crashes while watching, for instance, Netflix. The browser just shuts down on my MacBook Pro running macOS 10.15.7.

Another user reports (not sure if related):

I have endless crashes with the latest version [Version 1.66.118 Chromium: 125.0.6422.147 (Official Build) (x86_64)] on my Mac, latest macOS version. My latest crash report, 163c1d00-c07b-0d0c-0000-000000000000, is only the most recent of many today. It seems that there are different triggers, somehow bound to

  • how many tabs are open
  • download links crash immediately
  • it gets more stable if I close some tabs before opening others

I have already had Apple techs at different levels look at it, and they have no idea. I hope that somebody can tell me from the crash report what happens, and what I can do to get the problem solved.

NB: if I open amazon.de, Brave crashes immediately. This is NOT the case for other Amazon websites.

Thread on community: https://community.brave.com/t/crashes-with-latest-update/549560/7

Crash reports

21 May 2024

03431700-ce01-fb0b-0000-000000000000

27 May 2024

68091200-293f-040c-0000-000000000000 a27b1000-293f-040c-0000-000000000000

3 June 2024

e50a1400-c07b-0d0c-0000-000000000000 906c1200-c07b-0d0c-0000-000000000000 c1fc1100-c07b-0d0c-0000-000000000000

10 June 2024

46fe1000-f3b3-160c-0000-000000000000 747e1500-f3b3-160c-0000-000000000000

13 June 2024

051b1900-f3b3-160c-0000-000000000000 19c80200-37ee-1f0c-0000-000000000000 813f0400-37ee-1f0c-0000-000000000000

17 June 2024

7fa51000-37ee-1f0c-0000-000000000000 c32f0c00-37ee-1f0c-0000-000000000000 c69a0b00-37ee-1f0c-0000-000000000000

bsclifton commented 4 months ago

When I pulled up one of the crashes, here is what I saw: https://brave.sp.backtrace.io/p/brave/debug?filters=JTVCJTVCJTIyX3J4aWQlMjIlMkMlMjJlcXVhbCUyMiUyQyUyMmU1MGExNDAwLWMwN2ItMGQwYy0wMDAwLTAwMDAwMDAwMDAwMCUyMiU1RCU1RA%3D%3D&fingerprint=299d76a1eefda5bc7a234fd2563e6c6891eb7e572b698dbf6f86bf8ca3850dcc&debug=(%227849be4%22,0,0)

[ 00 ] ne_filter_stats_toggle
[ 01 ] ne_filter_protocol_remove_input_handler
[ 02 ] nw_protocol_boringssl_remove_input_handler
[ 03 ] CFURLProtectionSpaceGetServerTrust
[ 04 ] nw_endpoint_flow_failed
[ 05 ] _dispatch_call_block_and_release
[ 06 ] _dispatch_client_callout
[ 07 ] _dispatch_workloop_invoke
[ 08 ] _dispatch_workloop_worker_thread
[ 09 ] _pthread_wqthread
[ 10 ] start_wqthread
[ 11 ] 0x70000652eb70
[ 12 ] start_wqthread
[ 13 ] base::allocator::dispatcher::internal::DispatcherImpl<base::PoissonAllocationSampler>::AllocZeroInitializedFn(allocator_shim::AllocatorDispatch const*, unsigned long, unsigned long, void*) ( dispatcher_internal.h:153 )
[ 14 ] base::allocator::dispatcher::internal::DispatcherImpl<base::PoissonAllocationSampler>::AllocFn(allocator_shim::AllocatorDispatch const*, unsigned long, void*) ( dispatcher_internal.h:131 )
[ 15 ] base::allocator::dispatcher::internal::DispatcherImpl<base::PoissonAllocationSampler>::AllocFn(allocator_shim::AllocatorDispatch const*, unsigned long, void*) ( dispatcher_internal.h:131 )
[ 16 ] malloc_zone_calloc
[ 17 ] calloc
[ 18 ] _dispatch_kq_poll
[ 19 ] ShimMalloc ( shim_alloc_functions.h:107 )
[ 20 ] allocator_shim::(anonymous namespace)::MallocZoneMalloc(_malloc_zone_t*, unsigned long) ( allocator_shim_override_apple_default_zone.h:145 )

...more in backtrace...
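
For triage, the frames listed above can be split into those that carry a source location and those that do not. Below is a minimal Python sketch, assuming the `[ NN ] symbol ( file:line )` layout shown in the trace. `parse_frame` and `first_symbolized_frame` are hypothetical helper names, and reading "no source location" as "OS library frame" is an assumption, not something the crash report states.

```python
import re

# "[ NN ] symbol" optionally followed by " ( file:line )".
FRAME_RE = re.compile(r"^\[\s*(\d+)\s*\]\s+(.*?)(?:\s+\(\s*(\S+:\d+)\s*\))?$")


def parse_frame(line):
    """Return index, symbol and optional source location for one frame line."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    index, symbol, location = match.groups()
    return {"index": int(index), "symbol": symbol, "location": location}


def first_symbolized_frame(lines):
    """Return the first frame that has a source location, or None."""
    for line in lines:
        frame = parse_frame(line)
        if frame and frame["location"]:
            return frame
    return None


# A few frames copied from the trace above (not the full stack).
SAMPLE = """\
[ 00 ] ne_filter_stats_toggle
[ 01 ] ne_filter_protocol_remove_input_handler
[ 02 ] nw_protocol_boringssl_remove_input_handler
[ 03 ] CFURLProtectionSpaceGetServerTrust
[ 04 ] nw_endpoint_flow_failed
[ 14 ] base::allocator::dispatcher::internal::DispatcherImpl<base::PoissonAllocationSampler>::AllocFn(allocator_shim::AllocatorDispatch const*, unsigned long, void*) ( dispatcher_internal.h:131 )
[ 19 ] ShimMalloc ( shim_alloc_functions.h:107 )
""".splitlines()

frame = first_symbolized_frame(SAMPLE)
print(frame["index"], frame["location"])
```

In this excerpt every frame above the first symbolized one has no source location, which fits a crash that starts outside the browser's own code.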

bsclifton commented 4 months ago

cc: @iefremov

iefremov commented 4 months ago

It looks like a problem with the installation... CFURLProtectionSpaceGetServerTrust looks like macOS is killing the browser. Have they tried reinstalling? How many complaints do we have?

atuchin-m commented 4 months ago

@bsclifton I suspect the crash is connected with https://github.com/brave/brave-browser/issues/29406.

  1. The crash happens on a macOS system thread and is related to the NetworkExtension and CFNetwork OS libs.
  2. CFURLProtectionSpaceGetServerTrust is a low-level representation of https://developer.apple.com/documentation/foundation/urlprotectionspace/1409926-servertrust. This API checks the validity of an SSL connection.
  3. It means that the browser process makes an SSL connection directly (instead of using network-service). It could be 3rd-party code or OS-level code.
  4. NetworkExtension.h is used in ikev2_connection_api_impl_mac.mm. It also specifies a probeUrl that is passed to the API and will be reached by the OS lib.
  5. ~The crashes started from v1.65, which perfectly matches the suspected PR.~ Crashes started from v1.66.
  6. Only old macOS 10.15.7.* Catalina versions are affected. The newer versions are OK.

In conclusion, it looks like a macOS 10.15.7 bug that is triggered by ~the VPN-on-demand feature.~ some browser change (probably cr126 update). It's not a browser fault, but we have to ship the workaround, because the crash rate is unaffordable. ~The easy way is to disable the feature for <=10.15.7 macOS.~

UPD: VPN-on-demand was shipped in v1.65.x, but we don't have a single crash from it. So it's definitely not the reason.
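
Whatever workaround ends up shipping, limiting it to macOS 10.15.7 and older comes down to a version comparison. Below is a minimal Python sketch of such a gate, for illustration only. `parse_version` and `needs_workaround` are hypothetical names, and this is not how the browser itself checks the OS version.

```python
import platform

# Last Catalina release mentioned in this thread.
CATALINA_LAST = (10, 15, 7)


def parse_version(text):
    """Turn '10.15.7' into (10, 15, 7); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def needs_workaround(macos_version, threshold=CATALINA_LAST):
    """True when the given macOS version is at or below the threshold."""
    version = parse_version(macos_version)
    if not version:
        # Empty or unparsable string, for example when not running on macOS.
        return False
    # Pad short versions such as '10.15' to three components.
    padded = (version + (0, 0, 0))[:3]
    return padded <= threshold


if __name__ == "__main__":
    # platform.mac_ver()[0] is an empty string on non-macOS systems.
    current = platform.mac_ver()[0]
    print(repr(current), needs_workaround(current))
```

Comparing integer tuples avoids the string-comparison trap where "10.9" sorts after "10.15".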

iefremov commented 4 months ago

We have quite a lot of crashes in the last month: https://share.backtrace.io/api/share/I2CMpTx3MkYQx1VLL4Q1HvU2

atuchin-m commented 4 months ago

Also, only Intel Mac devices are affected.

The first crash happened on https://github.com/brave/brave-browser/releases/v1.67.70, a day after cr125 was merged. There is an old report, https://issues.chromium.org/issues/40834734, that mentions Brave, 3rd-party firewalls (LuLu) and the browser update check (from the about page). The update check is also a candidate for the request that triggered the issue.

atuchin-m commented 4 months ago

The macOS bug is probably https://nvd.nist.gov/vuln/detail/CVE-2020-9996, fixed in macOS Big Sur 11.0.1.

atuchin-m commented 4 months ago

I've checked a few raw crash dumps. Most of them refer to the updater (Sparkle). @mherrmann have we changed anything in the mac updater in v1.65.x?

[Screenshots 1-3, taken 2024-07-05]

mherrmann commented 4 months ago

@atuchin-m we have not touched Sparkle in a long time.

bsclifton commented 4 months ago

@atuchin-m @iefremov ~I believe Chromium 125 and 126 had changes where~ Chromium 122 had deleted the keystone implementation. We had to keep this - so we had pulled those patches in. It's possible there's a problem with how that code that we kept gets called.

@cdesouza-chromium and @emerick (and maybe @mkarolin) may know more

Here's an example commit (from a more recent Chromium upgrade): https://github.com/brave/brave-core/pull/23233/commits/632107206cba2471818604c3b576bfbe019aa713 (from https://github.com/brave/brave-core/pull/23233)

atuchin-m commented 4 months ago

Thanks @bsclifton, I suppose it's https://github.com/brave/brave-browser/issues/35893

emerick commented 4 months ago

The Keystone implementation was removed from upstream when we upgraded to cr122, so we pulled it into our code base at that time pretty much as-is. We did some subsequent follow-up work in https://github.com/brave/brave-browser/issues/35893 to remove check_includes = false when building those files. Neither of these were intended to change any functionality (they just moved files around, really), though anything is possible.

In cr125, upstream migrated the infobar delegate from Objective-C to C++, which is what https://github.com/brave/brave-core/commit/632107206cba2471818604c3b576bfbe019aa713 is about. We kept it as Objective-C to avoid any risky changes to the upgrade flow and since we're only using the Keystone-related code to hook into Sparkle.

I looked through the commits, but nothing is really leaping out at me as far as causing a crash.

bsclifton commented 3 months ago

Thanks, @emerick! My bad, it was cr122 😄 Updated above comment

bsclifton commented 2 months ago

from @atuchin-m:

In conclusion, it looks like a macOS 10.15.7 bug that is triggered by some browser change (probably cr126 update). It's not a browser fault, but we have to ship the workaround, because the crash rate is unaffordable.

I think you narrowed it down well. Basically, we see the same report from users on Catalina: https://community.brave.com/t/brave-keeps-crashing-and-is-getting-worse/557686

I don't think Chrome/Chromium currently has any restriction in place for Catalina. But Chromium 129 (~September 17th) will officially drop support for Catalina.

Notice the original report mentions that visiting a specific website (amazon.de) causes a crash. @atuchin-m could that site have specific SSL/TLS properties that make the boringssl client crash while parsing them? I'll try to ask for more information.
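
One way to gather that information is to compare handshake details for the affected host and an unaffected one. Below is a minimal sketch using Python's standard `ssl` module. `summarize` and `probe` are hypothetical helper names, and `www.amazon.com` as the comparison host is an assumption. Python talks to the server through OpenSSL rather than the macOS network stack seen in the crash, so this only shows what the server negotiates, not what the crashing code path saw.

```python
import socket
import ssl


def summarize(version, cipher, cert):
    """Flatten handshake details into a dict that is easy to diff."""
    cipher_name, _protocol, secret_bits = cipher
    # getpeercert() returns subject/issuer as a tuple of RDNs, each of
    # which is a tuple of (key, value) pairs; take the first pair of each.
    subject = dict(rdn[0] for rdn in cert.get("subject", ()))
    issuer = dict(rdn[0] for rdn in cert.get("issuer", ()))
    return {
        "tls_version": version,
        "cipher": cipher_name,
        "cipher_bits": secret_bits,
        "subject_cn": subject.get("commonName"),
        "issuer_cn": issuer.get("commonName"),
        "not_after": cert.get("notAfter"),
    }


def probe(host, port=443, timeout=10):
    """Connect to host, complete a TLS handshake and summarize it."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return summarize(tls.version(), tls.cipher(), tls.getpeercert())


# Example (needs network access):
#   for host in ("www.amazon.de", "www.amazon.com"):
#       print(host, probe(host))
```

Any difference between the two summaries (protocol version, cipher suite, issuer) would be a starting point, not proof of the cause.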