chromiumembedded / cef

Chromium Embedded Framework (CEF). A simple framework for embedding Chromium-based browsers in other applications.
https://bitbucket.org/chromiumembedded/cef/
Other
3.29k stars 457 forks source link

Crash in win::Sid::FromNamedCapability when opening a specific website. #3791

Open Mellnik opened 2 weeks ago

Mellnik commented 2 weeks ago

Describe the bug A crash happens when opening a specific website. https://webglsamples.org/aquarium/aquarium.html The website provides WebGL examples. The crash only happens with the specific example in the link above.

To Reproduce Steps to reproduce the behavior:

  1. Open cefclient.exe
  2. Go to https://webglsamples.org/aquarium/aquarium.html
  3. Crash happens.

Expected behavior No crash should happen. The website is a WebGL example.

Screenshots Screenshot 2024-09-20 211659

Versions (please complete the following information):

Additional context Reproducible with cefclient.exe and my embedded OSR application. Not reproducible with Google Chrome.

The crash seems to happen by trying to add the capability "kLpacChromeInstallFiles" to so called "app container profile".

Full call stack:

>   [Inline Frame] libcef.dll!sandbox::policy::`anonymous namespace'::SetupAppContainerProfile(sandbox::AppContainer * container, const base::CommandLine & command_line, sandbox::mojom::Sandbox sandbox_type) Line 480    C++
    libcef.dll!sandbox::policy::SandboxWin::AddAppContainerProfileToConfig(const base::CommandLine & command_line, sandbox::mojom::Sandbox sandbox_type, const std::__Cr::basic_string<char,std::__Cr::char_traits<char>,std::__Cr::allocator<char>> & appcontainer_id, sandbox::TargetConfig * config) Line 838    C++
    [Inline Frame] libcef.dll!content::`anonymous namespace'::XrCompositingInitializeConfig(sandbox::TargetConfig * config, base::CommandLine & cmd_line, sandbox::mojom::Sandbox sandbox_type) Line 192    C++
    libcef.dll!content::UtilitySandboxedProcessLauncherDelegate::InitializeConfig(sandbox::TargetConfig * config) Line 337  C++
    [Inline Frame] libcef.dll!sandbox::policy::`anonymous namespace'::GenerateConfigForSandboxedProcess(const base::CommandLine & cmd_line, sandbox::policy::SandboxDelegate * delegate, sandbox::TargetConfig * config) Line 674   C++
    libcef.dll!sandbox::policy::SandboxWin::GeneratePolicyForSandboxedProcess(const base::CommandLine & cmd_line, const std::__Cr::vector<void *,std::__Cr::allocator<void *>> & handles_to_inherit, sandbox::policy::SandboxDelegate * delegate, sandbox::TargetPolicy * policy) Line 959  C++
    libcef.dll!sandbox::policy::SandboxWin::StartSandboxedProcess(const base::CommandLine & cmd_line, const std::__Cr::vector<void *,std::__Cr::allocator<void *>> & handles_to_inherit, sandbox::policy::SandboxDelegate * delegate, base::Process * process) Line 1003    C++
    libcef.dll!content::StartSandboxedProcess(content::SandboxedProcessLauncherDelegate * delegate, const base::CommandLine & target_command_line, const std::__Cr::vector<void *,std::__Cr::allocator<void *>> & handles_to_inherit, base::Process * process) Line 47  C++
    libcef.dll!content::internal::ChildProcessLauncherHelper::LaunchProcessOnLauncherThread(const base::LaunchOptions * options, std::__Cr::unique_ptr<std::__Cr::vector<void *,std::__Cr::allocator<void *>>,std::__Cr::default_delete<std::__Cr::vector<void *,std::__Cr::allocator<void *>>>> files_to_register, bool * is_synchronous_launch, int * launch_result) Line 160 C++
    libcef.dll!content::internal::ChildProcessLauncherHelper::LaunchOnLauncherThread() Line 327 C++
    [Inline Frame] libcef.dll!base::OnceCallback<void ()>::Run() Line 156   C++
    libcef.dll!base::TaskAnnotator::RunTaskImpl(base::PendingTask & pending_task) Line 203  C++
    [Inline Frame] libcef.dll!base::TaskAnnotator::RunTask(perfetto::StaticString event_name, base::PendingTask & pending_task, base::internal::TaskTracker::RunTaskImpl::<lambda_0> && args) Line 90   C++
    [Inline Frame] libcef.dll!base::internal::TaskTracker::RunTaskImpl(base::internal::Task & task, const base::TaskTraits & traits, base::internal::TaskSource * task_source, const base::internal::SequenceToken & token) Line 679    C++
    libcef.dll!base::internal::TaskTracker::RunBlockShutdown(base::internal::Task & task, const base::TaskTraits & traits, base::internal::TaskSource * task_source, const base::internal::SequenceToken & token) Line 672  C++
    [Inline Frame] libcef.dll!base::internal::TaskTracker::RunTaskWithShutdownBehavior(base::internal::Task & task, const base::TaskTraits & traits, base::internal::TaskSource * task_source, const base::internal::SequenceToken & token) Line 697    C++
    [Inline Frame] libcef.dll!base::internal::TaskTracker::RunTask(base::internal::Task task, base::internal::TaskSource * task_source, const base::TaskTraits & traits) Line 521   C++
    libcef.dll!base::internal::TaskTracker::RunAndPopNextTask(base::internal::RegisteredTaskSource task_source) Line 416    C++
    libcef.dll!base::internal::WorkerThread::RunWorker() Line 436   C++
    libcef.dll!base::internal::WorkerThread::RunDedicatedWorker() Line 342  C++
    libcef.dll!base::`anonymous namespace'::ThreadFunc(void * params) Line 131  C++
magreenblatt commented 2 weeks ago

OS: Windows 11 23H2, 22631.4169

@Mellnik have you tested any other Windows versions?

Mellnik commented 2 weeks ago

@magreenblatt No, sorry. Which versions do you think would be worthwhile testing? I could setup a VM. This crash currently hinders me from upgrading so I will gladly do anything helping you fix it.

magreenblatt commented 1 week ago

I'm not able to reproduce this crash with cefclient 128.4.12+g1d7a1f9+chromium-128.0.6613.138 or 129.0.6+ga918aa7+chromium-129.0.6668.29 (64-bit builds). I have the same Windows 11 version.

magreenblatt commented 1 week ago

@Mellnik can you test if the crash reproduces for you on a different machine (different GPU/drivers)?

Mellnik commented 1 week ago

Alright I had it tested by a friend. He's running the same Windows version as me tho. (23H2, 22631.4169). NVIDIA driver version 552.22 with an RTX 3060 Laptop GPU.

My NVIDIA driver version: 561.09, GTX 1080 Ti

To confirm I've compiled a fresh 129.0.6+ga918aa7+chromium-129.0.6668.29 which results in the same crash.

I've also observed the same crash in another scenario last week. Unfortunately I don't have any more information on how to reproduce this other scenario.

From the stacktrace that I posted it looks like the pointer container is invalid. That is not the case, it actually crashes because of a NOTREACHED inside the function AddCapability as kLpacChromeInstallFiles is not found.

Do you have any more information on what to try? Thank you.

Edit: Using Visual Studio 2022 17.11.4 in case it matters.

magreenblatt commented 1 week ago

From the stacktrace that I posted it looks like the pointer container is invalid. That is not the case, it actually crashes because of a NOTREACHED inside the function AddCapability as kLpacChromeInstallFiles is not found.

Where in the code is this NOTREACHED that you're referring to?

NOTREACHED is fatal in Chromium starting with ~M127. See here for background.

Mellnik commented 6 days ago

Sorry for the delay. Here is more information about the NOTREACHED.

    !logging::LogMessage::HandleFatal(unsigned __int64 stack_start, const std::string & str_newline) Line 1073  C++
    [Inline Frame] !logging::LogMessage::Flush::<lambda_0>::operator()() Line 773   C++
    [Inline Frame] !absl::cleanup_internal::Storage<`lambda at ..\..\base\logging.cc:771:40'>::InvokeCallback() Line 87 C++
    [Inline Frame] !absl::Cleanup<absl::cleanup_internal::Tag,`lambda at ..\..\base\logging.cc:771:40'>::~Cleanup() Line 106    C++
    !logging::LogMessage::Flush() Line 956  C++
    !logging::LogMessage::~LogMessage() Line 728    C++
    [Inline Frame] !logging::`anonymous namespace'::NotReachedLogMessage::~NotReachedLogMessage() Line 148  C++
    !logging::`anonymous namespace'::NotReachedLogMessage::~NotReachedLogMessage() Line 143 C++
    !logging::NotReachedNoreturnError::~NotReachedNoreturnError() Line 396  C++
    !base::win::Sid::FromNamedCapability(const std::wstring & capability_name) Line 151 C++
>   !sandbox::AppContainerBase::AddCapability(const wchar_t * capability_name) Line 355 C++
    [External Code] 

notreached

So it looks like kLpacChromeInstallFiles is an unknown capability. If NOTREACHED was not fatal before M127 it means this error also happened for me before but without my attention. Do you happen to know anything about this kLpacChromeInstallFiles? Not really sure why this is triggered for me but not for you.

magreenblatt commented 6 days ago

Thanks for the updated details. I'm not sure why this code path is triggering for you. However, the reason for this NOTREACHED is that we've explicitly removed boringssl dependencies from cef_sandbox.lib (the !BUILDFLAG(IS_CEF_SANDBOX_BUILD) part).

magreenblatt commented 6 days ago

@Mellnik Can you post the full call stack for this NOTREACHED? What module is this code running in? Presumably it's not running in libcef.dll or BUILDFLAG(IS_CEF_SANDBOX_BUILD) would be false.

Mellnik commented 6 days ago

The call stack from the original post is from cefclient.exe. The one I posted today is from my application that embeds CEF with OSR. For some reason they are different. In cefclient.exe it doesn't show the NOTREACHED location. You are right, the module of todays call stack is not libcef.dll, it's the module of my application that links against cef_sandbox.lib, if I am not mistaken.

Mellnik commented 5 days ago

Update: I've just downloaded 129.0.6+ga918aa7+chromium-129.0.6668.29 from https://cef-builds.spotifycdn.com/index.html, created the VS solution with CMake, built it and started it with the VS debugger. Here is a call stack that contains the NOTREACHED as well.

    cefclient.exe!logging::LogMessage::HandleFatal(unsigned __int64 stack_start, const std::string & str_newline) Line 1073 C++
    cefclient.exe!logging::LogMessage::Flush::<lambda_0>::operator()() Line 775 C++
    cefclient.exe!absl::cleanup_internal::Storage<`lambda at ..\..\base\logging.cc:771:40'>::InvokeCallback() Line 88   C++
    cefclient.exe!absl::Cleanup<absl::cleanup_internal::Tag,`lambda at ..\..\base\logging.cc:771:40'>::~Cleanup() Line 106  C++
    cefclient.exe!logging::LogMessage::Flush() Line 956 C++
    cefclient.exe!logging::LogMessage::~LogMessage() Line 727   C++
    cefclient.exe!logging::`anonymous namespace'::NotReachedLogMessage::~NotReachedLogMessage() Line 132    C++
    cefclient.exe!logging::`anonymous namespace'::NotReachedLogMessage::~NotReachedLogMessage() Line 127    C++
    [External Code] 
    cefclient.exe!logging::NotReachedNoreturnError::~NotReachedNoreturnError() Line 380 C++
>   cefclient.exe!base::win::Sid::FromNamedCapability(const std::wstring & capability_name) Line 151    C++
    cefclient.exe!sandbox::AppContainerBase::AddCapability(const wchar_t * capability_name) Line 355    C++
    [External Code] 

You said boringssl was removed. I wonder how the capabilities, that are not well known capabilities, are supposed to be found if the code part is omitted. (The #if !BUILDFLAG(IS_CEF_SANDBOX_BUILD) part.)

I also wonder if anybody else is able to reproduce this.

amaitland commented 4 days ago

Testing with cef_binary_129.0.11+g57354b8+chromium-129.0.6668.90_windows64_client,

cefclient.exe
cefclient.exe --off-screen-rendering-enabled
cefclient.exe --multi-threaded-message-loop --off-screen-rendering-enabled --enable-gpu
cefclient.exe --multi-threaded-message-loop --use-alloy-style

Loaded https://webglsamples.org/aquarium/aquarium.html using the different command line options.

The webgl demo loaded, ran for a while without issues.

magreenblatt commented 4 days ago

@Mellnik Does the crash reproduce for you when using the pre-built "Sample Application" from https://cef-builds.spotifycdn.com/index.html ?

Mellnik commented 4 days ago

@amaitland Thank you for your testing.

@magreenblatt No, just tested and it does not crash with the pre-built binaries. I suspect that something goes wrong when I build it myself. I will try to fully reinstall VS and see if that helps. Let me know if I should try something else too. Thank you!

Mellnik commented 2 days ago

Just an update. After reinstalling Visual Studio and everything related to it, it still does not work. I've tried Windows 11 SDK 10.0.22621 and 10.0.26100.

Will continue trying to find a solution. Gonna try it with a new Windows installation in a VM next and then further dig into the source code....

magreenblatt commented 2 days ago

This appears to be a linker issue (either VS bug or problem with cef_sandbox.lib). I can reproduce with VS 17.9.2 on Windows 10 using the 129.0.11 x64 standard distribution. This is a Radeon Pro Vega 16 GPU.

we've explicitly removed boringssl dependencies from cef_sandbox.lib (the !BUILDFLAG(IS_CEF_SANDBOX_BUILD) part).

This wouldn't be an issue with the "Sample Application" which is built as part of the Chromium build.

cefclient.exe!sandbox::AppContainerBase::AddCapability(const wchar_t * capability_name) Line 355 C++ [External Code]

The odd part is "External Code" calling into AddCapability. This might just be optimization compiling out the callers, but it's the reason I suspect a linker issue.

magreenblatt commented 2 days ago

The BoringSSL dependency changes were made in M110/M111 timeframe, so I wouldn't expect those to start causing problems in M128.

Mellnik commented 2 days ago

Thank you for testing and reproducing the issue yourself. To clarify the version it first occurs, I've narrowed it down to:

Screenshot 2024-10-07 205011

Last version without crash: 07/18/2024 - 126.2.18+g3647d39+chromium-126.0.6478.183 / Chromium 126.0.6478.183

First version where it crashes: 07/19/2024 - 127.0.20+g1567840+chromium-127.0.6533.57 / Chromium 127.0.6533.57

Later stable version of M127, it crashes too. 08/15/2024 - 127.3.5+g114ea2a+chromium-127.0.6533.120 / Chromium 127.0.6533.120

Maybe that helps finding what changed between those versions causing this.

Regarding the BoringSSL dependency, it makes sense thinking that it's not because of it. Because in fact M126 and earlier works fine. BUT you mentioned NOTREACHED was made fatal starting with M127, so it could still be BoringSSL and just went unnoticed until M127?

magreenblatt commented 2 days ago

BUT you mentioned NOTREACHED was made fatal starting with M127, so it could still be BoringSSL and just went unnoticed until M127?

Yes, good point.

Mellnik commented 9 hours ago

I've modified the source code. If it's a sandbox build I am now calculating the SHA256 with a different header only library. The results are printed to a file to ensure the hash is correct and FromSubAuthorities returns a valid result. Now it works without a crash and from the log file I can see LPACCHROMEINSTALLFILES, REGISTRYREAD and more being requested in the cef_sandbox and now fully resolve without a crash or hang in Release build.

I don't have a deep knowledge of CEF/Chromium like you but I think it would be good to investigate why this code path is sometimes triggered and for other users not. Also investigating if the removal of BoringSSL introduced similar cases in other parts of the code.

If I can help with anything please let me know. I can also send you the source changes if you like so you can test it yourself. Thanks!