Closed Mellnik closed 1 month ago
OS: Windows 11 23H2, 22631.4169
@Mellnik have you tested any other Windows versions?
@magreenblatt No, sorry. Which versions do you think would be worthwhile testing? I could setup a VM. This crash currently hinders me from upgrading so I will gladly do anything helping you fix it.
I'm not able to reproduce this crash with cefclient 128.4.12+g1d7a1f9+chromium-128.0.6613.138 or 129.0.6+ga918aa7+chromium-129.0.6668.29 (64-bit builds). I have the same Windows 11 version.
@Mellnik can you test if the crash reproduces for you on a different machine (different GPU/drivers)?
Alright I had it tested by a friend. He's running the same Windows version as me tho. (23H2, 22631.4169). NVIDIA driver version 552.22 with an RTX 3060 Laptop GPU.
My NVIDIA driver version: 561.09, GTX 1080 Ti
To confirm I've compiled a fresh 129.0.6+ga918aa7+chromium-129.0.6668.29 which results in the same crash.
I've also observed the same crash in another scenario last week. Unfortunately I don't have any more information on how to reproduce this other scenario.
From the stacktrace that I posted it looks like the pointer container
is invalid. That is not the case, it actually crashes because of a NOTREACHED inside the function AddCapability
as kLpacChromeInstallFiles
is not found.
Do you have any more information on what to try? Thank you.
Edit: Using Visual Studio 2022 17.11.4 in case it matters.
From the stacktrace that I posted it looks like the pointer container is invalid. That is not the case, it actually crashes because of a NOTREACHED inside the function AddCapability as kLpacChromeInstallFiles is not found.
Where in the code is this NOTREACHED that you're referring to?
NOTREACHED is fatal in Chromium starting with ~M127. See here for background.
Sorry for the delay. Here is more information about the NOTREACHED.
!logging::LogMessage::HandleFatal(unsigned __int64 stack_start, const std::string & str_newline) Line 1073 C++
[Inline Frame] !logging::LogMessage::Flush::<lambda_0>::operator()() Line 773 C++
[Inline Frame] !absl::cleanup_internal::Storage<`lambda at ..\..\base\logging.cc:771:40'>::InvokeCallback() Line 87 C++
[Inline Frame] !absl::Cleanup<absl::cleanup_internal::Tag,`lambda at ..\..\base\logging.cc:771:40'>::~Cleanup() Line 106 C++
!logging::LogMessage::Flush() Line 956 C++
!logging::LogMessage::~LogMessage() Line 728 C++
[Inline Frame] !logging::`anonymous namespace'::NotReachedLogMessage::~NotReachedLogMessage() Line 148 C++
!logging::`anonymous namespace'::NotReachedLogMessage::~NotReachedLogMessage() Line 143 C++
!logging::NotReachedNoreturnError::~NotReachedNoreturnError() Line 396 C++
!base::win::Sid::FromNamedCapability(const std::wstring & capability_name) Line 151 C++
> !sandbox::AppContainerBase::AddCapability(const wchar_t * capability_name) Line 355 C++
[External Code]
So it looks like kLpacChromeInstallFiles
is an unknown capability.
If NOTREACHED was not fatal before M127 it means this error also happened for me before but without my attention.
Do you happen to know anything about this kLpacChromeInstallFiles
? Not really sure why this is triggered for me but not for you.
Thanks for the updated details. I'm not sure why this code path is triggering for you. However, the reason for this NOTREACHED
is that we've explicitly removed boringssl dependencies from cef_sandbox.lib (the !BUILDFLAG(IS_CEF_SANDBOX_BUILD)
part).
@Mellnik Can you post the full call stack for this NOTREACHED? What module is this code running in? Presumably it's not running in libcef.dll or BUILDFLAG(IS_CEF_SANDBOX_BUILD)
would be false.
The call stack from the original post is from cefclient.exe. The one I posted today is from my application that embeds CEF with OSR. For some reason they are different. In cefclient.exe it doesn't show the NOTREACHED location. You are right, the module of todays call stack is not libcef.dll, it's the module of my application that links against cef_sandbox.lib, if I am not mistaken.
Update: I've just downloaded 129.0.6+ga918aa7+chromium-129.0.6668.29
from https://cef-builds.spotifycdn.com/index.html, created the VS solution with CMake, built it and started it with the VS debugger.
Here is a call stack that contains the NOTREACHED as well.
cefclient.exe!logging::LogMessage::HandleFatal(unsigned __int64 stack_start, const std::string & str_newline) Line 1073 C++
cefclient.exe!logging::LogMessage::Flush::<lambda_0>::operator()() Line 775 C++
cefclient.exe!absl::cleanup_internal::Storage<`lambda at ..\..\base\logging.cc:771:40'>::InvokeCallback() Line 88 C++
cefclient.exe!absl::Cleanup<absl::cleanup_internal::Tag,`lambda at ..\..\base\logging.cc:771:40'>::~Cleanup() Line 106 C++
cefclient.exe!logging::LogMessage::Flush() Line 956 C++
cefclient.exe!logging::LogMessage::~LogMessage() Line 727 C++
cefclient.exe!logging::`anonymous namespace'::NotReachedLogMessage::~NotReachedLogMessage() Line 132 C++
cefclient.exe!logging::`anonymous namespace'::NotReachedLogMessage::~NotReachedLogMessage() Line 127 C++
[External Code]
cefclient.exe!logging::NotReachedNoreturnError::~NotReachedNoreturnError() Line 380 C++
> cefclient.exe!base::win::Sid::FromNamedCapability(const std::wstring & capability_name) Line 151 C++
cefclient.exe!sandbox::AppContainerBase::AddCapability(const wchar_t * capability_name) Line 355 C++
[External Code]
You said boringssl was removed. I wonder how the capabilities, that are not well known capabilities, are supposed to be found if the code part is omitted. (The #if !BUILDFLAG(IS_CEF_SANDBOX_BUILD)
part.)
I also wonder if anybody else is able to reproduce this.
Testing with cef_binary_129.0.11+g57354b8+chromium-129.0.6668.90_windows64_client
,
cefclient.exe
cefclient.exe --off-screen-rendering-enabled
cefclient.exe --multi-threaded-message-loop --off-screen-rendering-enabled --enable-gpu
cefclient.exe --multi-threaded-message-loop --use-alloy-style
Loaded https://webglsamples.org/aquarium/aquarium.html using the different command line options.
The webgl demo loaded, ran for a while without issues.
@Mellnik Does the crash reproduce for you when using the pre-built "Sample Application" from https://cef-builds.spotifycdn.com/index.html ?
@amaitland Thank you for your testing.
@magreenblatt No, just tested and it does not crash with the pre-built binaries. I suspect that something goes wrong when I build it myself. I will try to fully reinstall VS and see if that helps. Let me know if I should try something else too. Thank you!
Just an update. After reinstalling Visual Studio and everything related to it, it still does not work. I've tried Windows 11 SDK 10.0.22621 and 10.0.26100.
Will continue trying to find a solution. Gonna try it with a new Windows installation in a VM next and then further dig into the source code....
This appears to be a linker issue (either VS bug or problem with cef_sandbox.lib). I can reproduce with VS 17.9.2 on Windows 10 using the 129.0.11 x64 standard distribution. This is a Radeon Pro Vega 16 GPU.
we've explicitly removed boringssl dependencies from cef_sandbox.lib (the !BUILDFLAG(IS_CEF_SANDBOX_BUILD) part).
This wouldn't be an issue with the "Sample Application" which is built as part of the Chromium build.
cefclient.exe!sandbox::AppContainerBase::AddCapability(const wchar_t * capability_name) Line 355 C++ [External Code]
The odd part is "External Code" calling into AddCapability. This might just be optimization compiling out the callers, but it's the reason I suspect a linker issue.
The BoringSSL dependency changes were made in M110/M111 timeframe, so I wouldn't expect those to start causing problems in M128.
Thank you for testing and reproducing the issue yourself. To clarify the version it first occurs, I've narrowed it down to:
Last version without crash:
07/18/2024 - 126.2.18+g3647d39+chromium-126.0.6478.183 / Chromium 126.0.6478.183
First version where it crashes:
07/19/2024 - 127.0.20+g1567840+chromium-127.0.6533.57 / Chromium 127.0.6533.57
Later stable version of M127, it crashes too.
08/15/2024 - 127.3.5+g114ea2a+chromium-127.0.6533.120 / Chromium 127.0.6533.120
Maybe that helps finding what changed between those versions causing this.
Regarding the BoringSSL dependency, it makes sense thinking that it's not because of it. Because in fact M126 and earlier works fine. BUT you mentioned NOTREACHED was made fatal starting with M127, so it could still be BoringSSL and just went unnoticed until M127?
BUT you mentioned NOTREACHED was made fatal starting with M127, so it could still be BoringSSL and just went unnoticed until M127?
Yes, good point.
I've modified the source code. If it's a sandbox build I am now calculating the SHA256 with a different header only library.
The results are printed to a file to ensure the hash is correct and FromSubAuthorities
returns a valid result.
Now it works without a crash and from the log file I can see LPACCHROMEINSTALLFILES
, REGISTRYREAD
and more being requested in the cef_sandbox and now fully resolve without a crash or hang in Release build.
I don't have a deep knowledge of CEF/Chromium like you but I think it would be good to investigate why this code path is sometimes triggered and for other users not. Also investigating if the removal of BoringSSL introduced similar cases in other parts of the code.
If I can help with anything please let me know. I can also send you the source changes if you like so you can test it yourself. Thanks!
As background, the lpacChromeInstallFiles
capability appears related to https://issues.chromium.org/issues/40560515 which adds additional sandboxing for the GPU process (specific details here). I'm not sure what (if anything) breaks when this capability isn't supported by cef_sandbox, but it's probably safest to just call a SHA256 implementation in Sid::FromNamedCapability. We don't want to add any new dependencies here, and Windows CryptoAPI seems quite complicated for this purpose. We can probably just export a new cef_compute_sha256
function from libcef.dll that uses the default BoringSSL implementation internally.
We can probably just export a new
cef_compute_sha256
function from libcef.dll that uses the default BoringSSL implementation internally.
We would need to access this symbol via LoadLibrary to avoid any build/link-time dependencies on libcef.
As for Windows CryptoAPI, here is an example for a working drop-in replacement for the original BoringSSL function, thanks to ChatGPT ^^
#define SHA256_DIGEST_LENGTH 32
bool SHA256(const uint8_t* InData, size_t InDataLen, uint8_t OutHash[SHA256_DIGEST_LENGTH])
{
HCRYPTPROV hProv = 0;
HCRYPTHASH hHash = 0;
if (!CryptAcquireContext(&hProv, nullptr, nullptr, PROV_RSA_AES, CRYPT_VERIFYCONTEXT))
{
return false;
}
if (!CryptCreateHash(hProv, CALG_SHA_256, 0, 0, &hHash))
{
CryptReleaseContext(hProv, 0);
return false;
}
if (!CryptHashData(hHash, InData, static_cast<DWORD>(InDataLen), 0))
{
CryptDestroyHash(hHash);
CryptReleaseContext(hProv, 0);
return false;
}
DWORD dwHashLen = SHA256_DIGEST_LENGTH;
if (!CryptGetHashParam(hHash, HP_HASHVAL, OutHash, &dwHashLen, 0))
{
CryptDestroyHash(hHash);
CryptReleaseContext(hProv, 0);
return false;
}
CryptDestroyHash(hHash);
CryptReleaseContext(hProv, 0);
return true;
}
As for Windows CryptoAPI, here is an example for a working drop-in replacement ... thanks to ChatGPT
Nice. Any idea where this code is from originally, so we can verify license compatibility?
EDIT: Some people suggest that ChatGPT-generated code is public domain. A quick internet search didn't find any substantially similar code in existing publicly discoverable projects. We'll probably go with public domain unless there is evidence to suggest otherwise.
Looks like the Crypto API methods are deprecated and we should use CNG instead. There's a SHA256 example at https://learn.microsoft.com/en-us/windows/win32/seccng/creating-a-hash-with-cng
We'll stick with Crypto API for now, as it's part of advapi32 (not a new link dependency). Attached is a small verification program for ChatGPT's implementation (rename to .cpp).
Describe the bug A crash happens when opening a specific website. https://webglsamples.org/aquarium/aquarium.html The website provides WebGL examples. The crash only happens with the specific example in the link above.
To Reproduce Steps to reproduce the behavior:
Expected behavior No crash should happen. The website is a WebGL example.
Screenshots
Versions (please complete the following information):
Additional context Reproducible with cefclient.exe and my embedded OSR application. Not reproducible with Google Chrome.
The crash seems to happen by trying to add the capability "kLpacChromeInstallFiles" to so called "app container profile".
Full call stack: