chromiumembedded / cef

Chromium Embedded Framework (CEF). A simple framework for embedding Chromium-based browsers in other applications.
https://bitbucket.org/chromiumembedded/cef/
Other
3.09k stars 450 forks source link

cefclient crash navigating to web.dev #3717

Closed salvadordf closed 1 week ago

salvadordf commented 2 weeks ago

Describe the bug cefclient crashes when you navigate back from web.dev. Sometimes it crashes as soon as you navigate to the web.dev website.

To Reproduce Steps to reproduce the behavior:

  1. Run cefclient
  2. Navigate to https://web.dev/patterns/files/open-one-or-multiple-files?hl=es-419
  3. Click the "back" button to return to google.com
  4. cefclient crashes

Sometimes cefclient crashes as soon as it loads the web.dev link and sometimes you have to click "back" and "forward" several times.

Expected behavior cefclient shouldn't crash when it navigates to any website.

Versions (please complete the following information):

No relevant information in debug.log.

Here's the callstack of the crash: callstack.txt

bjdupuis commented 1 week ago

This looks to be another dangling pointer crash like https://github.com/chromiumembedded/cef/issues/3239.

magreenblatt commented 1 week ago

Yes, likely the same one we tried to fix in 47798d3dbf.

magreenblatt commented 1 week ago

@salvadordf Please test M126 with the above fix and let me know if the issue still reproduces for you.

bjdupuis commented 1 week ago

REMOVED.

EDIT: I was wrong.

salvadordf commented 1 week ago

I'm sorry but I still get the crash in two different computers running https://cef-builds.spotifycdn.com/cef_binary_126.2.4%2Bg5e718e0%2Bchromium-126.0.6478.62_windows64_client.tar.bz2

PC 1: AMD Ryzen 7 3700X 8-Core Processor 3.59 GHz 24GB RAM with Windows 10 Pro 19045.4529 (Spanish) PC 2: Intel(R) Core(TM) i7-9700F CPU @ 3.00GHz 3.00 GHz 64GB RAM with Windows 11 Pro 22631.3737 (Spanish)

It takes more time to reproduce in the second PC. The callstacks are almost identical (see attachments)

Sometimes I had to go back and forward several times. Sometimes I had to reload or paste the web.dev address again instead of pressing forward.

I used WinDbg and crashing cefclient while using WinDbg takes a lot fewer navigations.

callstack_AMD.txt callstack_Intel.txt

bjdupuis commented 1 week ago

I guess I should have noted that my attempts were on Mac OS.

EDIT: And another attempt caused it to crash.

magreenblatt commented 1 week ago

Thanks for the updates.

magreenblatt commented 1 week ago

It may take some time to fix this issue. In the mean time, you should be able to disable the check by passing --disable-features=PartitionAllocDanglingPtr

magreenblatt commented 1 week ago

I believe the problem is this line (and maybe other places as well).

  auto done_cookie_callback = base::BindOnce(
        &InterceptedRequestHandlerWrapper::ContinueWithLoadedCookies,
        weak_ptr_factory_.GetWeakPtr(), request_id, request,
        std::move(callback));

The |request| parameter is a pointer, and via some template magic Chromium is wrapping it in a type that checks if the pointer is dangling.

base::WeakPtr<net_service::(anonymous namespace)::InterceptedRequestHandlerWrapper>,
int,
base::internal::UnretainedWrapper<network::ResourceRequest,base::unretained_traits::MayNotDangle,0>,
base::OnceCallback<void ()>

As a short-term solution we can add explicit use of base::UnsafeDanglingUntriaged when binding these pointers. Longer term, we should refactor this code to remove the dangling pointers all-together.

salvadordf commented 1 week ago

I tested https://cef-builds.spotifycdn.com/cef_binary_126.2.6%2Bgc7c4ac9%2Bchromium-126.0.6478.115_windows64_client.tar.bz2 with the --disable-features=PartitionAllocDanglingPtr switch and it took me more time but it crashed when pressing the back button.

This time I had to navigate back and forth many more times. Before the crash cefclient was showing the web,dev page, I waited 3 seconds, pressed back and it crashed.

callstack_cef_126.2.6.txt

magreenblatt commented 1 week ago

@salvadordf can you try --disable-features=PartitionAllocUnretainedDanglingPtr? Thanks.

salvadordf commented 1 week ago

@magreenblatt This time it took a lot more time to reproduce the crash and the callstack is different. callstack_switch2.txt

magreenblatt commented 1 week ago

As a short-term solution we can add explicit use of base::UnsafeDanglingUntriaged

Looks like that did the trick:

base::WeakPtr<net_service::(anonymous namespace)::InterceptedRequestHandlerWrapper>,
int,
base::internal::UnretainedWrapper<network::ResourceRequest,base::unretained_traits::MayDangleUntriaged,0>,
base::OnceCallback<void ()>
magreenblatt commented 1 week ago

@salvadordf Thanks, that looks like a new (unrelated) crash.

magreenblatt commented 1 week ago

I believe the known/related crashes are now fixed, but of course we can reopen if there are more.

magreenblatt commented 1 week ago

I'm leaving the original commit 47798d3dbf fix for M127+, as I believe that change is correct in combination with the new fix (pending any crashes to the contrary 😄)

bjdupuis commented 1 week ago

For my understanding, as of this time all known crashes are addressed and the next CEF build for the 126 baseline should be danging-pointer-crash free?

magreenblatt commented 1 week ago

@bjdupuis We address dangling-ptr crashes as they're reported. You can run with --disable-features=PartitionAllocDanglingPtr,PartitionAllocUnretainedDanglingPtr to disable the checks.

bjdupuis commented 1 week ago

@bjdupuis We address dangling-ptr crashes as they're reported. You can run with --disable-features=PartitionAllocDanglingPtr,PartitionAllocUnretainedDanglingPtr to disable the checks.

I thought that @salvadordf had run with those flags and still experienced crashes. Or was it because both weren't specified at the same time?

magreenblatt commented 1 week ago

The final crash that @salvadordf reported (with PartitionAllocUnretainedDanglingPtr disabled) is a DCHECK here in Chromium and unrelated to dangling rawptrs.

bjdupuis commented 1 week ago

The final crash that @salvadordf reported (with PartitionAllocUnretainedDanglingPtr disabled) is a DCHECK here in Chromium and unrelated to dangling rawptrs.

And finally (sorry... really mostly trying to figure out when it's safe to cut a new build on a CEF release to have the best chance at stability), has that DCHECK been reported? If not, is that something that @salvadordf or myself should do?

magreenblatt commented 1 week ago

has that DCHECK been reported?

A quick search of https://crbug.com didn't reveal anything. You can file a new bug there, however Debug-only checks in Chromium tend not to get much attention.

bjdupuis commented 1 week ago

has that DCHECK been reported?

A quick search of https://crbug.com didn't reveal anything. You can file a new bug there, however Debug-only checks in Chromium tend not to get much attention.

Ohh, didn't realize DCHECKs were in debug-only builds. Sorry Marshall, I'll zip it now.