PF4Public / gentoo-overlay

Personal Gentoo overlay

www-client/ungoogled-chromium: 109.0.5414.74: crashes on load #197

Closed baconsalad closed 1 year ago

baconsalad commented 1 year ago

On load the browser window comes up as crashed, continuously tries to reload in the background and never succeeds.

[23957:23957:0109/175547.856369:ERROR:network_service_instance_impl.cc(539)] Network service crashed, restarting service.
[23957:23957:0109/175547.873953:ERROR:network_service_instance_impl.cc(539)] Network service crashed, restarting service.
[23957:23957:0109/175547.892552:ERROR:network_service_instance_impl.cc(539)] Network service crashed, restarting service.
[23957:23957:0109/175547.909860:ERROR:network_service_instance_impl.cc(539)] Network service crashed, restarting service.
[23957:23957:0109/175547.926557:ERROR:network_service_instance_impl.cc(539)] Network service crashed, restarting service.
[23957:23957:0109/175547.943069:ERROR:network_service_instance_impl.cc(539)] Network service crashed, restarting service.

current build flags

www-client/ungoogled-chromium::pf4public -cfi clang -cups custom-cflags -enable-driver -hangouts hevc official optimize-thinlto optimize-webui proprietary-codecs -screencast -suid -system-ffmpeg -system-harfbuzz -system-icu -system-jsoncpp -system-libevent -system-libvpx -system-openh264 -system-openjpeg tcmalloc thinlto vaapi vdpau -widevine

PF4Public commented 1 year ago

Mine is still building and I'm afraid that's too little information to work with.

baconsalad commented 1 year ago

I've already rolled back. When the next version comes out I'll build that and see if it's still happening.

(Accidentally closed this, reopen if you want)

PF4Public commented 1 year ago

worksforme :) (screenshot)

🤷

baconsalad commented 1 year ago

What are your use flags?

PF4Public commented 1 year ago

USE="X clang convert-dict cups custom-cflags hevc js-type-check official optimize-thinlto optimize-webui pgo proprietary-codecs pulseaudio qt5 system-av1 system-ffmpeg system-harfbuzz system-icu system-jsoncpp system-libevent system-libusb system-openh264 system-openjpeg system-png system-re2 system-snappy thinlto vaapi -cfi -debug -enable-driver -gtk4 -hangouts -headless -kerberos -pic -screencast (-selinux) -suid -system-libvpx -vdpau -wayland -widevine"

I doubt it has anything to do with flags. You'd better look into dmesg for any segfaults or anything suspicious.

baconsalad commented 1 year ago

Our use flags are almost identical. It gave me some goodies this time in dmesg.

[57962.562576] Chrome_ChildIOT[21844]: segfault at 0 ip 0000561296bdd45a sp 00007f13d41fa720 error 4 in chrome[561292096000+b00d000] likely on CPU 1 (core 1, socket 0)
[57962.562582] Code: b1 01 4c 89 f2 0f b6 f9 48 89 c6 4c 89 f1 e8 cd 02 4c 06 48 ff 45 d8 48 83 c3 02 4c 39 fb 74 49 bf 28 00 00 00 e8 36 86 c2 ff <0f> b7 0b 48 8b 7d c0 89 48 20 48 85 ff 74 c7 0f 1f 80 00 00 00 00
[57962.578833] Chrome_ChildIOT[21854]: segfault at 0 ip 0000563ecaa2445a sp 00007fda20dfa720 error 4 in chrome[563ec5edd000+b00d000] likely on CPU 0 (core 0, socket 0)
[57962.578839] Code: b1 01 4c 89 f2 0f b6 f9 48 89 c6 4c 89 f1 e8 cd 02 4c 06 48 ff 45 d8 48 83 c3 02 4c 39 fb 74 49 bf 28 00 00 00 e8 36 86 c2 ff <0f> b7 0b 48 8b 7d c0 89 48 20 48 85 ff 74 c7 0f 1f 80 00 00 00 00
[57962.595370] Chrome_ChildIOT[21864]: segfault at 0 ip 00005564a4d1b45a sp 00007f29a5dfa720 error 4 in chrome[5564a01d4000+b00d000] likely on CPU 1 (core 1, socket 0)
[57962.595375] Code: b1 01 4c 89 f2 0f b6 f9 48 89 c6 4c 89 f1 e8 cd 02 4c 06 48 ff 45 d8 48 83 c3 02 4c 39 fb 74 49 bf 28 00 00 00 e8 36 86 c2 ff <0f> b7 0b 48 8b 7d c0 89 48 20 48 85 ff 74 c7 0f 1f 80 00 00 00 00
[57962.611575] Chrome_ChildIOT[21875]: segfault at 0 ip 000055f8df23f45a sp 00007f9f85bf9720 error 4 in chrome[55f8da6f8000+b00d000] likely on CPU 5 (core 5, socket 0)
[57962.611581] Code: b1 01 4c 89 f2 0f b6 f9 48 89 c6 4c 89 f1 e8 cd 02 4c 06 48 ff 45 d8 48 83 c3 02 4c 39 fb 74 49 bf 28 00 00 00 e8 36 86 c2 ff <0f> b7 0b 48 8b 7d c0 89 48 20 48 85 ff 74 c7 0f 1f 80 00 00 00 00
[57962.628167] Chrome_ChildIOT[21884]: segfault at 0 ip 000055bd6992045a sp 00007f9f42dfa720 error 4 in chrome[55bd64dd9000+b00d000] likely on CPU 1 (core 1, socket 0)
[57962.628172] Code: b1 01 4c 89 f2 0f b6 f9 48 89 c6 4c 89 f1 e8 cd 02 4c 06 48 ff 45 d8 48 83 c3 02 4c 39 fb 74 49 bf 28 00 00 00 e8 36 86 c2 ff <0f> b7 0b 48 8b 7d c0 89 48 20 48 85 ff 74 c7 0f 1f 80 00 00 00 00
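
One way to check that these are all the same crash is to subtract the mapping base (the first number in `chrome[base+size]`) from `ip`, which gives the offset of the faulting instruction inside the binary. The sketch below does that for lines in the format shown above. The regex and the helper name `parse_segfault` are illustrative assumptions, not part of any existing tool. It also decodes the x86 page-fault error code, which the kernel prints in hex (bit 0: page was present, bit 1: write access, bit 2: user mode).

```python
import re

# Matches the "segfault at ... in object[base+size]" part of a dmesg line.
# The optional ".*?" tolerates extra fields such as "cpu 12" before " in ".
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+).*? in (?P<obj>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction relative to the mapping base.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


# Two of the lines posted above, copied verbatim.
SAMPLE_LINES = [
    "[57962.562576] Chrome_ChildIOT[21844]: segfault at 0 ip 0000561296bdd45a "
    "sp 00007f13d41fa720 error 4 in chrome[561292096000+b00d000] "
    "likely on CPU 1 (core 1, socket 0)",
    "[57962.578833] Chrome_ChildIOT[21854]: segfault at 0 ip 0000563ecaa2445a "
    "sp 00007fda20dfa720 error 4 in chrome[563ec5edd000+b00d000] "
    "likely on CPU 0 (core 0, socket 0)",
]

results = [parse_segfault(line) for line in SAMPLE_LINES]
for r in results:
    print(r["object"], hex(r["offset"]), "fault at", hex(r["fault_addr"]))
```

For these lines the offset is 0x4b4745a each time, with a user-mode read at address 0. That points at a null pointer read at one fixed code location rather than random memory corruption, and the offset can be handed to a symbolizer if the binary was built with debug info.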

PF4Public commented 1 year ago

Have you tried starting with an empty profile? Have you tried googling your issue? Quick googling shows me some potentially related issues, but I cannot tell how applicable they are to your case. If all that fails, I would suggest getting the full back-trace from the crash and then analyzing what could cause it.

Since 74 goes into stable, I'll reopen this issue. Maybe someone else also encounters this and / or could help further.

baconsalad commented 1 year ago

I've been building this for a while now and occasionally I have one that won't build or run correctly. If I wait a release or two it will just resolve itself.

As far as googling related issues, what I have found doesn't seem to apply to me.

perfect7gentleman commented 1 year ago

I got it too.

chromium
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
MESA-INTEL: warning: Haswell Vulkan support is incomplete
ATTENTION: default value of option mesa_glthread overridden by environment.
[3088:3088:0113/204826.970719:ERROR:gpu_init.cc(523)] Passthrough is not supported, GL is egl, ANGLE is 
ATTENTION: default value of option mesa_glthread overridden by environment.
[3047:3047:0113/204826.990911:ERROR:network_service_instance_impl.cc(539)] Network service crashed, restarting service.
[3047:3047:0113/204827.023210:ERROR:network_service_instance_impl.cc(539)] Network service crashed, restarting service.
Segmentation fault

PF4Public commented 1 year ago

Guys, I don't have even the slightest idea why this may happen. GDB should help. You might need to rebuild it with debug info for this to work.

perfect7gentleman commented 1 year ago

Any clues?

baconsalad commented 1 year ago

Might as well throw my error on.

[32663:0112/014844.623405:ERROR:angle_platform_impl.cc(43)] Display.cpp:997 (initialize): ANGLE Display::initialize error 12289: Could not create a backing OpenGL context.
ERR: Display.cpp:997 (initialize): ANGLE Display::initialize error 12289: Could not create a backing OpenGL context.
[32663:0112/014844.623457:ERROR:gl_display.cc(508)] EGL Driver message (Critical) eglInitialize: Could not create a backing OpenGL context.
[32663:0112/014844.623474:ERROR:gl_display.cc(920)] eglInitialize OpenGL failed with error EGL_NOT_INITIALIZED, trying next display type
[32663:0112/014844.656255:ERROR:angle_platform_impl.cc(43)] Display.cpp:997 (initialize): ANGLE Display::initialize error 12289: Could not create a backing OpenGL context.
ERR: Display.cpp:997 (initialize): ANGLE Display::initialize error 12289: Could not create a backing OpenGL context.
[32663:0112/014844.656286:ERROR:gl_display.cc(508)] EGL Driver message (Critical) eglInitialize: Could not create a backing OpenGL context.
[32663:0112/014844.656300:ERROR:gl_display.cc(920)] eglInitialize OpenGLES failed with error EGL_NOT_INITIALIZED
[32663:0112/014844.656315:ERROR:gl_ozone_egl.cc(23)] GLDisplayEGL::Initialize failed.
[32663:0112/014844.657012:ERROR:viz_main_impl.cc(186)] Exiting GPU process due to errors during initialization
[32703:0112/014844.661261:ERROR:gpu_init.cc(521)] Passthrough is not supported, GL is disabled, ANGLE is

thubble commented 1 year ago

I'm getting what appears to be the same error:

[254408.994239] Chrome_ChildIOT[17425]: segfault at 0 ip 0000556cd65ca73b sp 0000556cc89fa810 error 4 cpu 12 in chrome[556cd1b2c000+ac10000] likely on CPU 12 (core 12, socket 0)
[254408.994244] Code: 01 4c 89 f2 0f b6 f9 48 89 c6 4c 89 f1 e8 fd 66 56 fb 48 ff 44 24 28 48 83 c3 02 4c 39 fb 74 48 bf 28 00 00 00 e8 f5 bf c5 ff <0f> b7 0b 89 48 20 48 8b 7c 24 10 48 85 ff 74 c5 0f 1f 44 00 00 48

(exact same error repeated dozens of times)

Here's the very weird part: I also get the same error when rebuilding 108.0.5359.124_p1 with my current system! But if I restore 108.0.5359.124_p1 using the binary package I fortunately remembered to build, it works fine!!

I did switch to a new machine since the last time I built 108 (old Haswell-E 5930k, now it's a Zen 4 7950x, I just swapped in my SSD). However, I don't think that's the issue because the old build works fine, and building using -march=haswell instead of -march=native results in the same breakage in both 108 and 109. I'm also not seeing any other errors on the new system.

I did almost get it to work by building with libc++ (I needed to use EXTRA_GN="${EXTRA_GN} use_custom_libcxx=true" and hack the ebuild to use bundled libxml and libxslt). This restores my session and loads pages properly - but then the whole thing still crashes after 5-10 seconds randomly.

I've tried downgrading llvm/clang from 15.0.7 to 15.0.6 (the last version I built successfully with). When building with libstdc++, I downgraded gcc to the earlier snapshot I was using for the last successful build. No luck with any of that.

For those who are (and aren't) getting the error - are you using libc++ or libstdc++? On a related note, is it possible to build chromium with libc++ without using it systemwide? I did the hack to use use_custom_libcxx=true but I think that builds with a bundled copy of libc++ rather than the system one (which I don't even have installed).

perfect7gentleman commented 1 year ago

I use libc++.

PF4Public commented 1 year ago

Have you tried disabling all system-* dependencies? Maybe that could help?

PS: I've also taken some changes from Gentoo, namely: new gcc patchset. This might also help.

perfect7gentleman commented 1 year ago

I wasn't able to build UGC with use_custom_libcxx=true.

joecool1029 commented 1 year ago

For what it's worth, I've been hitting the same crash with electron 21 and 22. I changed a lot of useflags on my system, so it'll be tough to nail down. Could be recent clang/llvm issues; I'll try dropping all the system-* out next.

Update: building with -system-* didn't help and -system-av1 is broken on electron-22, looks like it was missing something being bundled.

perfect7gentleman commented 1 year ago

building with -system-* did not help.

thubble commented 1 year ago

I'm still getting the same error with 108.0.5359.124_p1. I did try building with full debug info with both libc++ (use_custom_libcxx=true) and libstdc++ - and the crash stack trace was exactly the same for both, and ends in unique_ptr.

Libstdc++:
#0 std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> >::~unique_ptr() (this=0x1cbc012fb580) at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/unique_ptr.h:396

Libc++:
#0 std::Cr::unique_ptr<cc::ResourcePool::PoolResource, std::Cr::default_delete<cc::ResourcePool::PoolResource> >::reset[abi:v160000](cc::ResourcePool::PoolResource*) (this=0x1ff00132e6e0, __p=0x0) at ../../buildtools/third_party/libc++/trunk/include/__memory/unique_ptr.h:281

So I don't think the C++ library is the issue.

I also disabled the system-* flags and no change.

Has anyone tried rebuilding 108 with a recent toolchain? I haven't actually tried 109 since my first attempt, since I wanted to make sure I could build a known-working version first.

I'm probably going to try building with gcc next.

thubble commented 1 year ago

OK, 108 works fine when built with gcc (and libstdc++, which is what I normally use). I did have to apply this patch to get it to build: https://chromium-review.googlesource.com/c/chromium/src/+/3963839

So obviously something has broken the clang build - and it must be a recent change in Gentoo packages. I'm out of ideas, though. I guess I can just build with gcc for now, although I'd prefer to have thinlto/pgo.

PF4Public commented 1 year ago

> crash stack trace

Full back-trace might give some clues!

Oh, BTW, I see you're using gcc-12! That might be the problem! I'm still with gcc-11.

Others, do you also have gcc-12?

baconsalad commented 1 year ago

gcc version 12.2.1 20221231 (Gentoo 12.2.1_p20221231 p8)

thubble commented 1 year ago

I'm using gcc-12 as well (latest 12.2.1 snapshot). I'll try installing gcc-11 and rebuilding with clang. The build with gcc-12 actually did work, but maybe there's something in libstdc++ or some other gcc library/header that's breaking things when built with clang.

Here's the backtrace, for the libstdc++ version. The libc++ version is identical except for the different STL header locations.

396               get_deleter()(std::move(__ptr));
(gdb) bt
#0  std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> >::~unique_ptr() (this=0x1cbc012fb580) at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/unique_ptr.h:396
#1  base::internal::VectorBuffer<std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> > >::DestructRange<std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> >, 0>(std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> >*, std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> >*) (this=this@entry=0x1cbc01228388, begin=0x1cbc012fb580, end=0x1cbc012fb580) at ../../base/containers/vector_buffer.h:112
#2  0x000055555c7a7748 in base::circular_deque<std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> > >::erase(base::internal::circular_deque_const_iterator<std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> > >, base::internal::circular_deque_const_iterator<std::unique_ptr<cc::ResourcePool::PoolResource, std::default_delete<cc::ResourcePool::PoolResource> > >) (this=0x1cbc01228388, first=..., last=...) at ../../base/containers/vector_buffer.h:85
#3  0x000055555ae97b06 in base::OnceCallback<void (base::FilePath const&, bool)>::Run(base::FilePath const&, bool) && (this=<optimized out>, args=..., args=<optimized out>) at ../../base/functional/callback.h:145
#4  base::internal::FunctorTraits<base::OnceCallback<void (base::FilePath const&, bool)>, void>::Invoke<base::OnceCallback<void (base::FilePath const&, bool)>, base::FilePath, bool>(base::OnceCallback<void (base::FilePath const&, bool)>&&, base::FilePath&&, bool&&)
    (callback=<optimized out>, args=..., args=<optimized out>) at ../../base/functional/bind_internal.h:750
#5  base::internal::InvokeHelper<false, void, 0ul, 1ul>::MakeItSo<base::OnceCallback<void (base::FilePath const&, bool)>, std::tuple<base::FilePath, bool>>(base::OnceCallback<void (base::FilePath const&, bool)>&&, std::tuple<base::FilePath, bool>&&) (functor=<optimized out>, bound=<optimized out>)
    at ../../base/functional/bind_internal.h:826
#6  base::internal::Invoker<base::internal::BindState<base::OnceCallback<void (base::FilePath const&, bool)>, base::FilePath, bool>, void ()>::RunImpl<base::OnceCallback<void (base::FilePath const&, bool)>, std::tuple<base::FilePath, bool>, 0ul, 1ul>(base::OnceCallback<void (base::FilePath const&, bool)>&&, std::tuple<base::FilePath, bool>&&, std::integer_sequence<unsigned long, 0ul, 1ul>) (functor=<optimized out>, bound=<optimized out>, seq=...) at ../../base/functional/bind_internal.h:920
#7  base::internal::Invoker<base::internal::BindState<base::OnceCallback<void (base::FilePath const&, bool)>, base::FilePath, bool>, void ()>::RunOnce(base::internal::BindStateBase*) (base=<optimized out>) at ../../base/functional/bind_internal.h:871
#8  0x000055555c719df2 in base::OnceCallback<void ()>::Run() && (this=0x1cbc01251880) at ../../base/functional/callback.h:145
#9  viz::ClientResourceProvider::ReceiveReturnsFromParent(std::vector<viz::ReturnedResource, std::allocator<viz::ReturnedResource> >)Python Exception <class 'gdb.error'>: value has been optimized out
 (this=0x1cbc0063a260, resources=) at ../../components/viz/client/client_resource_provider.cc:283
#10 0x000055555c7e1171 in cc::LayerTreeHostImpl::ReclaimResources(std::vector<viz::ReturnedResource, std::allocator<viz::ReturnedResource> >)Python Exception <class 'gdb.error'>: value has been optimized out
 (this=0x1cbc0063a000, resources=) at ../../cc/trees/layer_tree_host_impl.cc:2181
#11 0x000055555cdc80aa in non-virtual thunk to cc::mojo_embedder::AsyncLayerTreeFrameSink::ReclaimResources(std::vector<viz::ReturnedResource, std::allocator<viz::ReturnedResource> >) () at ../../cc/mojo_embedder/async_layer_tree_frame_sink.cc:293
#12 0x0000555558593137 in viz::mojom::CompositorFrameSinkClientStubDispatch::Accept(viz::mojom::CompositorFrameSinkClient*, mojo::Message*) (impl=0x1cbc011a8ed8, message=0x7fffffffc120) at gen/services/viz/public/mojom/compositing/compositor_frame_sink.mojom.cc:1758
#13 0x000055555b6447cf in mojo::InterfaceEndpointClient::HandleValidatedMessage(mojo::Message*) (this=0x1cbc011a2a80, message=0x7fffffffc120) at ../../mojo/public/cpp/bindings/lib/interface_endpoint_client.cc:994
#14 0x000055555b649f0b in mojo::MessageDispatcher::Accept(mojo::Message*) (this=0x1cbc011a2b78, message=0x7fffffffc120) at ../../mojo/public/cpp/bindings/lib/message_dispatcher.cc:43
#15 0x000055555b645c8a in mojo::InterfaceEndpointClient::HandleIncomingMessage(mojo::Message*) (this=<optimized out>, message=0x1cbc012fb580) at ../../mojo/public/cpp/bindings/lib/interface_endpoint_client.cc:693
#16 0x000055555b64d918 in mojo::internal::MultiplexRouter::ProcessIncomingMessage(mojo::internal::MultiplexRouter::MessageWrapper*, mojo::internal::MultiplexRouter::ClientCallBehavior, base::SequencedTaskRunner*)
     (this=this@entry=0x1cbc01192400, message_wrapper=message_wrapper@entry=0x7fffffffc220, client_call_behavior=client_call_behavior@entry=mojo::internal::MultiplexRouter::ALLOW_DIRECT_CLIENT_CALLS, current_task_runner=0x1cbc00220fc0) at ../../mojo/public/cpp/bindings/lib/multiplex_router.cc:1102
#17 0x000055555b64d27f in mojo::internal::MultiplexRouter::Accept(mojo::Message*) (this=0x1cbc01192400, message=0x7fffffffc480) at ../../mojo/public/cpp/bindings/lib/multiplex_router.cc:716
#18 0x000055555b649f0b in mojo::MessageDispatcher::Accept(mojo::Message*) (this=0x1cbc01192430, message=0x7fffffffc480) at ../../mojo/public/cpp/bindings/lib/message_dispatcher.cc:43
#19 0x000055555b642655 in mojo::Connector::DispatchMessage(mojo::ScopedHandleBase<mojo::MessageHandle>) (this=this@entry=0x1cbc01192460, handle=...) at ../../mojo/public/cpp/bindings/lib/connector.cc:561
#20 0x000055555b642fd4 in mojo::Connector::ReadAllAvailableMessages() (this=0x1cbc01192460) at ../../mojo/public/cpp/bindings/lib/connector.cc:618
#21 0x0000555558384804 in base::RepeatingCallback<void (int, int)>::Run(int, int) const & (this=<optimized out>, args=19903872, args=19903872) at ../../base/functional/callback.h:267
#22 0x000055555b65e503 in base::RepeatingCallback<void (unsigned int, mojo::HandleSignalsState const&)>::Run(unsigned int, mojo::HandleSignalsState const&) const & (this=0x7fffffffc6f0, args=0, args=...) at ../../base/functional/callback.h:267
#23 mojo::SimpleWatcher::OnHandleReady(int, unsigned int, mojo::HandleSignalsState const&) (this=0x1cbc00bb5e00, watch_id=<optimized out>, result=0, state=...) at ../../mojo/public/cpp/system/simple_watcher.cc:278
#24 0x000055555b2d10f0 in base::OnceCallback<void ()>::Run() && (this=0x1cbc00249e00) at ../../base/functional/callback.h:145
#25 base::TaskAnnotator::RunTaskImpl(base::PendingTask&) (this=<optimized out>, pending_task=...) at ../../base/task/common/task_annotator.cc:133
#26 0x000055555b2e771f in base::TaskAnnotator::RunTask<base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl(base::LazyNow*)::$_0>(perfetto::StaticString, base::PendingTask&, base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl(base::LazyNow*)::$_0&&) (this=0x1cbc002f86d0, pending_task=..., event_name=..., args=<optimized out>) at ../../base/task/common/task_annotator.h:72
#27 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl(base::LazyNow*) (this=this@entry=0x1cbc002f8500, continuation_lazy_now=continuation_lazy_now@entry=0x7fffffffca40) at ../../base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:441
#28 0x000055555b2e7175 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() (this=0x1cbc002f8500) at ../../base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:297
#29 0x000055555b2e7e88 in non-virtual thunk to base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() () at ../../third_party/abseil-cpp/absl/types/optional.h:483
#30 0x000055555b28d91b in base::MessagePumpGlib::HandleDispatch() (this=0x1cbc00258a80) at ../../base/message_loop/message_pump_glib.cc:374
#31 base::(anonymous namespace)::WorkSourceDispatch(_GSource*, int (*)(void*), void*) (source=<optimized out>, unused_func=<optimized out>, unused_data=<optimized out>) at ../../base/message_loop/message_pump_glib.cc:127
#32 0x00005555553ee55d in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
#33 0x0000555555479b04 in  () at /usr/lib64/libglib-2.0.so.0
#34 0x00005555553ea4ca in g_main_context_iteration () at /usr/lib64/libglib-2.0.so.0
#35 0x000055555b28d6e9 in base::MessagePumpGlib::Run(base::MessagePump::Delegate*) (this=0x1cbc00258a80, delegate=<optimized out>) at ../../base/message_loop/message_pump_glib.cc:400
#36 0x000055555b2e8197 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run(bool, base::TimeDelta) (this=0x1cbc002f8500, application_tasks_allowed=true, timeout=...) at ../../base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:600
#37 0x000055555b2b0d25 in base::RunLoop::Run(base::Location const&) (this=0x1cbc00e0fc00, location=...) at ../../base/run_loop.cc:141
#38 0x0000555559697f7a in content::BrowserMainLoop::RunMainMessageLoop() (this=<optimized out>) at ../../content/browser/browser_main_loop.cc:1048
#39 0x0000555559699c45 in content::BrowserMainRunnerImpl::Run() (this=0x1cbc00358780) at ../../content/browser/browser_main_runner_impl.cc:162
#40 0x0000555559695886 in content::BrowserMain(content::MainFunctionParams) (parameters=...) at ../../content/browser/browser_main.cc:30
#41 0x000055555adf13f0 in content::RunBrowserProcessMain(content::MainFunctionParams, content::ContentMainDelegate*) (main_function_params=..., delegate=0x7fffffffd680) at ../../content/app/content_main_runner_impl.cc:712
#42 0x000055555adf2680 in content::ContentMainRunnerImpl::RunBrowser(content::MainFunctionParams, bool) (this=this@entry=0x1cbc00264180, main_params=..., start_minimal_browser=false) at ../../content/app/content_main_runner_impl.cc:1253
#43 0x000055555adf247f in content::ContentMainRunnerImpl::Run() (this=0x1cbc00264180) at ../../content/app/content_main_runner_impl.cc:1108
#44 0x000055555adef4b8 in content::RunContentProcess(content::ContentMainParams, content::ContentMainRunner*) (params=..., content_main_runner=0x1cbc00264180) at ../../content/app/content_main.cc:342
#45 0x000055555adefc57 in content::ContentMain(content::ContentMainParams) (params=...) at ../../content/app/content_main.cc:370
#46 0x0000555557606236 in ChromeMain(int, char const**) (argc=<optimized out>, argv=0x7fffffffd888) at ../../chrome/app/chrome_main.cc:175
#47 0x0000555551a2c2b7 in __libc_start_call_main (main=main@entry=0x555557606120 <main(int, char const**)>, argc=argc@entry=3, argv=argv@entry=0x7fffffffd888) at ../sysdeps/nptl/libc_start_call_main.h:58
#48 0x0000555551a2c375 in __libc_start_main_impl (main=0x555557606120 <main(int, char const**)>, argc=3, argv=0x7fffffffd888, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd878) at ../csu/libc-start.c:381
#49 0x0000555557606021 in _start ()

PF4Public commented 1 year ago

I'm leaning towards some weird incompatibility between some of: 1) libstdc++ from gcc-12; 2) chromium internals; 3) clang's libc++, all of which might be in play here. Also, I recall this happened in the past at least once before…

Looking forward to @thubble's rebuild with gcc-11.

thubble commented 1 year ago

gcc-11 made no difference. I'm out of ideas :(

thubble commented 1 year ago

All the crashes I'm seeing are similar to ones I've seen before when building QtWebEngine (Chromium-based) with recent versions of clang and not adding the -fno-delete-null-pointer-checks flag - but I've verified that that flag is being properly added.

Also, I'm still building 108.0.5359.124_p1, and that's where I'm getting these errors, so it's not 109-related (at least for me). The binpkg I built before updating still works fine, and I originally built it on Dec. 22, so I have no idea what would have changed since then. I did update to a new workstation (Ryzen 7950x instead of Haswell-E 5930k), but I'm building with -march=haswell and it's still not working.

Not sure if I'll have any more ideas tomorrow, but I'm about ready to give up and just start building Chromium with gcc.

crabbedhaloablution commented 1 year ago

I just had a successful build with gcc-12 and these useflags: www-client/ungoogled-chromium-109.0.5414.74_p1::pf4public USE="X cups custom-cflags official proprietary-codecs pulseaudio qt5 system-av1 system-ffmpeg system-harfbuzz system-icu system-jsoncpp system-libevent system-libusb system-openh264 system-png system-re2 system-snappy vaapi vdpau wayland -cfi -clang -convert-dict -debug -enable-driver -gtk4 -hangouts -headless -hevc -js-type-check -kerberos -optimize-thinlto -optimize-webui -pgo -pic -screencast (-selinux) -suid -system-libvpx -system-openjpeg -thinlto -widevine" The clang build that failed had the same useflags except "clang optimize-thinlto thinlto" were enabled.
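
Comparing two long USE strings by eye is error-prone. A minimal sketch of doing it programmatically follows; the function names `parse_use` and `diff_use` are made up for illustration, and the two example strings are shortened stand-ins for the full flag lists, not the complete sets from either build.

```python
def parse_use(use_string):
    """Map each flag name to True (enabled) or False (disabled)."""
    flags = {}
    for token in use_string.split():
        # Drop the parentheses around masked/forced flags such as "(-selinux)"
        # and the "*" / "%" change markers that emerge may append.
        token = token.strip("()*%")
        if not token:
            continue
        if token.startswith("-"):
            flags[token[1:]] = False
        else:
            flags[token] = True
    return flags


def diff_use(a, b):
    """Return {flag: (state_in_a, state_in_b)} for flags whose state differs.

    A state of None means the flag does not appear in that string at all.
    """
    fa, fb = parse_use(a), parse_use(b)
    return {
        name: (fa.get(name), fb.get(name))
        for name in sorted(set(fa) | set(fb))
        if fa.get(name) != fb.get(name)
    }


# Shortened example strings (assumption: only a handful of flags are shown).
gcc_build = "X cups official vaapi -cfi -clang -optimize-thinlto -thinlto (-selinux)"
clang_build = "X cups official vaapi -cfi clang optimize-thinlto thinlto (-selinux)"

changed = diff_use(gcc_build, clang_build)
for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
```

With the shortened strings, only `clang`, `optimize-thinlto` and `thinlto` are reported as different, which matches the description above.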

PF4Public commented 1 year ago

@thubble Did you do the backtrace with gcc-11, to be certain it uses includes from gcc-11? Do you keep emerge logs? Please inspect logs to see what changed since Dec. 22. Also you may do qlist -Iv | sort and I can diff your list to mine.
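
Once both sides have saved their sorted `qlist -Iv` output, the comparison itself is a set difference. The sketch below assumes one `category/package-version` entry per line; the helper names are hypothetical, and the package entries in the example are placeholders (the gcc-11 version in particular is invented for illustration).

```python
def read_packages(text):
    """Turn `qlist -Iv` style output (one package per line) into a set."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diff_packages(mine, theirs):
    """Return (only_in_mine, only_in_theirs) as sorted lists."""
    a, b = read_packages(mine), read_packages(theirs)
    return sorted(a - b), sorted(b - a)


# Placeholder lists; real ones would be read from the saved qlist output.
broken_system = """
sys-devel/clang-15.0.7
sys-devel/gcc-12.2.1_p20221231
sys-libs/glibc-2.36-r5
"""
working_system = """
sys-devel/clang-15.0.7
sys-devel/gcc-11.3.0
sys-libs/glibc-2.36-r5
"""

only_broken, only_working = diff_packages(broken_system, working_system)
print("only on the broken system: ", only_broken)
print("only on the working system:", only_working)
```

Packages that appear on one side only, or with different versions, are the first candidates to look at; identical entries drop out of the result.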

thubble commented 1 year ago

I figured it out! (At least for my situation with 108). Chromium's custom "PartitionAlloc" memory allocator (which wraps/overrides glibc's malloc/free/etc.) was somehow causing the problem. I built with EXTRA_GN="${EXTRA_GN} use_allocator=\"none\" use_allocator_shim=false" and 108 seems to work fine now.

I'm going to try building 109 now, with lto etc. I think the allocator flags changed for that version so based on what FreeBSD is doing (https://github.com/freebsd/freebsd-ports/commit/36d5a0919f1b243708d710b3badbafe37664fc9e) I'm going to try use_partition_alloc=true use_partition_alloc_as_malloc=false use_allocator_shim=false enable_backup_ref_ptr_support=false.

I'll let you know how it goes with 109. It's baffling that this issue came up suddenly, I can't imagine what might have changed.

joecool1029 commented 1 year ago

FWIW, this is a backtrace of the crash on electron: http://sprunge.us/zvYtgJ

> I'll let you know how it goes with 109. It's baffling that this issue came up suddenly, I can't imagine what might have changed.

Chromium switched to C++20. Noted in Gentoo patchset changelog: https://github.com/stha09/chromium-patches/releases

I'll try with the allocator flags next.

Update: did not work, tried with EXTRA_GN="${EXTRA_GN} use_allocator=\"none\" use_allocator_shim=false"

thubble commented 1 year ago

109 builds and runs successfully for me using the following flags: EXTRA_GN="${EXTRA_GN} use_partition_alloc=true use_allocator_shim=false use_partition_alloc_as_malloc=false enable_backup_ref_ptr_support=false"

I did need the following 2 patches to get it to build:

--- a/components/page_load_metrics/browser/observers/use_counter/at_most_once_enum_uma_deferrer.h   2023-01-22 14:49:02.271846266 -0600
+++ b/components/page_load_metrics/browser/observers/use_counter/at_most_once_enum_uma_deferrer.h   2023-01-22 14:49:11.767846110 -0600
@@ -6,6 +6,7 @@
 #define COMPONENTS_PAGE_LOAD_METRICS_BROWSER_OBSERVERS_USE_COUNTER_AT_MOST_ONCE_ENUM_UMA_DEFERRER_H_

 #include "base/metrics/histogram_functions.h"
+#include <bitset>

 namespace internal {
--- a/base/process/memory_linux.cc
+++ b/base/process/memory_linux.cc
@@ -30,6 +30,13 @@ void ReleaseReservationOrTerminate() {

 }  // namespace

+#if !BUILDFLAG(USE_ALLOCATOR_SHIM) && defined(LIBC_GLIBC)
+extern "C" {
+void* __libc_malloc(size_t size);
+void __libc_free(void*);
+}  // extern C
+#endif
+
 void EnableTerminationOnHeapCorruption() {
   // On Linux, there nothing to do AFAIK.
 }

I have no idea why this fixes it. I've never had issues building with PartitionAlloc (which has been the Chromium default forever). The only thing I can maybe think of is a new kernel (or kernel headers) version? I did update from 6.0.x to 6.1.x since my last successful build. I believe PartitionAlloc makes syscalls to allocate memory just like glibc does, so maybe some weird change there affected it. I'm just guessing though - I don't really have any more time/motivation to investigate it.

I'm not sure what the implications are of running without PartitionAlloc (or at least, without overriding malloc() and related calls via shims). Chromium docs imply that there's some sort of security advantage to running their own malloc system, but FreeBSD has been building Chromium without it forever - that's where I got the flags from.

In any case, it's now working fine for me, so I'm going to keep things as-is for now. I might investigate more when I build the next Chromium version, if I have time.

perfect7gentleman commented 1 year ago

@thubble, I tried your way, but it didn't work in my case.

hnhx commented 1 year ago

Same issue for me : /

thubble commented 1 year ago

> @thubble, I tried your way, but it didn't work in my case.

Did it make any difference at all, e.g. does it still crash on startup or can you load a page at all?

In any case, it appears that PartitionAlloc isn't the only problem. I'm using libstdc++ from gcc 12.2.1, and clang 15.0.7. Maybe some combination of libc++ and/or clang version is the issue?

This is my build configuration. I even enabled some system-* flags like I did before, and it still works fine.

USE="X clang cups hevc kerberos official optimize-thinlto optimize-webui pgo proprietary-codecs pulseaudio system-ffmpeg system-harfbuzz system-libevent system-libusb system-openh264 system-openjpeg system-snappy thinlto vaapi wayland widevine -cfi -convert-dict -custom-cflags -debug -enable-driver -gtk4 -hangouts -headless -js-type-check -pic -qt5 -screencast (-selinux) -suid -system-av1 -system-icu -system-jsoncpp -system-libvpx -system-png -system-re2 -vdpau" ABI_X86="(64)" L10N="-af -am -ar -bg -bn -ca -cs -da -de -el -en-GB -es -es-419 -et -fa -fi -fil -fr -gu -he -hi -hr -hu -id -it -ja -kn -ko -lt -lv -ml -mr -ms -nb -nl -pl -pt-BR -pt-PT -ro -ru -sk -sl -sr -sv -sw -ta -te -th -tr -uk -ur -vi -zh-CN -zh-TW"
CFLAGS="-march=native -Wno-unknown-warning-option -Wno-builtin-macro-redefined"
CXXFLAGS="-march=native -Wno-unknown-warning-option -Wno-builtin-macro-redefined"
LDFLAGS="-march=native -Wl,--thinlto-jobs=16"

I did also build qtwebengine (based on Chromium 87 with security backports), and verified that its build system automatically disables PartitionAlloc. Using Falkon, it doesn't crash on startup, but eventually crashes with a seemingly similar stacktrace (I haven't built a version with debug info yet). It only seems to do this when scrolling. And again, the version I built a while ago still works fine, so I don't think it's any shared libraries that have changed on my system, but rather toolchain and headers.

thubble commented 1 year ago

Update to what I said about qtwebengine above: I remembered I had a patch to build with -fomit-frame-pointer. I reverted that and rebuilt, and qtwebengine works fine now.

These are the relevant parts of my build environment for www-client/ungoogled-chromium-109.0.5414.74_p1, which is currently working without issues (I'm posting this update using it):

USE="X clang cups hevc kerberos official optimize-thinlto optimize-webui pgo proprietary-codecs pulseaudio system-ffmpeg system-harfbuzz system-libevent system-libusb system-openh264 system-openjpeg system-snappy thinlto vaapi wayland widevine -cfi -convert-dict -custom-cflags -debug -enable-driver -gtk4 -hangouts -headless -js-type-check -pic -qt5 -screencast (-selinux) -suid -system-av1 -system-icu -system-jsoncpp -system-libvpx -system-png -system-re2 -vdpau" ABI_X86="(64)" L10N="-af -am -ar -bg -bn -ca -cs -da -de -el -en-GB -es -es-419 -et -fa -fi -fil -fr -gu -he -hi -hr -hu -id -it -ja -kn -ko -lt -lv -ml -mr -ms -nb -nl -pl -pt-BR -pt-PT -ro -ru -sk -sl -sr -sv -sw -ta -te -th -tr -uk -ur -vi -zh-CN -zh-TW"

CC="/usr/lib/llvm/15/bin/clang"
CXX="/usr/lib/llvm/15/bin/clang++"

AR="/usr/lib/llvm/15/bin/llvm-ar"
OBJCOPY="/usr/lib/llvm/15/bin/llvm-objcopy"
OBJDUMP="/usr/lib/llvm/15/bin/llvm-objdump"
NM="/usr/lib/llvm/15/bin/llvm-nm"
RANLIB="/usr/lib/llvm/15/bin/llvm-ranlib"
READELF="/usr/lib/llvm/15/bin/llvm-readelf"
STRIP="/usr/lib/llvm/15/bin/llvm-strip"

COMMON_FLAGS="-march=native"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"

ASFLAGS=""

LDFLAGS="-march=native"

EXTRA_GN="${EXTRA_GN} use_partition_alloc=true use_allocator_shim=false use_partition_alloc_as_malloc=false enable_backup_ref_ptr_support=false"

MAKEOPTS="-j16"

sys-devel/llvm-15.0.7::gentoo was built with the following:
USE="binutils-plugin libffi ncurses xml zstd -debug -doc -exegesis -libedit -test -verify-sig -xar -z3" ABI_X86="(64) -32 (-x32)" LLVM_TARGETS="AMDGPU BPF WebAssembly (X86) -AArch64 (-ARC) -ARM -AVR (-CSKY) (-DirectX) -Hexagon -Lanai (-LoongArch) (-M68k) -MSP430 -Mips -NVPTX -PowerPC -RISCV (-SPIRV) -Sparc -SystemZ -VE -XCore"

sys-devel/clang-15.0.7-r1::gentoo was built with the following:
USE="extra (pie) static-analyzer xml -debug -doc (-ieee-long-double) -test -verify-sig" ABI_X86="(64) -32 (-x32)" LLVM_TARGETS="AMDGPU BPF WebAssembly (X86) -AArch64 (-ARC) -ARM -AVR (-CSKY) (-DirectX) -Hexagon -Lanai (-LoongArch) (-M68k) -MSP430 -Mips -NVPTX -PowerPC -RISCV (-SPIRV) -Sparc -SystemZ -VE -XCore" PYTHON_SINGLE_TARGET="python3_11 -python3_10 -python3_9"

sys-devel/lld-15.0.7::gentoo was built with the following:
USE="-debug -test -verify-sig" ABI_X86="(64)"

sys-devel/gcc-12.2.1_p20230121::gentoo was built with the following:
USE="custom-cflags (cxx) fortran graphite lto (multilib) nls nptl openmp pgo (pie) sanitize vtv zstd -ada (-cet) -d -debug -default-stack-clash-protection -default-znow -doc (-fixed-point) -go -hardened (-ieee-long-double) -jit (-libssp) -objc -objc++ -objc-gc (-pch) -ssp -systemtap -test -valgrind -vanilla" ABI_X86="(64)"

sys-devel/binutils-2.40::gentoo was built with the following:
USE="gold nls pgo plugins zstd (-cet) -doc (-gprofng) -multitarget -static-libs -test -vanilla" ABI_X86="(64)"

sys-libs/binutils-libs-2.40-r1::gentoo was built with the following:
USE="nls -64-bit-bfd (-cet) -multitarget -static-libs" ABI_X86="(64) -32 (-x32)"

sys-libs/glibc-2.36-r7::gentoo was built with the following:
USE="custom-cflags multiarch (multilib) perl (static-libs) -audit -caps (-cet) -compile-locales (-crypt) -doc -gd -hash-sysv-compat -headers-only -multilib-bootstrap -nscd -profile (-selinux) -ssp -stack-realign -suid -systemd -systemtap -test (-vanilla)" ABI_X86="(64)"

sys-kernel/linux-headers-6.1::gentoo was built with the following:
USE="-headers-only" ABI_X86="(64)"

$ clang++ --version
clang version 15.0.7
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/15/bin
Configuration file: /etc/clang/clang++.cfg

$ cat /etc/clang/clang++.cfg
# This configuration file is used by clang++ driver.
@gentoo-common.cfg

$ cat /etc/clang/gentoo-common.cfg 
# This file contains flags common to clang, clang++ and clang-cpp.
@gentoo-runtimes.cfg
@gentoo-gcc-install.cfg
@gentoo-hardened.cfg

$ cat /etc/clang/gentoo-runtimes.cfg 
# This file is initially generated by sys-devel/clang-runtime.
# It is used to control the default runtimes using by clang.

--rtlib=libgcc
--unwindlib=libgcc
--stdlib=libstdc++
-fuse-ld=bfd

$ cat /etc/clang/gentoo-gcc-install.cfg 
# This file is maintained by gcc-config.
# It is used to specify the selected GCC installation.
--gcc-install-dir="/usr/lib/gcc/x86_64-pc-linux-gnu/12"

$ cat /etc/clang/gentoo-hardened.cfg 
# Some of these options are added unconditionally, regardless of
# USE=hardened, for parity with sys-devel/gcc.
-fstack-clash-protection
-fstack-protector-strong
-fPIE
-include "/usr/include/gentoo/fortify.h"

perfect7gentleman commented 1 year ago

Did it make any difference at all, e.g. does it still crash on startup or can you load a page at all?

It didn't make any difference at all; it still crashes on startup.

# This file is initially generated by sys-devel/clang-runtime.
# It is used to control the default runtimes using by clang.

--rtlib=compiler-rt
--unwindlib=libunwind
--stdlib=libc++
-fuse-ld=lld

thubble commented 1 year ago

Weird, is it the same stack trace as before? Maybe system libc++ is somehow causing an issue?
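
One way to check which runtimes a given clang actually defaults to is to flatten its configuration files. On Gentoo, /etc/clang/clang++.cfg pulls in further files through @file lines, as the listings earlier in this thread show. The following is a minimal sketch, not part of any Gentoo tool: expand_clang_cfg is a hypothetical helper name, and it assumes each @file is resolved relative to the including file and that lines carry no leading whitespace.

```shell
# Print the effective flags of a clang config file, following "@file" includes.
# Hypothetical helper for illustration; not shipped with clang or Gentoo.
expand_clang_cfg() {
    cfg=$1
    dir=$(dirname "$cfg")
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) ;;                                   # skip blank lines and comments
            @*) ( expand_clang_cfg "$dir/${line#@}" ) ;;  # subshell keeps $dir intact
            *) printf '%s\n' "$line" ;;                   # an actual compiler flag
        esac
    done < "$cfg"
}
```

Running expand_clang_cfg /etc/clang/clang++.cfg on a working and a crashing machine and diffing the two outputs should make differences such as --stdlib=libstdc++ versus --stdlib=libc++ easy to spot.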

Has anyone else tried building with the flags/patches I'm using to disable the partition allocator?
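
For anyone who wants to try those GN args without editing the ebuild, one option is a per-package environment file. This is only a sketch: the file name chromium-no-partition-alloc.conf is made up for illustration, the EXTRA_GN line is copied from the build environment posted above, and it assumes the ebuild passes EXTRA_GN through to GN. The patches mentioned above are not covered here.

```shell
# /etc/portage/env/chromium-no-partition-alloc.conf  (hypothetical file name)
# Activate it with a line like this in /etc/portage/package.env:
#   www-client/ungoogled-chromium chromium-no-partition-alloc.conf

# GN args copied from the build environment posted earlier in this thread.
EXTRA_GN="${EXTRA_GN} use_partition_alloc=true use_allocator_shim=false use_partition_alloc_as_malloc=false enable_backup_ref_ptr_support=false"
```

The package needs to be rebuilt afterwards for the args to take effect.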

The strange thing is I haven't seen issues like this reported anywhere outside this repo. I've checked Gentoo's bugzilla, the Arch forums, and even the Chromium bug reports. I couldn't find anything in this repo specifically that would cause the issue, and since I had a very similar error with a recently-built qtwebengine (my own ebuild, with no changes since the last successful build), I'm not convinced it has anything to do with this repo specifically.

I haven't tried the new Chromium version that was just released (109.0.5414.119) and I'm not sure when I'll have time.

PF4Public commented 1 year ago

My gcc was updated to 12 and I've still had no issues so far.

perfect7gentleman commented 1 year ago

Updated it to 109.0.5414.119, still the same.

Maybe system libc++ is somehow causing an issue?

Other packages built with libc++ function normally.

PF4Public commented 1 year ago

Do you build with cfi? Could it be somehow related to -fexperimental-relative-c++-abi-vtables as mentioned here? Although that one affects Android.

perfect7gentleman commented 1 year ago

Do you build with cfi?

no

USE="clang convert-dict cups custom-cflags gtk4 official optimize-thinlto optimize-webui pgo proprietary-codecs qt5 screencast system-av1 system-ffmpeg system-harfbuzz system-icu system-jsoncpp system-libevent system-libusb system-openh264 system-openjpeg system-png system-re2 system-snappy thinlto vaapi wayland -X -cfi -debug -enable-driver -hangouts -headless -hevc -js-type-check -kerberos -pic -pulseaudio (-selinux) -suid -system-libvpx -vdpau -widevine"
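
Since the USE flag sets posted in this thread differ in many places, a quick way to compare two of them may help narrow things down. A minimal sketch (use_diff is a hypothetical helper, not a Portage command; it assumes mktemp, sort and comm are available):

```shell
# Show USE flags whose state differs between two USE="..." strings.
# Column 1: only in the first string; column 2 (tab-indented): only in the second.
use_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # One flag per line; drop the parentheses Portage prints around forced/masked flags.
    printf '%s\n' "$1" | tr -s ' ' '\n' | tr -d '()' | LC_ALL=C sort -u > "$a"
    printf '%s\n' "$2" | tr -s ' ' '\n' | tr -d '()' | LC_ALL=C sort -u > "$b"
    # comm -3 hides the lines common to both files, leaving only the differences.
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}
```

Calling it with the contents of a working USE line as the first argument and a crashing one as the second lists only the flags that are set differently, which is easier to read than comparing the full lines by eye.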

hnhx commented 1 year ago

Updated to 109.0.5414.119 as well, same issue.

arbitrary-dev commented 1 year ago

Encountered this after updating to 109.0.5414.119 (gcc was also updated to 12 in the same update). The previous version, *.74 (gcc 11), was working just fine.

Also I'm on Wayland.

UPD. Same with X. But this time only clang & llvm were updated to 15.0.7.

PF4Public commented 1 year ago

@arbitrary-dev @perfect7gentleman @hnhx @joecool1029 @baconsalad what are your kernel and glibc versions?

arbitrary-dev commented 1 year ago

Kernel: 5.15.80
Glibc: 2.36-r5

hnhx commented 1 year ago

Kernel: 6.1.8-gentoo
Glibc: 2.36-r7

joecool1029 commented 1 year ago

Kernel: 6.1.8 (vanilla)
Glibc: 2.36-r7

baconsalad commented 1 year ago

Kernel: 6.1.8 (custom)
Glibc: 2.36-r7

arbitrary-dev commented 1 year ago

Compiling with this workaround works indeed:

www-client/ungoogled-chromium -optimize-thinlto -thinlto -clang

thubble commented 1 year ago

Has anyone else tried rebuilding a previously-working version with their current system and toolchain? That was one of the steps I tried when troubleshooting this, and I found that the 108.0.5359.124_p1 binpkg I preserved before upgrading worked fine, but rebuilding using the exact same ebuild had the same problem as 109*.

As I stated earlier, I completely upgraded my workstation between the last successful build and the first attempt at building 109 (funnily enough, the main reason for the upgrade is that I'm experimenting with some Chromium hacks and got annoyed with the 4-hour build times). I did try to account for this by building with -march=haswell (my old machine's processor) and it made no difference. The only other system changes I can think of are that I updated the kernel (and headers) from 6.0.x to 6.1.x, and binutils from 2.39 to 2.40. All other upgrades (gcc, clang) I re-tested by downgrading using preserved binpkgs and it made no difference.

Compiling with this workaround works indeed:

www-client/ungoogled-chromium -optimize-thinlto -thinlto -clang

The same thing worked for me (building with gcc) which is why I'm sure clang is at least part of the problem, but I can't figure out why. I even tried with -optimize-thinlto -thinlto clang to see if it worked with clang but no LTO... no luck.