Closed: lorenz closed this issue 5 years ago.
What's your system? Does it work without native GPU memory buffers?
Ubuntu 18.10 with Mesa 18.2 on an RX570. GBM is definitely supported. And yes, it does work without them.
It looks more like an inconsistency between buffer usage and buffer type. I have an RX460 running Debian stretch with backports and Mesa 18.2, which uses native GPU memory buffers without any problems.
Would it be possible for you to add symbol_level=2 and recompile Chromium to get a backtrace?
Running a build with symbol_level=2 now.
So, I still don't have line-by-line symbols, but by examining the registers and the assembly I'm 99% sure that gbm_device is null. I have no idea why, though.
Any news?
Not really, still happens and I don't know why. I aborted the symbol_level=2 build after it consumed ~32GiB RAM and 20TiB of disk IO.
Would it be possible to add checks along the path to see whether the device is really null?
Hi @lorenz, could you please try adding the following to your args.gn:
enable_nacl = false
ozone_auto_platforms = false
use_ozone = true
use_xkbcommon = true
ozone_platform_wayland = true
is_debug = false
remove_webcore_debug_symbols = true
symbol_level = 1
dcheck_always_on = true
This should be enough to get a stack trace in case of a crash.
Additionally, are you using use_system_minigbm=true? If yes, could you try with use_radeon_minigbm=true instead and check whether the result is the same?
@nickdiego I'm currently running a build with the args you suggested. Thanks for providing these, it's a bit hard to navigate all the build flags as a non-Chromium-dev. When the build finishes I'll report back.
The build did complete, but the bug still persists. I get a null pointer segfault at gbm_pixmap_wayland.cc:69.
This is using your build args and use_radeon_minigbm=true. The crash only happens when I force-enable native GPU memory buffers.
@nickdiego I instrumented the critical part and got this:
[8632:8660:0202/173611.837021:ERROR:gbm_pixmap_wayland.cc(71)] connection_->gbm_device() is null
EDIT: I started adding debug output to the section where gbm_device is initialized and figured out that the issue is that --in-process-gpu disables the !args.single_process branch in InitializeGPU(), so the GBM device is never initialized. When I don't pass that argument I get "Failed to initialize gbm device". I'm still trying to figure out why that fails.
It cannot be null. Otherwise, you wouldn't be able to start the browser at all. Chromium tried to create a gbm bo with a buffer type not supported on your device. Can you copy/paste the about://gpu page here?
PS: You can't use native GPU memory buffers with the --in-process-gpu flag. The feature is not used in in-process-gpu mode at the moment, and I guess it won't be in the future either.
That path uses EGL surfaces instead, which means GBM is not needed.
Though, we might make in-process-gpu work with GBM as well and allow native GPU memory buffers then. But if GBM is not available, we would switch back to EGL surfaces instead.
PPS: How do you start the browser? Which flags exactly do you pass?
It is definitely null, and the browser starts if I'm not using native GPU buffers. I added a log there which literally checks connection_->gbm_device() == nullptr. I figured out that --in-process-gpu doesn't work with native GPU memory buffers. But when I don't pass that flag I get "Failed to initialize gbm device".
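A minimal sketch of the kind of null check described above, assuming hypothetical stand-in types (FakeGbmDevice, FakeConnection, InitializeBufferChecked) rather than Chromium's real classes; in Chromium the device would be a libgbm gbm_device* owned by the Wayland connection. Guarding the pointer this way turns the segfault into a logged failure the caller can handle:

```cpp
#include <cstdio>

// Hypothetical stand-ins: the real code holds a libgbm gbm_device*
// owned by the Wayland connection. Names are illustrative only.
struct FakeGbmDevice {};

struct FakeConnection {
  FakeGbmDevice* device = nullptr;
  FakeGbmDevice* gbm_device() const { return device; }
};

// Log and bail out instead of dereferencing a null device.
bool InitializeBufferChecked(const FakeConnection& connection) {
  if (connection.gbm_device() == nullptr) {
    std::fprintf(stderr, "connection_->gbm_device() is null\n");
    return false;
  }
  // The real code would create the gbm buffer object here.
  return true;
}
```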
I've already told you why it happens. Don't pass the --in-process-gpu flag if you want to use native GPU memory buffers.
Is there a specific reason why you don't want a separate GPU process running?
To sum up, GBM is used only without the in-process-gpu flag, i.e. when a separate GPU process is spawned. The native GPU memory buffers feature relies heavily on that.
We could probably always use GBM, which would allow native GPU memory buffers to work in in-process-gpu mode. If GBM is not available, we would forbid that feature and fall back to EGL surfaces (again, they are used instead of GBM in in-process-gpu mode). I don't think EGL surfaces can be used with the native GPU memory buffers feature, since it was initially made for DRM and requires a native pixmap based on DRM planes, etc.
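The current and proposed behavior described above can be modeled as a small decision function. This is only a sketch under the assumptions stated in those comments; SurfacePath, GpuSetup, CurrentSetup and ProposedSetup are hypothetical names, not Chromium APIs, and CurrentSetup assumes GBM initialization succeeds in the separate GPU process:

```cpp
// Hypothetical model of the two presentation paths discussed in the thread.
enum class SurfacePath { kGbm, kEglSurfaces };

struct GpuSetup {
  SurfacePath path;
  bool native_gpu_memory_buffers_allowed;
};

// Current behavior as described: with --in-process-gpu, GBM is never
// initialized, EGL surfaces are used and native buffers are unavailable.
// With a separate GPU process, GBM is used (assumed to initialize fine).
GpuSetup CurrentSetup(bool in_process_gpu) {
  if (in_process_gpu) return {SurfacePath::kEglSurfaces, false};
  return {SurfacePath::kGbm, true};
}

// Proposed behavior: always try GBM; only if it is unavailable, fall
// back to EGL surfaces and forbid native GPU memory buffers.
GpuSetup ProposedSetup(bool gbm_available) {
  if (gbm_available) return {SurfacePath::kGbm, true};
  return {SurfacePath::kEglSurfaces, false};
}
```

The proposal keys the decision on GBM availability rather than on the process model, which is what would let in-process-gpu use native GPU memory buffers when GBM works.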
Some more info: I switched to use_system_gbm=true, and now I no longer get a GBM initialization error. But I get this ("GBM device available" is a log line I injected; it is printed after set_gbm_device() in InitializeGPU()):
[22018:22018:0202/193557.532174:ERROR:ozone_platform_wayland.cc(185)] GBM device available
[22018:22018:0202/193557.549188:ERROR:sandbox_linux.cc(364)] InitializeSandbox() called with multiple threads in process gpu-process.
[21990:22001:0202/193557.704705:ERROR:browser_child_process_host_impl.cc(430)] Terminating child process for bad IPC message: Number of strides(1)/offsets(1)/modifiers(0) does not correspond to the number of planes(1)
[1:1:0202/193557.743314:ERROR:command_buffer_proxy_impl.cc(106)] ContextResult::kTransientFailure: Shared memory region is not valid
[1:1:0202/193557.743387:ERROR:context_provider_command_buffer.cc(143)] GpuChannelHost failed to create command buffer.
[22195:22195:0202/193557.758242:ERROR:ozone_platform_wayland.cc(185)] GBM device available
[22195:22195:0202/193557.773503:ERROR:sandbox_linux.cc(364)] InitializeSandbox() called with multiple threads in process gpu-process.
[21990:22001:0202/193557.791516:ERROR:browser_child_process_host_impl.cc(430)] Terminating child process for bad IPC message: Number of strides(1)/offsets(1)/modifiers(0) does not correspond to the number of planes(1)
[21990:21990:0202/193557.710611:ERROR:wayland_connection_connector.cc(49)] Not implemented reached in virtual void ui::WaylandConnectionConnector::OnChannelDestroyed(int)
[21990:21990:0202/193557.798019:ERROR:wayland_connection_connector.cc(49)] Not implemented reached in virtual void ui::WaylandConnectionConnector::OnChannelDestroyed(int)
[22247:22247:0202/193557.838304:ERROR:ozone_platform_wayland.cc(185)] GBM device available
[22247:22247:0202/193557.851483:ERROR:sandbox_linux.cc(364)] InitializeSandbox() called with multiple threads in process gpu-process.
[22247:22247:0202/193557.853999:FATAL:wayland_connection_proxy.cc(68)] Check failed: wc_ptr_.
#0 0x55fc1f058519 base::debug::CollectStackTrace()
#1 0x55fc1ef847b3 base::debug::StackTrace::StackTrace()
#2 0x55fc1ef9d8fa logging::LogMessage::~LogMessage()
#3 0x55fc1c3f5ac3 ui::WaylandConnectionProxy::CreateZwpLinuxDmabufInternal()
#4 0x55fc1c3f62dc base::internal::Invoker<>::RunOnce()
#5 0x55fc1efa7219 base::debug::TaskAnnotator::RunTask()
#6 0x55fc1effffff base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl()
#7 0x55fc1f0004f4 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()
#8 0x55fc1efa8f4a base::MessagePumpDefault::Run()
#9 0x55fc1f000899 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run()
#10 0x55fc1efd0020 base::RunLoop::Run()
#11 0x55fc23713114 content::GpuMain()
#12 0x55fc1eaec29e content::ContentMainRunnerImpl::Run()
#13 0x55fc1eb1f6a6 service_manager::Main()
#14 0x55fc1eaea491 content::ContentMain()
#15 0x55fc1b95b1b3 ChromeMain
#16 0x7fd502b4109b __libc_start_main
#17 0x55fc1b95b02a _start
I dropped a bunch of entries related to unimplemented functions that don't seem to matter.
OK, your GBM implementation doesn't provide the modifiers field. That's why the browser process terminates the GPU process: we have a validation method on the GPU process side which checks that the passed information is not compromised. That literally means the containers with strides, modifiers and offsets must all have the same size as the number of planes.
We haven't seen that kind of problem before, but it seems your system is an exception. I'll let you know once it's fixed. Most likely, that field is going to be removed from the check.
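The validation described above can be sketched as follows. BufferParams, ValidateStrict and ValidateWithoutModifiers are hypothetical names; the real check lives in Chromium's IPC validation code and may differ. The sketch only shows why a GBM implementation that reports no modifiers trips a strict per-plane size check, and how removing modifiers from the check (the likely fix mentioned above) lets the same buffer through:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical buffer description mirroring the fields named in the IPC
// error: one entry per plane for strides, offsets and modifiers.
struct BufferParams {
  std::size_t planes_count = 0;
  std::vector<std::uint32_t> strides;
  std::vector<std::uint32_t> offsets;
  std::vector<std::uint64_t> modifiers;
};

// Strict check: every container must match the plane count, so an
// empty modifiers vector makes the whole message invalid.
bool ValidateStrict(const BufferParams& p) {
  return p.strides.size() == p.planes_count &&
         p.offsets.size() == p.planes_count &&
         p.modifiers.size() == p.planes_count;
}

// Variant with modifiers removed from the check entirely.
bool ValidateWithoutModifiers(const BufferParams& p) {
  return p.strides.size() == p.planes_count &&
         p.offsets.size() == p.planes_count;
}
```

With one plane, one stride, one offset and zero modifiers (the situation in the log above), ValidateStrict returns false and ValidateWithoutModifiers returns true.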
Makes sense. I'm pretty sure that issue is common when using system GBM, since Ubuntu uses essentially unmodified upstream Mesa 18.2, which is what everybody else uses too. The thing is that none of the other minigbm implementations work on my system.
It's about the GPU driver and DRI/DRM rather than GBM. GBM is just a generic buffer manager, which abstracts everything underneath.
In any case, the root cause is clear now and will be fixed ASAP.
The issue has been moved upstream: https://bugs.chromium.org/p/chromium/issues/detail?id=928261
When running Chromium on a compositor that supports some pixel formats for GPU memory buffers (for example, Weston), it fails to start with a segmentation fault in GbmPixmapWayland::InitializeBuffer(). GDB is not that useful since this is a release build. It looks like it's caused either by the size or by gbm_device() being null.