EricCousineau-TRI closed this 1 year ago.
Output from runs:
$ bazel run //tmp:repro_min_cc
...
[2020-09-05 14:59:58.570] [console] [info] Priming...
[2020-09-05 14:59:58.672] [console] [info] i: 0
[2020-09-05 14:59:58.673] [console] [info] Render 0
[2020-09-05 14:59:58.691] [console] [info] Render 1
[2020-09-05 14:59:58.692] [console] [info] Render 2
[2020-09-05 14:59:58.696] [console] [info] i: 1
[2020-09-05 14:59:58.697] [console] [info] Render 0
[2020-09-05 14:59:58.714] [console] [info] Render 1
[2020-09-05 14:59:58.715] [console] [info] Render 2
[2020-09-05 14:59:58.718] [console] [info] [ Done ]
$ bazel run //tmp:repro_min_cc -- --use_primer=false
...
[2020-09-05 15:00:16.977] [console] [info] i: 0
[2020-09-05 15:00:16.978] [console] [info] Render 0
[2020-09-05 15:00:17.083] [console] [info] Render 1
[2020-09-05 15:00:17.085] [console] [info] Render 2
[2020-09-05 15:00:17.090] [console] [info] i: 1
[2020-09-05 15:00:17.090] [console] [info] Render 0
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 154 (GLX)
Minor opcode of failed request: 3 (X_GLXCreateContext)
Value in failed request: 0x0
Serial number of failed request: 61
Current serial number in output stream: 62
$ systemctl status xorg.service
● xorg.service - X Server
Loaded: loaded (/lib/systemd/system/xorg.service; enabled; vendor preset: enabled)
Active: failed (Result: core-dump) since Thu 2020-09-03 01:40:16 UTC; 1min 12s ago
Process: 105854 ExecStart=/usr/bin/X :0 (code=dumped, signal=ABRT)
...
$ cat /var/log/Xorg.0.log
...
[ 42183.080] (EE) Backtrace:
[ 42183.080] (EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4d) [0x55c4afb49a9d]
[ 42183.080] (EE) 1: /usr/lib/xorg/Xorg (0x55c4af991000+0x1bc839) [0x55c4afb4d839]
[ 42183.080] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f67d65cc000+0x128a0) [0x7f67d65de8a0]
[ 42183.080] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7f67d2f6d000+0x4790ec) [0x7f67d33e60ec]
[ 42183.080] (EE)
[ 42183.080] (EE) Segmentation fault at address 0xac
...
Have you tried a P2 or P3 instance at all?
Not to my knowledge, no. We do use P3 for deep learning stuff, but for basic image rendering we'd prefer if the G3 family works -- much more cost-effective.
Per convo w/ Jeremy, hope is that upgrading to latest VTK magically fixes this...
Retrieving a bit of history -- it looks like when this was filed in September 2020, we were using VTK 8.2, so the "upgrade to latest" would be referring to VTK 9 (#13253), which has not yet started.
Closing for lack of current reproducer.
Filed on Anzu initially (Anzu 5388), but was able to reproduce this in pure Drake unit tests.
Background
I had a test that would instantiate different (Diagram, Simulator) pairs, and in the diagram was a SceneGraph with a registered RenderEngineVtk. On the first instantiation, rendering and all that would be fine (I could render as many times as I wanted). However, on the second instantiation, I would get a BadValue error on (GLX, X_GLXCreateContext), and it would crash Xorg.
This would only happen on CI machines. On my laptop and desktop, I did not receive this error.
CI Configuration
us-east-1 | bionic | 18.04 LTS | amd64 | hvm:ebs-ssd | 20200821.1 | ami-0c34018d0aabaef93 (located using https://cloud-images.ubuntu.com/locator/ec2/)
(nvidia-smi)
xorg-server 2:1.19.6-1ubuntu4.5 (systemctl status xorg.service)
Workaround
The workaround is to make a "primer" RenderEngineVtk instance, render with it once, and keep it alive for the duration of the program. This most likely works because VTK uses a "scoped singleton" setup (e.g. on first render, allocate the GLX context; on destruction of the last renderer, deallocate it; then reallocate it the next time someone wants to render).
Min Repro
With the following code on 1392df106 (statically or dynamically linked), --use_primer=false can reproduce the error; --use_primer=true can work around it.
Also on this commit: https://github.com/EricCousineau-TRI/drake/tree/65e41868549ec4c437bed575a7f255a76b0a62d6/tmp (branch: issue-anzu5388-wip)
Thanks to @jwnimmer-tri and @SeanCurtis-TRI for helping w/ debugging (and rubber ducking!)
Setting priority to low since we have a workaround.