Closed peci1 closed 3 years ago
The same happens when I delegate rendering via EGL to /dev/dri/card1 (when launching the :2 X server).
I can't reproduce the issue using the latest stable release of both VirtualGL and TurboVNC or using the latest evolving code. You seem to be using an older evolving build of both VirtualGL and TurboVNC. Please try upgrading to the latest code.
I've updated to latest dev versions (the build debs), and the error is still there. The basic problem is with the Tcl/Tk interpreter:
$ wish
[VGL] Shared memory segment ID for vglconfig: 7831620
[VGL] VirtualGL v2.6.80 64-bit (Build 20201209)
[VGL] ERROR: in isDisplayExcluded--
[VGL] 75: Unexpected NULL condition
$ ltrace -fS wish
...
[pid 30043] SYS_read(4, "COMBINING MACRON\n\n<dead_circumfl"..., 4096) = 4096
[pid 30043] SYS_read(4, "ALL LETTER O WITH COMBINING MACR"..., 4096) = 4096
[pid 30043] SYS_read(4, "C SMALL LETTER ER WITH COMBINING"..., 4096) = 4096
[pid 30043] SYS_read(4, " : "\342\212\226" U"..., 4096) = 4096
[pid 30043] SYS_read(4, "\204" U2344 # \342\216\225 > APL FUNCTIO"..., 4096) = 4096
[pid 30043] SYS_read(4, "NE\n<Multi_key> <minus> <U2193> "..., 4096) = 4096
[pid 30043] SYS_read(4, "ONAL SYMBOL LEFT SHOE STILE\n<Mul"..., 4096) = 4096
[pid 30043] SYS_read(4, " : "\306\220\314\202" # LATIN CAPIT"..., 4096) = 4096
[pid 30043] SYS_read(4, " \t: "\306\216\314\204" # LATIN C"..., 4096) = 4092
[pid 30043] SYS_read(4, "", 4096) = 0
[pid 30043] SYS_close(4) = 0
[pid 30043] SYS_futex(0x7f6a94f599f0, 129, 0x7fffffff, 0) = 0
[pid 30043] SYS_write(2, "[VGL] ERROR: in isDisplayExclude"..., 75[VGL] ERROR: in isDisplayExcluded--
[VGL] 75: Unexpected NULL condition
) = 75
[pid 30043] SYS_shmdt(0x7f6a95361000, 0, 0x755b, 30) = 0
[pid 30043] SYS_shmctl(0x778053, 0, 0, 30) = -22
[pid 30043] SYS_getpid() = 30043
[pid 30043] SYS_getpid() = 30043
[pid 30043] SYS_exit_group(1 <no return ...>
[pid 30043] +++ exited (status 1) +++
$ strace -f wish
read(4, " # LATIN SMALL LETTER U WITH HO"..., 4096) = 4096
read(4, "SMALL LETTER O WITH CIRCUMFLEX A"..., 4096) = 4096
read(4, "LL LETTER IOTA WITH MACRON\n<dead"..., 4096) = 4096
brk(0x55c65a15f000) = 0x55c65a15f000
read(4, "COMBINING MACRON\n\n<dead_circumfl"..., 4096) = 4096
read(4, "ALL LETTER O WITH COMBINING MACR"..., 4096) = 4096
read(4, "C SMALL LETTER ER WITH COMBINING"..., 4096) = 4096
read(4, " : \"\342\212\226\" U"..., 4096) = 4096
read(4, "\204\" U2344 # \342\216\225 > APL FUNCTIO"..., 4096) = 4096
read(4, "NE\n<Multi_key> <minus> <U2193> "..., 4096) = 4096
read(4, "ONAL SYMBOL LEFT SHOE STILE\n<Mul"..., 4096) = 4096
read(4, " : \"\306\220\314\202\" # LATIN CAPIT"..., 4096) = 4096
read(4, " \t: \"\306\216\314\204\" # LATIN C"..., 4096) = 4092
read(4, "", 4096) = 0
close(4) = 0
futex(0x7ff737f469f0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "[VGL] ERROR: in isDisplayExclude"..., 75[VGL] ERROR: in isDisplayExcluded--
[VGL] 75: Unexpected NULL condition
) = 75
shmdt(0x7ff73834e000) = 0
shmctl(7864391, IPC_RMID, NULL) = -1 EINVAL (Invalid argument)
getpid() = 30145
getpid() = 30145
exit_group(1) = ?
+++ exited with 1 +++
But:
LD_PRELOAD= wish
works.
I couldn't find a reasonable way to set breakpoint to the VGL calls, as symbols seem to be missing
$ gdb -q wish
Reading symbols from wish...(no debugging symbols found)...done.
(gdb) set follow-fork-mode child
(gdb) break isDisplayStringExcluded(char*)
Function "isDisplayStringExcluded(char*)" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (isDisplayStringExcluded(char*)) pending.
(gdb) start
Function "main" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Temporary breakpoint 2 (main) pending.
Starting program: /usr/bin/wish
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[VGL] Shared memory segment ID for vglconfig: 8028195
[VGL] VirtualGL v2.6.80 64-bit (Build 20201209)
[VGL] ERROR: in isDisplayExcluded--
[VGL] 75: Unexpected NULL condition
[Inferior 1 (process 31023) exited with code 01]
(gdb)
Do you have an idea how I could debug this further?
isDisplayExcluded()
is an inline function, so it might not be showing up in the debugger. You can set a breakpoint at faker.h:75
, though.
Which O/S and version of Git are you using? That may help me to reproduce the issue.
Note that the underlying problem has to do with new functionality in VirtualGL 3.0. VGL 3.0 now hangs certain internal VirtualGL Faker structures on the 2D X server connection, in order to avoid hashes. But I don't understand why that mechanism is failing in your case. I need to be able to reproduce the failure so I can develop that understanding.
It's Ubuntu 18.04.5 LTS. tk versoin 8.6.0+9 (that provides the wish
binary), git version 2.24.0.
Setting the breakpoint to faker.h:75
didn't help, the library has probably all debugging symbols stripped down, so the debugger doesn't know which assembly line(s) come from this source. I'll try to get a local debug build of VGL and figure out what happens.
If you're using our official DEB package, then it does not contain debug symbols. You will have to use a custom build. I'll try to reproduce the issue with my Ubuntu 18.04 VM in the meantime.
Ok, so the failing point is here:
https://github.com/tcltk/tk/blob/7570fbe3b848f0d05b9b16849ec6d24ca5ef899f/unix/tkUnixEmbed.c#L243
Some info from GDB:
#0 vglfaker::isDisplayExcluded (dpy=0x5555557e8e10) at /home/peci1/apps/virtualgl/server/faker.h:75
#1 XCreateWindow (dpy=0x5555557e8e10, parent=702, x=0, y=0, width=1, height=1, border_width=0, depth=24, c_class=1, visual=0x555555804230, valuemask=10768, attributes=0x5555557cec50)
at /home/peci1/apps/virtualgl/server/faker-x11.cpp:224
#2 0x00007ffff75e410a in TkpMakeWindow () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#3 0x00007ffff75550fe in Tk_MakeWindowExist () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#4 0x00007ffff75f0d8b in ?? () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#5 0x00007ffff75eeed9 in Tk_SetAppName () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#6 0x00007ffff7554261 in TkCreateMainWindow () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#7 0x00007ffff755fcc7 in ?? () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#8 0x00007ffff755dc7b in ?? () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#9 0x00007ffff75566c5 in ?? () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#10 0x0000555555554acb in ?? ()
#11 0x00007ffff75458a3 in Tk_MainEx () from /usr/lib/x86_64-linux-gnu/libtk8.6.so
#12 0x000055555555498c in ?? ()
#13 0x00007ffff6d71bf7 in __libc_start_main (main=0x555555554960, argc=1, argv=0x7fffffffaff8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffafe8)
at ../csu/libc-start.c:310
#14 0x00005555555549ca in _start ()
(gdb) p extData
$1 = (XExtData *) 0x0
(gdb) p minExtensionNumber
$2 = 1
(gdb) p obj
$3 = {display = 0x5555557e8e10, gc = 0x5555557e8e10, visual = 0x5555557e8e10, screen = 0x5555557e8e10, pixmap_format = 0x5555557e8e10, font = 0x5555557e8e10}
(gdb) p *dpy
$1 = {ext_data = 0x0, free_funcs = 0x5555557ea1e0, fd = 3, conn_checker = 0, proto_major_version = 11, proto_minor_version = 0, vendor = 0x5555557eb260 "The X.Org Foundation", resource_base = 52428800,
resource_mask = 2097151, resource_id = 0, resource_shift = 0, resource_alloc = 0x7ffff63b24a0 <_XAllocID>, byte_order = 0, bitmap_unit = 32, bitmap_pad = 32, bitmap_bit_order = 0, nformats = 7,
pixmap_format = 0x5555557ea230, vnumber = 11, release = 12008000, head = 0x0, tail = 0x0, qlen = 0, last_request_read = 8, request = 9, last_req = 0x555555800220 "\002", buffer = 0x555555800220 "\002",
bufptr = 0x555555800230 "\377\377\377", bufmax = 0x555555804220 "", max_request_size = 65535, db = 0x0, synchandler = 0x0, display_name = 0x5555557ea060 ":2.0", default_screen = 0, nscreens = 1,
screens = 0x5555557ea2e0, motion_buffer = 256, flags = 0, min_keycode = 8, max_keycode = 255, keysyms = 0x0, modifiermap = 0x0, keysyms_per_keycode = 0,
xdefaults = 0x5555557eac90 "\nXft.hintstyle:\thintnone\nXft.rgba:\tnone\nXcursor.theme:\t\nXcursor.size:\t0\nXcursor.theme_core:\t1\n", scratch_buffer = 0x0, scratch_length = 0, ext_number = 1,
ext_procs = 0x5555557eada0, event_vec = {0x7ffff63b4050 <_XUnknownWireEvent>, 0x7ffff63b4050 <_XUnknownWireEvent>, 0x7ffff63b4090 <_XWireToEvent> <repeats 34 times>,
0x7ffff63b4050 <_XUnknownWireEvent> <repeats 49 times>, 0x7ffff6406870, 0x7ffff63b4050 <_XUnknownWireEvent> <repeats 42 times>}, wire_vec = {0x7ffff63b4080 <_XUnknownNativeEvent>,
0x7ffff63b4080 <_XUnknownNativeEvent>, 0x0 <repeats 34 times>, 0x7ffff63b4080 <_XUnknownNativeEvent> <repeats 92 times>}, lock_meaning = 0, lock = 0x0, async_handlers = 0x0, bigreq_size = 4194303,
lock_fns = 0x0, idlist_alloc = 0x7ffff63b24f0 <_XAllocIDs>, key_bindings = 0x0, cursor_font = 0, atoms = 0x0, mode_switch = 0, num_lock = 0, context_db = 0x0, error_vec = 0x0, cms = {defaultCCCs = 0x0,
clientCmaps = 0x0, perVisualIntensityMaps = 0x0}, im_filters = 0x0, qfree = 0x0, next_event_serial_num = 1, flushes = 0x0, im_fd_info = 0x0, im_fd_length = 0, conn_watchers = 0x0, watcher_count = 0,
filedes = 0x5555557ea110 "\003", savedsynchandler = 0x0, resource_max = 2097146, xcmisc_opcode = 0, xkb_info = 0x5555557ead00, trans_conn = 0x0, xcb = 0x5555557ea080, next_cookie = 0, generic_event_vec = {
0x0 <repeats 128 times>}, generic_event_copy_vec = {0x0 <repeats 128 times>}, cookiejar = 0x0}
Weird is that if I edit faker.h to always return true, it works (I'd expect a segfault).
Replicating the bug should be pretty simple in the end. Forget about the extra TurboVNC X server. I now logged directly onto the problematic machine and a plain vglrun +v -d :1 wish
fails. The same with EGL.
I've also tried it on another machine where VirtualGL was never installed and is closer to a clean Ubuntu installation, and it behaves the same.
Reproduced
Symptomatically, it appears from VirtualGL's point of view that Tk is calling XCreateWindow()
without first calling XOpenDisplay()
. However, examining the Tk source code reveals that it calls XkbOpenDisplay()
, which wraps XOpenDisplay()
, so I'm still unsure why the issue is occurring. It's easy enough to work around the issue, but since it's an issue that shouldn't occur, I really need to understand it first. At the moment, I'm stumped. Any additional help you can provide is appreciated.
The piece of information I was missing was the fact that XkbOpenDisplay()
and XOpenDisplay()
are both provided by libX11, so the call to XOpenDisplay()
is internal to libX11. For reasons I don't understand, VirtualGL intercepts the internal XOpenDisplay()
call on CentOS but not on Ubuntu, which is why I was unable to reproduce the issue until I fired up an Ubuntu VM (my normal dev/test machines run CentOS.)
The issue was fixed by interposing XkbOpenDisplay()
and performing the same faker initialization in the body of that function that is performed in the body of XOpenDisplay()
. Commits will be pushed shortly.
Thanks, that fixed it!
Hello, I'm running a setup consisting of TurboVNC server and VGL, and I've noticed the following issue when running
git gui
:$ git gui [VGL] Shared memory segment ID for vglconfig: 753716 [VGL] VirtualGL v2.6.80 64-bit (Build 20200904) [VGL] ERROR: in isDisplayExcluded-- [VGL] 75: Unexpected NULL condition
I run the setup as following: The system has an NVidia GPU. X server :0 is GDM greeter, nothing else. X server :1 is standard Ubuntu desktop run by logging in from GDM. Both :0 and :1 are accelerated.
Then I remotely login to the machine over SSH and run the following command to start a remotely accessible accelerated X server :2 with a window manager:
The session starts, I can connect to it with a VNC client and then I use the started window manager to run other programs as needed. Most programs work (glxgears, darktable, libreoffice with HW acceleration), but git gui shows the error described above.
TurboVNC Server v2.2.80 (build 20200807).
Here is a result of glreadtest run on X server :2