clementgallet / libTAS

GNU/Linux software to (hopefully) give TAS tools to games
GNU General Public License v3.0
495 stars 56 forks source link

Portal: Thread fails to initialize? #263

Open kitlith opened 5 years ago

kitlith commented 5 years ago

Here's a log with the Thread Print Category enabled: https://pastebin.com/Phrsb5Sa

I've messed with the settings a bit. Portal does not need the virtual steam client, I haven't found a time tracking option that makes a difference, Recycle Threads being on or off doesn't seem to make much of a difference.

Sometimes, instead of instantly crashing, a blank window is produced that never renders any frames (despite being unpaused), which then continues to "Thread failed to initialize" when you click stop.

clementgallet commented 5 years ago

I'm getting the same thing, as well as for Half-Life. If I start and attach gdb, my whole system freeze and I have to kill the game by switching to a virtual console (Ctrl+Alt+F1), it makes it harder to debug.

kitlith commented 5 years ago

yeah, I run into similar issues too, but not necessarilly limited to starting with gdb.

clementgallet commented 5 years ago

Looking at the code that managed thread creation. When creating a thread using CThread::Start(), the function passed to pthread_create() is in charge of setting the bInitSuccess variable that is used by the parent thread to check if the child thread did initialize correctly. So I guess CThread::Init() excuted by the child thread returns false?

Here's a backtrace during thread init if that could help. It comes from a breakpoint at the end of our pthread_create() wrapper function.

Thread 8 (Thread 0xe63cab40 (LWP 7981)):
#0  0xe96be4b3 in g_type_fundamental () at /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#1  0xe969dc3d in g_object_new_with_properties () at /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#2  0xe969e6f7 in g_object_new () at /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#3  0xe991c2f9 in  () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#4  0xe991980c in  () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#5  0xe99011b8 in gdk_display_open () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#6  0xe98cebb9 in gdk_display_open_default_libgtk_only () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#7  0xe9a9c505 in gtk_init () at /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#8  0xea7ab365 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libcef.so
#9  0xea7ab80c in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libcef.so
#10 0xea041d59 in cef_initialize () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libcef.so
#11 0xe64b0302 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/chromehtml.so
#12 0xe649e264 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/chromehtml.so
#13 0xf6d3fa20 in CThread::ThreadProc(void*) () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libtier0.so
#14 0xf7ea2f7e in libtas::pthread_start(void*) (arg=<optimized out>) at pthreadwrappers.cpp:118
#15 0xf7928fd2 in start_thread () at /lib/i386-linux-gnu/libpthread.so.0
#16 0xf7d1b6d6 in clone () at /lib/i386-linux-gnu/libc.so.6

Parent thread is:

Thread 1 (Thread 0xf73257c0 (LWP 7962)):
#0  0xf7fd3079 in __kernel_vsyscall ()
#1  0xf7c50382 in raise () at /lib/i386-linux-gnu/libc.so.6
#2  0xf7ea6ac2 in libtas::pthread_create(pthread_t*, pthread_attr_t const*, void* (*)(void*), void*)
    (tid_p=<optimized out>, attr=<optimized out>, start_routine=<optimized out>, arg=<optimized out>) at pthreadwrappers.cpp:220
#3  0xf6d40884 in CThread::Start(unsigned int) () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libtier0.so
#4  0xe643ddb8 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/chromehtml.so
#5  0xec927dc0 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/vguimatsurface.so
#6  0xf6d80826 in  () at bin/launcher.so
#7  0xf6d80902 in  () at bin/launcher.so
#8  0xf6d80b88 in  () at bin/launcher.so
#9  0xf6d80ba0 in  () at bin/launcher.so
#10 0xf6d68d95 in LauncherMain () at bin/launcher.so
#11 0x08048504 in main ()

Here's the log at the end:

[libTAS f:3] Thread 8640 (main) pthread_setname_np called with target_thread 3862735680 and name UICEFThread
Thread failed to initialize
[libTAS f:3] Thread 8660        pthread_sigmask call.
[libTAS f:3] Thread 8660        Beginning of thread code -3828

pthread_setname_np() call does not appear in the linked source code. And (if we can trust log message order), the parent thread check of child thread initialization before it has a chance to run its code.

kitlith commented 5 years ago

So, I opened libtier0.so up in ghidra because some of what was happening in the source didn't make sense to me. (like, it sets the boolean to true then checks it immediately afterwards?) And, well, it seems to be behaving more like the _WIN32 side of things than the _LINUX side of things? ThreadProc is definitely using pInit->pInitCompleteEvent at the very least. Yet, it doesn't call CloseHandle later on in the function... so suffice to say I'm a bit confused. At least it makes more sense >_>

pthread_setname_np is called inside ThreadSetDebugName, despite not appearing to have an implementation in available sources.

I will continue looking into this, but this is what's relevant that I've seen so far.

EDIT: Since WaitForCreateComplete is used, there are now a few ways in which thread initialization can fail:

Of course, GetThreadProc or Init could be overridden in a derived class, so to rule that out we could set a breakpoint and verify that the vtable points to the implementations preset in libtier0.so. If they have been overridden, then we have a new set of functions to check.

kitlith commented 5 years ago

something that would probably help with debugging this is if you could pause gdb on program start.

EDIT: I compiled libTAS without it starting the program automatically... when I stepped through in such a way that a couple of the threads apparently succeeded in starting, it demonstrated the freezing-up-the-system behavior. idk. maybe gdbserver and catching that from a VT (or other machine) could be helpful to see what's up?

clementgallet commented 5 years ago

It is likely pthread_cond_timedwait is involved in the issue.

Also, there is currently a problem with pthread_cond_timedwait() hook in libTAS which may be related: a game (or library like libtier0) is likely to gather the current time (that libTAS provides) to compute the timeout variable to be passed to pthread_cond_timedwait(). However, the implementation of pthread_cond_timedwait() gather the current time using a private function __clock_gettime() that we don't hook, or directly makes a futex syscall that contains a timeout argument.

As a result, pthread_cond_timedwait() is performing a wait with a negative relative timeout (for default value of libTAS initial system time), which returns ETIMEDOUT without checking the conditional variable, according to glibc implementation.

From libtier0 decompilation, the thread init timeout wait is 60000 ms, so I guess it expects this to succeed eventually.

So I guess, as a first solution, we could fix the timeout value in pthread_cond_timedwait() so that it is relative to the actual system time.

clementgallet commented 5 years ago

I could run gdb from a a virtual terminal to investigate the freeze, and I'm getting this:

Thread 8 (Thread 0xe63cab40 (LWP 10046)):
#0  0xf7fd3079 in __kernel_vsyscall ()
#1  0xf792f462 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib/i386-linux-gnu/libpthread.so.0
#2  0xf7ea38b5 in libtas::pthread_cond_wait(pthread_cond_t*, pthread_mutex_t*) (cond=<optimized out>, mutex=<optimized out>) at pthreadwrappers.cpp:434
#3  0xf73aaf20 in  () at /usr/lib/i386-linux-gnu/libxcb.so.1
#4  0xf73acbc6 in  () at /usr/lib/i386-linux-gnu/libxcb.so.1
#5  0xf73acd7c in xcb_wait_for_reply64 () at /usr/lib/i386-linux-gnu/libxcb.so.1
#6  0xf780d9ad in _XReply () at /usr/lib/i386-linux-gnu/libX11.so.6
#7  0xf780913f in XSync () at /usr/lib/i386-linux-gnu/libX11.so.6
#8  0xf7eedf5c in libtas::pushNativeXlibEvents() () at xevents.cpp:97
#9  0xf7eee673 in libtas::XFlush(Display*) (display=0xf108000) at xevents.cpp:459
#10 0xe9901f6a in gdk_x11_display_ungrab () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#11 0xe9925f59 in  () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#12 0xe992619b in  () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#13 0xe990beb9 in  () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#14 0xe99011ee in gdk_display_open () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#15 0xe98cebb9 in gdk_display_open_default_libgtk_only () at /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#16 0xe9a9c505 in gtk_init () at /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#17 0xea7ab365 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libcef.so
#18 0xea7ab80c in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libcef.so
#19 0xea041d59 in cef_initialize () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libcef.so
#20 0xe64b0302 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/chromehtml.so
#21 0xe649e264 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/chromehtml.so
#22 0xf6d3fa20 in CThread::ThreadProc(void*) () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libtier0.so
#23 0xf7ea2f8e in libtas::pthread_start(void*) (arg=<optimized out>) at pthreadwrappers.cpp:117
#24 0xf7928fd2 in start_thread () at /lib/i386-linux-gnu/libpthread.so.0
#25 0xf7d1b6d6 in clone () at /lib/i386-linux-gnu/libc.so.6

Thread 1 (Thread 0xf73257c0 (LWP 10028)):
#0  0xf7fd3079 in __kernel_vsyscall ()
#1  0xf7d10b6b in poll () at /lib/i386-linux-gnu/libc.so.6
#2  0xf73aad8d in  () at /usr/lib/i386-linux-gnu/libxcb.so.1
#3  0xf73ab529 in  () at /usr/lib/i386-linux-gnu/libxcb.so.1
#4  0xf73ab5e3 in xcb_writev () at /usr/lib/i386-linux-gnu/libxcb.so.1
#5  0xf780d351 in _XSend () at /usr/lib/i386-linux-gnu/libX11.so.6
#6  0xf7801fd4 in  () at /usr/lib/i386-linux-gnu/libX11.so.6
#7  0xf780220b in XPutImage () at /usr/lib/i386-linux-gnu/libX11.so.6
#8  0xf53eafa0 in XcursorImageLoadCursor () at /usr/lib/i386-linux-gnu/libXcursor.so.1
#9  0xf53eb8d5 in XcursorImagesLoadCursors () at /usr/lib/i386-linux-gnu/libXcursor.so.1
#10 0xf53eb97d in XcursorImagesLoadCursor () at /usr/lib/i386-linux-gnu/libXcursor.so.1
#11 0xf53eef08 in XcursorTryShapeCursor () at /usr/lib/i386-linux-gnu/libXcursor.so.1
#12 0xf77eba66 in XCreateGlyphCursor () at /usr/lib/i386-linux-gnu/libX11.so.6
#13 0xf77ebedf in XCreateFontCursor () at /usr/lib/i386-linux-gnu/libX11.so.6
#14 0xf66fb389 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libSDL2-2.0.so.0
#15 0xf6703b1d in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libSDL2-2.0.so.0
#16 0xf669051c in SDL_CreateSystemCursor () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/libSDL2-2.0.so.0
#17 0xec91f1fe in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/vguimatsurface.so
#18 0xec927dfc in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/vguimatsurface.so
#19 0xf6d80826 in  () at bin/launcher.so
#20 0xf6d80902 in  () at bin/launcher.so
#21 0xf6d80b88 in  () at bin/launcher.so
#22 0xf6d80ba0 in  () at bin/launcher.so
#23 0xf6d68d95 in LauncherMain () at bin/launcher.so
#24 0x08048504 in main ()

It looks like a deadlock from using Xlib connection in two different threads (and XInitThreads() was not called?). May be caused by libtas::pushNativeXlibEvents() from thread 8 gathering events from all Display connections, despite being called from libtas::XFlush() for a specific Display.

clementgallet commented 5 years ago

Commit af1c9e45223318ae3c18a2840ea26c08e6e724bb seems to fix the "Thread failed to initialize" message (Runtime > Wait timeout > Infinite waits also works), and aa9a8323e3b6e68047ef9714af0bfb2da3e1c7cb fixes the freeze.

Runtime > Time tracking > gettimeofday() is required to run the intro video (probably because the video streaming code relies on a busy loop to show the video at the correct framerate). Then it crashes at the next screen for me:

Thread 1 (Thread 0xf73237c0 (LWP 12108)):
#0  0xf77c8b1c in vtable for __cxxabiv1::__class_type_info () at /usr/lib/i386-linux-gnu/libstdc++.so.6
#1  0xef79d822 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/engine.so
#2  0xef79d8c7 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/engine.so
#3  0xef85cf5b in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/engine.so
#4  0xef951308 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/engine.so
#5  0xef955f25 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/engine.so
#6  0xef9533c6 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/engine.so
#7  0xef9a81e0 in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/engine.so
#8  0xef95444f in  () at /home/clement/.local/share/Steam/steamapps/common/Portal/bin/engine.so
#9  0xf6d7eba0 in  () at bin/launcher.so
#10 0xf6d7eba0 in  () at bin/launcher.so
#11 0xf6d66d95 in LauncherMain () at bin/launcher.so
#12 0x08048504 in main ()

Could it be a problem of using another libstdc++ version as the one from steam runtime?

kitlith commented 5 years ago

you can also skip the video by using -novid as a command line option.

as for libstdc++? given I can run portal without adding steam runtime to library path when outside of libTAS, so I'm not sure that's it...

EDIT: I do not seem to experience a crash, but a hang/infloop? opening ghidra to find out what's going on there.

kitlith commented 5 years ago

Ah ha! gettimeofday() is required for running the intro vid, but clock_gettime() is used by Plat_FloatTime in libtier0, which is used elsewhere!

I have successfully made it to the menu.

EDIT: in-game, though, is another matter. it seems to be getting to the first frame and then inflooping or something. maybe there's another time thing we need to track.

EDIT: I made in-game work by enabling Runtime->Debug->Keep main first thread, i think.

kitlith commented 5 years ago

and now I can't get in-game to work, again. same settings as far as I can tell...