pthread_cond not working properly (32bit apps)

jamesu commented 5 years ago

Got another threading issue.

In the case of torque, in a few cases it will make use of semaphores to signal if work is done in another thread. This is mostly used when issuing script callbacks in the main game thread.

The problem seems to be that the semaphore code isn't working correctly - specifically in Semaphore::acquireSemaphore it uses pthread_cond_wait to wait on a condition to be signalled, but in darling it returns a value other than 0, i.e. an error has occured.

I've attached an example to this issue which gives a simpler test case. Note that it works correctly on OSX and regular linux, but not in darling (i.e. the "Waiting on mCond failed in acquireSemaphore()." assert is triggered). Also since usleep is not working correctly in 32bit mode, I opted just to make it calculate fib instead for the parts which should be usleep, but this shouldn't affect the actual semaphore test.

thread_test.txt

jamesu commented 5 years ago

For additional reference, can also confirm it works fine in a 64bit build, whereas in 32bit build it returns "1" and the system errno is "22" (EINVAL).

Not sure why it would be returning "1" though, I thought it's supposed to only return error codes (and no obvious codepath seems to return EPERM).

bugaevc commented 5 years ago

Well, can you debug where the error code is coming from?

As for usleep on x86, would you like to try to write a clock_sleep_trap_impl implementation yourself?

jamesu commented 5 years ago

Sure, i'll have a prod and try to narrow down whats happening in pthread. Just reporting here in case anyone has any bright ideas ;)

Can prob do clock_sleep_trap_impl - just need to figure out how it's meant to work (in terms of the mach stuff).

LubosD commented 5 years ago

I can't guarantee you that the bug is there, but the bulk of the relevant kernel support code for pthread sync primitives is in here: https://github.com/darlinghq/darling-newlkm/blob/master/darling/psynch_support.c

It could very well be also in traps.c, maybe the lock address is not properly expanded from 32-bits to 64-bits or something...

jamesu commented 5 years ago

Done some more prodding with this. One problem I keep running into though is lldb running in darling introduces a bit of randomness into the debugging process - often when it hits function calls it will seemingly skip execution to further along in the parent function causing lldb to hang (esp if its waiting on a mutex).

So after resorting to printf, it seems whats happening is __psynch_cvwait is returning 22, i.e. EINVAL (the 1 i was getting before was just a red herring from a bad cast) - so it's some kind of syscall issue.

I'll see if I can dig any further to figure out what exactly is going wrong there.

jamesu commented 5 years ago

After some further investigation and comparison with what happens in 64bit mode, it seems some very strange things are happening by the time __psynch_cvwait is called.

Looking at what gets fed into the wrapper for the kernel module function, for 32bit mode it looks like:

sys_psynch_cvwait(cv=0x79e761bc, cvgen=256, cvugen=1 mutex=0x0 mgen=0 ugen=4156770246 sec=8784097062445673329 usec=256)

Whereas in 64bit mode it gets something like:

sys_psynch_cvwait(cv=0x7fa89dc01370, cvgen=256, cvugen=0 mutex=0x0 mgen=0 ugen=96 sec=0 usec=0)

For ref the trace will look something like this:

  * frame #0: 0x00007ffff7c2d9f9 libsystem_kernel.dylib`sys_psynch_cvwait(cv=0x00007fdb4d401370, cvgen=256, cvugen=0, mutex=0x0000000000000000, mgen=0, ugen=96, sec=0, usec=0) + 89 at psynch_cvwait.c:15
    frame #1: 0x00007ffff7c2e13b libsystem_kernel.dylib`_darling_bsd_syscall + 50
    frame #2: 0x00007ffff7c14672 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #3: 0x00007ffff79c92b6 libsystem_pthread.dylib`_pthread_cond_wait + 1414
    frame #4: 0x00007ffff79d27db libsystem_pthread.dylib`pthread_cond_wait + 75

At this point I'm guessing there's something wrong with the way the args are being passed through to sys_psynch_cvwait since none of the input parameters make much sense (in 32bit mode at least). Things seem a little fishy given for example there is a "flags" parameter in __psynch_cvwait but no "flags" in sys_psynch_cvwait, and both sec values should be 0 ...

jamesu commented 5 years ago

Ok looks like there's at least two problems in this case:

1) The prototype for the sys_ function should actually be long sys_psynch_cvwait(void* cv, uint64_t cvlsgen, uint32_t cvugen, void * mutex, uint64_t mugen, uint32_t flags, int64_t sec, uint32_t nsec)

2) Not enough args are being copied in __darling_mach_syscall (sys_psynch_cvwait uses a fair amount of 64bit variables)

Adding more copy_args and increasing the stack size (replacing 32 or 24 with 48) seems to have fixed the values being copied to the thunk function but the actual kernel module function still returns -22 for some reason. Need to do some more digging into the kernel module I guess.

jamesu commented 5 years ago

After some further prodding I came to the realization that the psynch_cvwait_args struct will be a different size depending on if its generated in a 32bit or 64bit process (since the long it contains is 4 or 8 bytes). This is also a problem with psynch_cvsignal_args. I don't think any other structs are affected since they either use uint64_t or have an i386 ifdef with padding.

So basically it works now. I'll see about making a clean PR for this.

LubosD commented 5 years ago

Yes, having varying struct member sizes in these syscalls is definitely a mistake. Thank you!

jamesu commented 5 years ago

Hit a little bit of a snag with my fix. While my simple commandline test works, I seem to have inadvertently broken .app loading as I now get a segfault when loading more advanced app bundles, with the following trace:

* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0xf737a310 libdyld.dylib`misaligned_stack_error_
    frame #1: 0xf7f814cc dyld`initialPoolContent + 17196
    frame #2: 0xf7e9f654 dyld`ImageLoaderMachO::segLoadCommand(unsigned int) const + 36
    frame #3: 0xf667a40c libelfloader.dylib
    frame #4: 0xf7e9adf0 dyld`ImageLoaderMachOCompressed::initializeCoalIterator(ImageLoader::CoalIterator&, unsigned int, unsigned int) + 176

All I've really done though is expand the copy_arg lists (to account for the extra size required) and stack pointer ( https://github.com/jamesu/darling/commit/29385695aac6891f2af9612876c2367882062b92 ) and fixed the struct sizes on the lkm ( https://github.com/jamesu/darling-newlkm/commit/133193fb18b9908cafd6d33295836adfa54ba478 )...

Am I missing something here?

bugaevc commented 5 years ago

How did it work in the first place? The stack need to be 16-aligned before calling a function; upon function entry (in this case when entering __darling_{mach,bsd}_syscall) it's then 16+4-aligned; if you're subtracting 24, 32 or 48 the stack is not going to end up 16-aligned. Or am I miscalculating?

jamesu commented 5 years ago

'tis a mystery. Assuming $esp+4 is aligned inside the function, In the case of __darling_mach_syscall by the time we get inside the syscall function it will be off by 12 bytes, and for __darling_bsd_syscall it will be off by 4.

But even weirder, why is changing syscall stack sizes affecting libdyld? I mean the stack should return to normal after...right?

bugaevc commented 5 years ago

But even weirder, why is changing syscall stack sizes affecting libdyld? I mean the stack should return to normal after...right?

Not if the callout into libdyld happens inside the syscall code (when first accessing a lazily-bound function via dyld_stub_binder)

bugaevc commented 5 years ago

...This is normally not supposed to happen (libkernel tries to avoid depending on other parts of libSystem), but when spawning threads via elfcalls you end up calling into dyld (and into ELF code) explicitly.

jamesu commented 5 years ago

Had a further prod and narrowed it down to __darling_bsd_syscall- if I change that, alignment tends to go weird in dyld.

In the original, it will subtract $24. 24 into 16... your 8 off either way. Though there's probably something being added or subtracted at some point which is eluding me. Using that as the basis for the alignment, the next possible value is $40, and $56

And indeed if I subtract $56 it doesn't crash. Weird.

In any case I think I will check all these syscalls and come up with more reasonable arg sizes. I'm not too comfortable just making everything 56 bytes for instance.

jamesu commented 5 years ago

Also had a scan though the syscalls, noting the general sizes. Biggest I found in each (not factoring in any padding each param may have)...

__darling_mach_syscall mach_vm_map (56 bytes)

__darling_machdep_syscall set_ldt (16 bytes)

__darling_bsd_syscall psynch_cvsignal (52 bytes)

So definitely in the case of i386 with the exception of __darling_machdep_syscall the reserved arg size should be at least 56 (to cover everything).

jamesu commented 5 years ago

Ok, tested and submitted some PRs which seem to fix this particular issue. Everything else seems to be working ok (i.e. nothing new appears to have broken in 32bit mode).

Only the LKM fix is required to fix this particular issue, however in more advanced cases where the last parameters are used, the syscalls fix should be applied too.

darlinghq / darling

pthread_cond not working properly (32bit apps) #524