Closed dmik closed 7 years ago
One real-life case where this problem strikes is the tdbtorture tool from Samba: its parent process manipulates the TDB file after all children (which initiated the pwrite mutex creation) have ended, and it fails with an assertion.
In the above commit I temporarily solved this problem by simply re-creating the mutex when DosOpenMutexSem fails with ERROR_INVALID_HANDLE as well (so that tdbtorture now works fine). However, this is an improper solution because the closed mutex handle may be reused by another process, in which case we would end up using the wrong mutex. Another problem this fix does not account for is a race between two threads of the same process trying to re-create the mutex.
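The handle-reuse hazard can be shown with a small simulation. This is only a sketch in portable C, not OS/2 code: the slot table and the names sim_create, sim_destroy, sim_open, sim_owner and acquire_shared are all hypothetical stand-ins for the kernel's semaphore handle table and for the fallback described above.

```c
#include <assert.h>

/* Hypothetical simulation: a tiny table standing in for the kernel's
 * semaphore handles. None of this is the real OS/2 API. */
enum { SLOTS = 4, SIM_OK = 0, SIM_INVALID_HANDLE = 1 };

static int slot_owner[SLOTS];   /* 0 = free slot, otherwise tag of the creator */

/* Create a mutex in the first free slot; returns a 1-based handle, 0 if full. */
unsigned sim_create(int owner_tag)
{
    for (unsigned i = 0; i < SLOTS; i++) {
        if (slot_owner[i] == 0) {
            slot_owner[i] = owner_tag;
            return i + 1;
        }
    }
    return 0;
}

/* Destroy a mutex, which makes its handle value available for reuse. */
void sim_destroy(unsigned h)
{
    assert(h >= 1 && h <= SLOTS);
    slot_owner[h - 1] = 0;
}

/* Opening succeeds for any live handle, no matter who created it. */
int sim_open(unsigned h)
{
    if (h < 1 || h > SLOTS || slot_owner[h - 1] == 0)
        return SIM_INVALID_HANDLE;
    return SIM_OK;
}

/* Tag of the creator of a live handle, 0 if the handle is not live. */
int sim_owner(unsigned h)
{
    return sim_open(h) == SIM_OK ? slot_owner[h - 1] : 0;
}

/* The temporary fix: re-create the mutex when the stored handle is null
 * or turns out to be invalid. */
unsigned acquire_shared(unsigned *shared, int caller_tag)
{
    if (*shared == 0 || sim_open(*shared) == SIM_INVALID_HANDLE)
        *shared = sim_create(caller_tag);
    return *shared;
}
```

If the creator's mutex is destroyed and an unrelated caller then creates its own mutex, the unrelated mutex may receive the same handle value. A later acquire_shared call then opens the stale handle successfully and silently uses someone else's mutex, which is exactly why the fallback is not a proper solution.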
All these problems will be gone once we do proper tracking by PID, but this requires some more work, and some more functions (to manipulate lists of PIDs) need to be moved from fcntl.c to shared code (yes, maintaining kernel functionality in user land from scratch is a complex task). For the time being, the above fix should be enough to check whether making pread/pwrite atomic fixes the remaining Samba issues (http://trac.netlabs.org/samba/ticket/266) in the first place.
This problem was solved using another approach, see #43. Now the mutex is simply deleted along with the LIBCx global file description structure (SharedFileDesc) where it is stored, once this structure is no longer in use (i.e. when all LIBC file descriptors referring to it via LIBC calls overridden in LIBCx, including pread()/pwrite(), are closed with close()).
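The idea of tying the mutex lifetime to a reference-counted description structure can be sketched as follows. This is a simplified single-process illustration using POSIX threads rather than the OS/2 API, and FileDescSketch, desc_open, desc_ref and desc_close are hypothetical names, not the actual LIBCx SharedFileDesc code. The sketch assumes the caller already serializes access to the reference count with some outer lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared file description structure. */
typedef struct {
    int refs;                   /* open descriptors referring to this structure */
    pthread_mutex_t io_mutex;   /* would serialize the pread/pwrite emulation */
} FileDescSketch;

/* First open of a file: allocate the structure and create its mutex. */
FileDescSketch *desc_open(void)
{
    FileDescSketch *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    if (pthread_mutex_init(&d->io_mutex, NULL) != 0) {
        free(d);
        return NULL;
    }
    d->refs = 1;
    return d;
}

/* Another descriptor (e.g. a duplicate or an inherited one) refers to it. */
void desc_ref(FileDescSketch *d)
{
    assert(d->refs > 0);
    d->refs++;
}

/* close(): drop one reference; the mutex is destroyed together with the
 * structure only when the last reference goes away. Returns 1 if destroyed. */
int desc_close(FileDescSketch *d)
{
    assert(d->refs > 0);
    if (--d->refs > 0)
        return 0;
    pthread_mutex_destroy(&d->io_mutex);
    free(d);
    return 1;
}
```

With this layout there is no separate question of who destroys the mutex: as long as any descriptor is open the mutex is valid, and no handle can outlive the structure that stores it.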
The initial implementation of pread/pwrite has a problem: the mutex for atomic operation is created upon first usage of a file by either these functions or fcntl. But it's possible that the process that created it terminates before some other related process (its child or parent) calls pread/pwrite on the same file for its own needs. Since the process that created the mutex is already gone, the mutex gets destroyed and the second process won't be able to open it for operation: it will only see the invalid mutex handle in the shared variable.
A solution here is to zero the shared variable when the mutex gets destroyed so that another process will notice that it's null and re-create it. However, there may be other processes that have already opened the mutex (and therefore increased its internal usage counter), so it's not possible to tell which DosCloseMutexSem call actually destroys the mutex. Our own usage counter will not help either: with only a usage counter, you don't know whether to decrease it on process termination, because you don't know whether the given process ever opened the given mutex. To know that, we need to track the PIDs of all processes creating/opening the shared mutex and zero the variable only when all of these processes are gone.
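The PID-tracking idea can be sketched as follows. This is a minimal illustration in portable C, assuming a fixed-size PID table. SharedMutexState, mutex_track_open, mutex_track_exit and MAX_USERS are hypothetical names and do not come from LIBCx. A real implementation would keep this state in shared memory and guard it with a global lock.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_USERS 16   /* arbitrary capacity chosen for this sketch */

/* Hypothetical stand-in for state that would live in shared memory. */
typedef struct {
    unsigned long handle;    /* shared mutex handle, 0 = no mutex */
    int pids[MAX_USERS];     /* PIDs that created or opened the mutex */
    size_t count;            /* number of valid entries in pids */
} SharedMutexState;

/* Record that `pid` created or opened the mutex.
 * Returns 0 on success, -1 if the table is full. */
int mutex_track_open(SharedMutexState *s, int pid)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->pids[i] == pid)
            return 0;                    /* already tracked */
    if (s->count == MAX_USERS)
        return -1;
    s->pids[s->count++] = pid;
    return 0;
}

/* Called when `pid` terminates. A process that never opened the mutex
 * changes nothing. Returns 1 if this call zeroed the shared handle. */
int mutex_track_exit(SharedMutexState *s, int pid)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->pids[i] == pid) {
            s->pids[i] = s->pids[--s->count];   /* swap-remove the entry */
            if (s->count == 0) {
                s->handle = 0;   /* last user is gone: allow re-creation */
                return 1;
            }
            return 0;
        }
    }
    return 0;
}
```

Unlike a bare usage counter, the PID list lets the termination hook tell whether the exiting process ever opened the mutex, so the handle is zeroed only after the last real user is gone.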