YottaDB / YDB

Mirrored from https://gitlab.com/YottaDB/DB/YDB

[#290] Fix online freeze deadlock with -noautorelease #291

Closed: nars1 closed this 6 years ago

nars1 commented 6 years ago

A process P1 in gds_rundown() obtains the ftok semaphore and then the access semaphore, in that order, and may then call wcs_flu(), which grabs crit. An online freeze process P2 (MUPIP FREEZE -ON -NOAUTORELEASE) can run concurrently and freeze the database file just before P1 gets crit. In that case, P1 sleep-loops indefinitely waiting for the database to be unfrozen (the WAIT_FOR_REGION_TO_UNCHILL macro in wcs_flu()), while any MUPIP FREEZE -OFF command (which would clear the online freeze) also hangs, waiting for the ftok semaphore that P1 holds. The result is a deadlock. This is the issue.

This is now fixed as follows. After grabbing crit, wcs_flu() checks whether the database is frozen online. If it is, and the caller is gds_rundown() (indicated by the WCSFLU_RET_IF_OFRZ flag), wcs_flu() does not flush the database. Instead it does a journal flush (so at least the journal updates this process made are flushed) and returns to gds_rundown(), which proceeds with halting the process. That releases the ftok lock, which lets the MUPIP FREEZE -OFF command proceed, thereby fixing the deadlock.