Closed mreynolds389 closed 2 weeks ago
IMHO the problem is that the purging thread should not keep holding the changelog state lock. It should instead increment cldb->clThreads, release cldb->stLock, and then decrement cldb->clThreads with slapi_counter_decrement once it has finished using the changelog.
Yeah, you're right. I had not yet looked into this as I was wrapping up my other testing, but this does fix it. The state lock is only meant for checking whether the changelog is open, so it was being misused, and we were taking the main changelog lock in the next frame down anyway. It's been this way for a long time, but the issue was only exposed with LMDB. I thought the trimming thread took the same approach, but it does not, so only cleanAllRUV/purging would trigger this.
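For anyone following along, here is a rough sketch of the pattern discussed above: take the state lock only long enough to check that the changelog is open and to bump a usage counter, then drop it before touching the DB lock. This is not the actual 389-ds code. The `changelog_t` struct, its fields, and every function name are hypothetical stand-ins (`st_lock` for cldb->stLock, `threads` for cldb->clThreads, `db_lock` for the DB lock/txn), and C11 atomics are used here in place of the slapi_counter calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the changelog handle; not the real cldb struct. */
typedef struct {
    pthread_mutex_t st_lock; /* state lock: only guards the open/closed check */
    pthread_mutex_t db_lock; /* stands in for the DB lock/txn */
    atomic_int threads;      /* stands in for cldb->clThreads */
    bool is_open;
    int purged;              /* number of purge passes performed */
    int written;             /* number of changes written */
} changelog_t;

void changelog_init(changelog_t *cl)
{
    pthread_mutex_init(&cl->st_lock, NULL);
    pthread_mutex_init(&cl->db_lock, NULL);
    atomic_init(&cl->threads, 0);
    cl->is_open = true;
    cl->purged = 0;
    cl->written = 0;
}

/* Register as a changelog user. The state lock is held only for the
 * open check and the counter increment, never across DB work. */
bool changelog_acquire(changelog_t *cl)
{
    bool ok;

    pthread_mutex_lock(&cl->st_lock);
    ok = cl->is_open;
    if (ok) {
        atomic_fetch_add(&cl->threads, 1);
    }
    pthread_mutex_unlock(&cl->st_lock);
    return ok;
}

/* Unregister once the caller has finished using the changelog. */
void changelog_release(changelog_t *cl)
{
    assert(atomic_load(&cl->threads) > 0);
    atomic_fetch_sub(&cl->threads, 1);
}

/* Purge path: the state lock is already released when db_lock is taken,
 * so this thread never holds st_lock while waiting for db_lock. */
bool purge_changelog(changelog_t *cl)
{
    if (!changelog_acquire(cl)) {
        return false; /* changelog is closed */
    }
    pthread_mutex_lock(&cl->db_lock);
    cl->purged++;
    pthread_mutex_unlock(&cl->db_lock);
    changelog_release(cl);
    return true;
}

/* Write path: takes db_lock first and then briefly takes st_lock,
 * mirroring the lock order of the writer thread in the report. */
bool write_change(changelog_t *cl)
{
    bool ok;

    pthread_mutex_lock(&cl->db_lock);
    ok = changelog_acquire(cl);
    if (ok) {
        cl->written++;
        changelog_release(cl);
    }
    pthread_mutex_unlock(&cl->db_lock);
    return ok;
}
```

With this shape the purge path never waits on the DB lock while holding the state lock, so the writer's DB lock then state lock order can no longer form a cycle with it. The counter is what tells a close operation that the changelog is still in use.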
Issue Description
We have a deadlock when we try to write to the changelog and changelog trimming/purging is happening:
The purging thread holds the changelog state lock and tries to start a transaction. Meanwhile, a thread writing a change to the changelog already holds the DB lock/txn and then tries to take the changelog state lock. Each thread is waiting on a lock the other holds, so we deadlock.
I'm not 100% sure this is LMDB specific, but it's happening pretty consistently while trying to test changelog purging. I never saw this happen when testing changelog purging with bdb.