chrisjbillington / zprocess

A collection of utilities for multiprocessing using zeromq.
BSD 2-Clause "Simplified" License

zlock: give up not considered triggerable event, can block acquisitions indefinitely #12

Closed chrisjbillington closed 4 years ago

chrisjbillington commented 4 years ago

When a lock is released, zlock checks if that makes the lock available for other waiting clients. If so, those clients are immediately given the lock.

However, when a client is waiting for a lock and it "gives up" (stops making retry requests to maintain its position in the queue), zlock presently does not check whether that makes the lock available for other clients.

You wouldn't think that a client giving up, when it wasn't even holding the lock, could make the lock available for other clients, but it can.

Since zlock implements a readers-writer lock that gives priority to the writer, new readers are blocked by the presence of a waiting writer. Therefore, if some readers hold the lock while a writer and some more readers are waiting, the writer giving up and losing its place in the queue should result in the waiting readers being given the lock. Presently it doesn't: those readers continue to wait. The only event that can give them the lock is a writer releasing it, so the readers will wait until a writer comes along, acquires the lock and then releases it. That can be forever if the application is not going to have another writer until reading is finished anyway.
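To make the scenario concrete, here is a minimal toy model of a writer-priority readers-writer lock with an explicit wait queue. It is only a sketch for illustration: `ToyRWLock`, `give_up`, the `recheck` flag and `scenario` are all hypothetical names, and none of this mirrors zlock's actual server code or protocol (in particular, the toy's release logic is not meant to match zlock's). It shows only how removing a waiting writer can unblock waiting readers, and that nothing happens unless the queue is re-examined at that point.

```python
from collections import deque


class ToyRWLock:
    """Toy writer-priority readers-writer lock (illustrative, not zlock's code)."""

    def __init__(self):
        self.readers = set()    # clients currently holding the lock for reading
        self.writer = None      # client currently holding the lock for writing
        self.waiting = deque()  # (client, mode) pairs in arrival order

    def _writer_waiting(self):
        return any(mode == "w" for _, mode in self.waiting)

    def acquire(self, client, mode):
        """Grant the lock if possible, else queue the client. True if granted."""
        if mode == "r" and self.writer is None and not self._writer_waiting():
            self.readers.add(client)
            return True
        if mode == "w" and self.writer is None and not self.readers:
            self.writer = client
            return True
        self.waiting.append((client, mode))
        return False

    def _grant_waiting(self):
        """Hand the lock to whichever waiting clients can now have it."""
        if self.writer is not None:
            return []
        first_writer = next((c for c, m in self.waiting if m == "w"), None)
        if first_writer is not None:
            # A waiting writer blocks all waiting readers (writer priority).
            if self.readers:
                return []
            self.waiting.remove((first_writer, "w"))
            self.writer = first_writer
            return [first_writer]
        # No writer is waiting, so every waiting reader can share the lock.
        granted = [c for c, _ in self.waiting]
        self.readers.update(granted)
        self.waiting.clear()
        return granted

    def release(self, client):
        """Release the lock, then check who can be given it."""
        if self.writer == client:
            self.writer = None
        self.readers.discard(client)
        return self._grant_waiting()

    def give_up(self, client, recheck):
        """Drop a waiting client. With recheck=False the queue is not
        re-examined, which is the behaviour described in this issue."""
        self.waiting = deque(e for e in self.waiting if e[0] != client)
        return self._grant_waiting() if recheck else []


def scenario(recheck):
    """r1 holds a read lock; w1, then r2 and r3 wait; w1 gives up."""
    lock = ToyRWLock()
    lock.acquire("r1", "r")
    lock.acquire("w1", "w")
    lock.acquire("r2", "r")
    lock.acquire("r3", "r")
    granted = lock.give_up("w1", recheck)
    return granted, [c for c, _ in lock.waiting]


print(scenario(recheck=False))  # ([], ['r2', 'r3']): readers are left waiting
print(scenario(recheck=True))   # (['r2', 'r3'], []): readers get the lock
```

In the toy, treating "give up" as a triggering event amounts to the single `_grant_waiting()` call in `give_up`.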

From the client's perspective this just looks like waiting indefinitely for the lock. Communication with the server is still fine, so the client doesn't get a "server didn't respond" error; the application just hangs. So it's not clear whether this bug could be responsible for some of the crashes we still see in labscript suite programs.

chrisjbillington commented 4 years ago

Fixed in 238c5474