[fix] Wait for run_lock when freeing it

bruntib commented 6 months ago

A store event inserts a line to the run_lock table in order to prevent concurrent storage to the same run. The run_lock record is removed at the end of storage. The insertion and deletion of the run_lock record are locked operations on the database level. In CodeChecker we used "nowait" lock which means that these insert and delete operations fail and throw an exception if they can't happen immediately.

This is too strict in case of removing the run_lock object, because the thrown exception is forwarded to the user who can't handle it anyway. For this reason the run_lock removal is waiting for the database lock to be free. In the worst case the exception will still be thrown after the configured statement_timeout, but ideally that should be an unlikely event.

whisperity commented 6 months ago

With "nowait" option the query will be blocked until the lock is undone. In the worst case when statement_timeout is reached, the exception will be thrown anyway.

Unfortunately, this adds the requirement that the server should be configured with a STATEMENT TIMEOUT, which is not the default case. If the server operator does not configure this value, the store will hang forever, am I reading this right?

Can you please check if using SELECT ... FOR UPDATESKIP LOCKED would be more beneficial for us? If I am reading the docs right (and I am not claiming I am!) then this would mean that in case the locking can not be done, we get back an empty result (so we use .one_or_none() to fetch the maybe-row) and we can thus deduce that another transaction happened to beat us to the punch, so there is no lock for us (the currently committed store) to remove. There is a skip_locked parameter in with_for_update().

To prevent the operation from waiting for other transactions to commit, use either the NOWAIT or SKIP LOCKED option. With NOWAIT, the statement reports an error, rather than waiting, if a selected row cannot be locked immediately. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.

whisperity commented 5 months ago

(Even though most real-world database applications need to be guarded with statement timeouts.)

Still, guidelines are not requirements. If we have this as a requirement without which the server will misbehave, then the server startup process should query the db, check if there is a statement timeout, and show a warning to the operator.

bruntib commented 4 months ago

A different solution will be provided later.

Ericsson / codechecker

[fix] Wait for run_lock when freeing it #4195