dmwm / PHEDEX

CMS data-placement suite

Stuck agents holding uncommitted transactions #1071

Open nikmagini opened 7 years ago

nikmagini commented 7 years ago

On 2016-11-18, the central FilePump agent got into a deadlock while waiting for a lock on the t_xfer_task table in the DB. The lock was held by the FileDownload agent running at T3_US_Minnesota, which was completely stuck. I had to kill the FileDownload session in the session manager to let FilePump recover.

As a general protection against this kind of issue, maybe we can ask CERN IT-DB whether it's possible to force a rollback and disconnect for remote agent sessions that have been holding an uncommitted transaction for more than X hours?
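
To make the proposal concrete, here is a rough sketch of what such a watchdog could look like. It is not something IT-DB or PhEDEx provides today. It assumes an Oracle backend where open transactions can be listed by joining `v$session` and `v$transaction`, and where `ALTER SYSTEM KILL SESSION ... IMMEDIATE` rolls back and disconnects the session. The query text, the row layout, and the helper names `stale_sessions` and `kill_statement` are all hypothetical, and the threshold X is left as a parameter.

```python
from datetime import datetime, timedelta

# Assumed Oracle query listing sessions that currently own an open
# transaction. Column and view names should be checked with IT-DB.
OPEN_TXN_QUERY = """
SELECT s.sid, s.serial#, s.machine, s.program, t.start_date
  FROM v$session s
  JOIN v$transaction t ON t.addr = s.taddr
"""


def stale_sessions(rows, now, max_age_hours):
    """Return the rows whose transaction started more than max_age_hours ago.

    Each row is a dict with at least 'sid', 'serial' and 'start_date'
    (a datetime), i.e. the result of OPEN_TXN_QUERY mapped to dicts.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return [row for row in rows if row["start_date"] < cutoff]


def kill_statement(sid, serial):
    """Build the statement a DBA would run to roll back and disconnect."""
    # int() guards against anything other than numeric identifiers.
    return f"ALTER SYSTEM KILL SESSION '{int(sid)},{int(serial)}' IMMEDIATE"


if __name__ == "__main__":
    # Made-up sample data, only to show the selection logic.
    now = datetime(2016, 11, 18, 12, 0, 0)
    rows = [
        {"sid": 101, "serial": 4242, "start_date": datetime(2016, 11, 18, 2, 0, 0)},
        {"sid": 102, "serial": 77, "start_date": datetime(2016, 11, 18, 11, 30, 0)},
    ]
    for row in stale_sessions(rows, now, max_age_hours=6):
        print(kill_statement(row["sid"], row["serial"]))
```

In practice the selection would probably also need to be restricted to remote agent accounts, so that central agents with legitimately long transactions are not killed.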