On 2016-11-18, the central FilePump agent got into a deadlock: it was waiting for a lock on the t_xfer_task table in the DB, and that lock was held by the FileDownload agent running at T3_US_Minnesota, which was completely stuck.
I had to kill the FileDownload session in the session manager to let FilePump recover.
As a general protection against this kind of issue, maybe we can ask CERN IT-DB whether it's possible to force a rollback and disconnect for remote agent sessions that have been holding an uncommitted transaction for more than X hours?
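
To make the request more concrete, here is a rough sketch of the check I have in mind. It is only an illustration, not something IT-DB provides: it assumes the DB is Oracle, that the open transactions have already been fetched from somewhere (for example by joining v$session and v$transaction), and the names `OpenTransaction`, `stale_transactions` and `kill_statement` are made up for this example. The 6-hour threshold just stands in for "X hours".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class OpenTransaction:
    """One session with an uncommitted transaction (hypothetical record layout)."""
    sid: int            # session id
    serial: int         # session serial number
    program: str        # agent name, e.g. "FileDownload"
    started: datetime   # when the transaction was opened


def stale_transactions(rows, now, max_age):
    """Return the transactions that have been open for longer than max_age."""
    return [r for r in rows if now - r.started > max_age]


def kill_statement(txn):
    """Build the Oracle statement that kills the session; Oracle rolls back
    the session's uncommitted work when the session is killed."""
    return f"ALTER SYSTEM KILL SESSION '{txn.sid},{txn.serial}' IMMEDIATE"


if __name__ == "__main__":
    # Made-up sample data, not taken from the real incident.
    now = datetime(2016, 11, 18, 12, 0, 0)
    rows = [
        OpenTransaction(101, 7, "FileDownload", now - timedelta(hours=9)),
        OpenTransaction(102, 3, "FilePump", now - timedelta(minutes=5)),
    ]
    for txn in stale_transactions(rows, now, max_age=timedelta(hours=6)):
        print(txn.program, kill_statement(txn))
```

Whether this would run as a scheduled job on the DB side or as a resource limit on the agent accounts is something IT-DB would have to decide; the sketch only shows the selection rule.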