The FilePump agent does not check whether a transfer task is already in progress (e.g. submitted to FTS) when marking the task as expired. This is most likely to happen for long-running transfers (large files, or transfers that wait a long time in the FTS queue).
If the transfer is immediately rerouted by FileRouter, the FileDownload agent of the destination site can pick up the new task and try to execute it while the previous transfer of the same file is still in progress.
Normally this will simply cause both transfers to fail, but in the case reported in the ticket the transfer was incorrectly marked as successful, leading to a storage inconsistency at the destination.
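
A minimal sketch of the missing check, in Python for illustration only. It is not the agents' actual code: `Task`, `in_progress` and `tasks_safe_to_expire` are hypothetical names, and the assumption is that the task state records whether the transfer has already been handed to FTS.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    """Hypothetical view of a transfer task; not the real schema."""
    file_id: int
    expire_time: datetime
    in_progress: bool  # assumed flag: transfer already submitted (e.g. to FTS)


def tasks_safe_to_expire(tasks, now):
    """Return expired tasks that are not currently in flight.

    Expired tasks that are still in progress are left alone, so they
    cannot be rerouted and started a second time while the first
    transfer of the same file is still running.
    """
    return [t for t in tasks if t.expire_time <= now and not t.in_progress]
```

With a guard of this kind an in-flight transfer would be left to finish (or be cancelled explicitly) before the task is marked as expired and rerouted.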
Note on current handling of expiration times:
- The FileRouter agent will extend the expiration time if the rate is sufficient and the task will expire in the next two hours.
- The FileDownload agent will not fetch from the DB the tasks that will expire in the next hour.
- The FileDownload agent will mark the transfer as "failed" if it is not yet started and it is going to expire in the next 20 minutes, and will not start the transfer.
- The FilePump agent will mark the transfer as "failed" if it has already expired. In this case the FileDownload agent will "forget" the task: if the transfer has not yet started, the agent will start it; if the transfer is already running, the agent will let the transfer run to completion without performing any further action (cleanup, cancel, etc.).
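
The time windows in the notes above can be put side by side in a short Python sketch. It is illustrative only and not taken from the agents' code: the function and constant names are hypothetical, and whether each boundary is inclusive is an assumption.

```python
from datetime import datetime, timedelta

# Windows taken from the notes above; the constant names are hypothetical.
ROUTER_EXTEND_WINDOW = timedelta(hours=2)
DOWNLOAD_FETCH_WINDOW = timedelta(hours=1)
DOWNLOAD_START_WINDOW = timedelta(minutes=20)


def expiration_actions(now, expire_time, started, rate_sufficient):
    """List what each agent would do for one task, per the notes above."""
    remaining = expire_time - now
    if remaining <= timedelta(0):
        # FilePump looks only at the expiration time, not at `started`.
        return ["FilePump: mark failed (expired)"]
    actions = []
    if rate_sufficient and remaining <= ROUTER_EXTEND_WINDOW:
        actions.append("FileRouter: extend expiration")
    if remaining <= DOWNLOAD_FETCH_WINDOW:
        actions.append("FileDownload: do not fetch from DB")
    if not started and remaining <= DOWNLOAD_START_WINDOW:
        actions.append("FileDownload: mark failed, do not start")
    return actions
```

Written this way, the sketch shows three different thresholds spread over three agents, with only the FileDownload rule taking into account whether the transfer has started.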
Probably the handling of expiration times should be simplified and made more consistent between the different agents.
Original Savannah ticket 93621 reported by None on Thu Apr 12 05:30:50 2012.
Hi,
follow up from this incident report:
https://savannah.cern.ch/support/?127757
Cheers Nicolo'