Original Savannah ticket 98318 reported by None on Fri Oct 19 06:12:39 2012.
Investigating an issue with FTS, I just found out that the download agent does not respect the intended timeout value for abandoning transfer jobs in certain situations.
Intended behaviour: if the job state poll command (e.g. glite-transfer-status) fails consistently for more than "timeout" (default 1h), abandon the job. If the state poll succeeds at least once per hour, don't abandon the job.
Actual behaviour: if the job state poll command fails even once, and the job state hasn't changed in the last "timeout" interval (default 1h), abandon the job immediately.
This is a lucky "feature" for us, since there is currently an issue with FTS that causes jobs to be stuck in Ready state forever, and without this bug they would never be marked as abandoned and queued for resubmission. However, it also causes the download agent to improperly abandon jobs that have been in Active state for more than one hour...
Proposing a fix and a new feature:
change the job timestamp on every successful poll instead of every state change, to avoid abandoning jobs after a single polling glitch
add a new timeout for jobs (with a longer default, e.g. 8 hours; it should be longer than the FTS timeouts) to abandon them if the job state is stuck, regardless of whether polling is working
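
For illustration only, here is a minimal Python sketch of the proposed logic. It is not the download agent's code: the names `Job`, `record_poll`, `should_abandon`, `POLL_TIMEOUT` and `STUCK_TIMEOUT` are all hypothetical, and the 8 hour value is just the example figure from this ticket. The idea is to keep two timestamps per job, one refreshed on every successful poll and one refreshed only on a state change, and to check each against its own timeout.

```python
from dataclasses import dataclass

# Hypothetical constants, taken from the defaults mentioned in this ticket.
POLL_TIMEOUT = 3600        # existing "timeout": polling failing for > 1h
STUCK_TIMEOUT = 8 * 3600   # proposed new timeout: state unchanged for > 8h


@dataclass
class Job:
    state: str
    last_good_poll: float     # time of the last successful poll
    last_state_change: float  # time the job state last changed


def record_poll(job, now, new_state):
    """Update the job after one poll; new_state is None if the poll failed."""
    if new_state is None:
        return  # failed poll: leave both timestamps untouched
    job.last_good_poll = now  # refreshed on every successful poll
    if new_state != job.state:
        job.state = new_state
        job.last_state_change = now


def should_abandon(job, now, poll_timeout=POLL_TIMEOUT,
                   stuck_timeout=STUCK_TIMEOUT):
    """Abandon if polling has failed for too long or the state is stuck."""
    polling_dead = now - job.last_good_poll > poll_timeout
    state_stuck = now - job.last_state_change > stuck_timeout
    return polling_dead or state_stuck
```

With this split, a single failed poll on a job that has been Active for more than an hour no longer triggers abandonment, while a job stuck in Ready is still abandoned once the longer timeout expires, even if every poll succeeds.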