dmwm / PHEDEX

CMS data-placement suite

Problems reported with FTS3 backend in 4.2.2 release #1108

Open nataliaratnikova opened 6 years ago

nataliaratnikova commented 6 years ago

Reported by the site:

We tried to upgrade PHEDEX to 4.2.2 at our site, T2_UA_KIPT.

There are some checksum-related problems with this release, so we downgraded to 4.2.1.

For example, if no checksum is available, the transfer fails with 4.2.2:

`INFO Fri Oct 13 20:15:48 2017; Checksum:`

`ERR Fri Oct 13 20:15:55 2017; Non recoverable error: [2] SOURCE CHECKSUM globus_ftp_client: the server responded with an error 500 500-Command failed : System error in Failed to open checksum file (host=transfer-4.ultralight.org, user=phedex, path=/store/PhEDEx_LoadTest07/LoadTest07_Prod_Caltech/LoadTest07_Caltech_51): No such file or directory 500-A system call failed: No such file or directory 500 End.`

It looks like 4.2.1 does not hit this error because, when the checksum is missing, it does not attempt to get a checksum at all.
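To make the difference concrete, here is a minimal sketch of the behaviour described above for 4.2.1: checksum verification is only requested when a checksum value is actually known. It is not PHEDEX code; the function name `build_checksum_options` and the option keys `verify_checksum` and `checksum` are hypothetical and chosen only for illustration.

```python
from typing import Optional


def build_checksum_options(checksum: Optional[str]) -> dict:
    """Hypothetical helper: decide whether to request checksum verification.

    The option keys are illustrative assumptions, not a real backend API.
    """
    value = (checksum or "").strip()
    if not value:
        # No checksum is known for this file, so do not ask the transfer
        # service to fetch or compare one (the 4.2.1-like behaviour
        # described in the report).
        return {"verify_checksum": False}
    # A checksum is known: pass it along and ask for verification.
    return {"verify_checksum": True, "checksum": value}
```

With this guard, the empty `Checksum:` case from the first log would submit the transfer without any checksum request, while `dabb3730` would still be passed along for verification.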

Also, when a checksum is available, with 4.2.2 I noticed unclear errors like the following, which do not happen with 4.2.1:

`INFO Fri Oct 13 20:01:27 2017; Checksum: dabb3730`

`ERR Fri Oct 13 20:01:35 2017; Non recoverable error: [5] SOURCE CHECKSUM MISMATCH User defined checksum and source checksum do not match dabb3730 != `

There is another issue in 4.2.2: the transfer timeout is fixed and is always set to 21600.

The timeout in older FTS servers was fixed at 4000, which was not enough in some cases.

But in the currently running FTS servers the timeout depends on the file size and is always large enough for transfers to complete successfully, so does it make sense to set a fixed timeout in PHEDEX now?
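As an illustration of what a size-dependent timeout means, here is a rough sketch. It is not the formula used by FTS, which is not given in this report; the function name, the base value and the minimum rate are all assumptions, and the result is assumed to be in seconds.

```python
def size_based_timeout(
    size_bytes: int,
    base_seconds: int = 600,                # assumed fixed overhead
    min_rate_bytes_per_s: int = 1_000_000,  # assumed worst-case throughput
) -> int:
    """Hypothetical timeout that grows with file size.

    Not the actual FTS computation; the defaults are illustrative only.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Ceiling division: time needed to move the file at the minimum rate.
    transfer_seconds = -(-size_bytes // min_rate_bytes_per_s)
    return base_seconds + transfer_seconds
```

Under these assumed defaults a 2.5 GB file gets 3100, while a single fixed value such as 21600 or 4000 is the same for every file regardless of its size.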