cern-fts / gfal2

Multi-protocol data management library
https://dmc-docs.web.cern.ch/dmc-docs/

Filesize mismatch for root transfers from XrootD to dCache site #6

Closed: snafus closed this issue 3 years ago

snafus commented 3 years ago

Hi, I opened an issue in XRootD, https://github.com/xrootd/xrootd/issues/1454, about failures observed in TPC root transfers between an XRootD site (running 5.3.0) and dCache sites. The issue was first seen in the DOMA TPC tests (on Kibana), but it can be reproduced on lxplus, e.g.:

gfal-copy -vvv --copy-mode=pull root://ceph-gw8.gridpp.rl.ac.uk:1094/dteam:test1/domatest/jwalder/HTTP_1GB root://dcache-se-doma.desy.de:1094/dteam/tpctest/jwtest1GB

In the DOMA tests, at least, the failures seemed specific to transfers between XRootD and dCache, but I could not work out why.

monitor: root://ceph-gw8.gridpp.rl.ac.uk:1094///dteam:test1/domatest/jwalder/HTTP_1GB?xrd.gsiusrpxy=/tmp/x509up_u28239&xrdcl.intent=tpc root://dcache-se-doma.desy.de:1094///dteam/tpctest/jwtest1GB?xrd.gsiusrpxy=/tmp/x509up_u28239&xrdcl.intent=tpc 8353361 8353361 994050048 119
monitor: root://ceph-gw8.gridpp.rl.ac.uk:1094///dteam:test1/domatest/jwalder/HTTP_1GB?xrd.gsiusrpxy=/tmp/x509up_u28239&xrdcl.intent=tpc root://dcache-se-doma.desy.de:1094///dteam/tpctest/jwtest1GB?xrd.gsiusrpxy=/tmp/x509up_u28239&xrdcl.intent=tpc 8216710 8216710 1002438656 122
event: [1627637292603] BOTH   xroot TRANSFER:EXIT   Job finished, [ERROR] Server responded with an error: [3019] File size mismatch (expected=1000000000, actual=1002438656) (destination)

INFO     Event triggered: BOTH xroot TRANSFER:EXIT Job finished, [ERROR] Server responded with an error: [3019] File size mismatch (expected=1000000000, actual=1002438656) (destination)

DEBUG    Xrootd Query URI: xrd.gsiusrpxy=/tmp/x509up_u28239
INFO     Destination file removed
DEBUG     <- Gfal::Transfer::FileCopy
gfal-copy error: 33 (Numerical argument out of domain) - [gfalt_copy_file][perform_copy][gfal_xrootd_3rd_copy][gfal_xrootd_3rd_copy_bulk] Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3019] File size mismatch (expected=1000000000, actual=1002438656) (destination)
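As a quick sanity check on the numbers in the error above, here is a minimal Python sketch that pulls the expected and actual sizes out of the message and looks at the difference. The helper names (`parse_size_mismatch`, `round_up`) and the 4 MiB block size are my own assumptions for illustration; the block size is inferred purely from the arithmetic and is not taken from any XRootD, dCache or gfal2 setting.

```python
import re

MIB = 1024 * 1024

def parse_size_mismatch(message):
    """Return (expected, actual) in bytes from a 'File size mismatch' message, or None."""
    m = re.search(r"File size mismatch \(expected=(\d+), actual=(\d+)\)", message)
    return (int(m.group(1)), int(m.group(2))) if m else None

def round_up(n, block):
    """Round n up to the next multiple of block."""
    return -(-n // block) * block

# Error text copied from the log above.
error = ("[ERROR] Server responded with an error: [3019] File size mismatch "
         "(expected=1000000000, actual=1002438656) (destination)")

expected, actual = parse_size_mismatch(error)
excess = actual - expected

# Hypothetical block size: an assumption inferred from the numbers only.
assumed_block = 4 * MIB

print("excess bytes:", excess)
print("actual size in MiB:", actual / MIB)
print("expected rounded up to 4 MiB equals actual:",
      round_up(expected, assumed_block) == actual)
```

With these values the destination holds 2438656 bytes more than expected, and the reported size is exactly 956 MiB, which is what you get when 1000000000 is rounded up to a multiple of 4 MiB. That is only an observation about the numbers; it might hint that the final block is being written or counted at full length, but it does not by itself say which component is responsible.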

Based on the logs, the XRootD developers suggest that gfal2 may be misreporting (or misreading?) the final size. More information is available in the GitHub issue mentioned above. On lxplus the gfal2 client is:

gfal-copy -V
gfal2-util version 1.6.0 (gfal2 2.19.2)
    dcap-2.19.2
    file-2.19.2
    gridftp-2.19.2
    http-2.19.2
    lfc-2.19.2
    rfio-2.19.2
    sftp-2.19.2
    srm-2.19.2
    xrootd-2.19.2

Please let me know if more information would be helpful. Thanks, James

snafus commented 3 years ago

Hi, apologies for opening this issue. The underlying cause appears to lie outside gfal2 and can be reproduced by other means. Closing the ticket. Thanks, James