Open DmitryLitvintsev opened 2 years ago
There's a fair bit of information missing in order to understand the problem, reproduce and fix it.
Ideally, can you provide command-line options for globus-url-copy
that reproduces the issue?
How is data reaching the pool: is the door proxying? is the client using MODE S or MODE E?
Could you provide the access log file (for the FTP session containing the failed transfer described above)?
I'm assuming the client is aborting the transfer because it's taking too long. Could you describe at what point the client is getting stuck? (is the mover started? does the client connect to the pool?)
You mention the aborted transfer left a zero-length file that is precious. Would a successful uploaded file in this directory also be precious?
In other words, is the file being precious a symptom of a problem, or just contextual information?
I do not know exact options passed to globus-url-copy in the quoted case. It is likely mode E, data is not proxied by door.
Coincidentally I am using TPC to transfer data from one dCache instances to another. Before upgrade to 7.2 things worked as expected, no 0 length file left behind. Now I observe a significant amount of 0 length files left behind. They seem to all fall into the same category - door connection made, but client never connects to the pool. I assume mover started.
IOW all fails are 451 Aborting transfer due to session termination
.
The command I am using:
globus-url-copy -dbg -nodcau -p 4 -fast -cd -vb
I attach here door access log and billing log for one of the tranfer,.
Client logic is :
rc = 1
tries = 0
while rc != 0 and tries < 3:
rc = system("globus-url-copy ...")
tries++
billing:
[root@fndca5b 10]# grep 'CSskim-NP-20101123-010613-6113518_p21.20.00' billing-2021.10.21
2021.10.21 18:22:48:857 [door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [0000A44CDD5E64D44A4F8E0C109F3774431A,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] d0.prd@enstore 900034 0 {451:"Aborting transfer due to session termination"}
2021.10.21 18:22:48:901 [pool:rw-stkendca1904-2@rw-stkendca1904-2Domain:transfer] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507] [0000A44CDD5E64D44A4F8E0C109F3774431A,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] d0.prd@enstore 0 0 true {GFtp-2.0 131.225.69.61 41419} [door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain:1634857668824000] [p2p=false] {666:"Transfer forcefully killed: Active transfer cancelled: killed by door: Aborting transfer due to session termination"}
2021.10.21 18:37:49:223 [door:GFTP-stkendca58a-AAXO5SxBXLg@gridftp-stkendca58aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [0000F5BB7A0EC033452BAC8C911C066014B2,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] d0.prd@enstore 900076 0 {451:"Aborting transfer due to session termination"}
2021.10.21 18:37:49:230 [pool:rw-stkendca1905-2@rw-stkendca1905-2Domain:transfer] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507] [0000F5BB7A0EC033452BAC8C911C066014B2,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] d0.prd@enstore 0 0 true {GFtp-2.0 131.225.69.61 46067} [door:GFTP-stkendca58a-AAXO5SxBXLg@gridftp-stkendca58aDomain:1634858569147000] [p2p=false] {666:"Transfer forcefully killed: Active transfer cancelled: killed by door: Aborting transfer due to session termination"}
2021.10.21 18:42:35:904 [door:GFTP-stkendca54a-AAXO5XL6X4g@gridftp-stkendca54aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 12 0 {550:"File exists"}
2021.10.21 18:42:36:218 [door:GFTP-stkendca54a-AAXO5XMCgZA@gridftp-stkendca54aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 11 0 {550:"File exists"}
2021.10.21 18:42:36:698 [door:GFTP-stkendca1804-AAXO5XMHzQg@gridftp-stkendca1804Domain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 12 0 {550:"File exists"}
2021.10.21 18:42:37:516 [door:GFTP-stkendca1804-AAXO5XMOmzg@gridftp-stkendca1804Domain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 14 0 {550:"File exists"}
2021.10.21 18:42:38:069 [door:GFTP-stkendca56a-AAXO5XMbwmg@gridftp-stkendca56aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 210 0 {550:"File exists"}
2021.10.21 18:42:38:618 [door:GFTP-stkendca56a-AAXO5XMjjoA@gridftp-stkendca56aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 14 0 {550:"File exists"}
2021.10.21 18:42:38:983 [door:GFTP-stkendca58a-AAXO5XMsd8A@gridftp-stkendca58aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 12 0 {550:"File exists"}
2021.10.21 18:42:39:291 [door:GFTP-stkendca58a-AAXO5XMxfOg@gridftp-stkendca58aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 12 0 {550:"File exists"}
2021.10.21 18:42:39:613 [door:GFTP-stkendca1804-AAXO5XM2oVA@gridftp-stkendca1804Domain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 12 0 {550:"File exists"}
2021.10.21 18:42:40:150 [door:GFTP-stkendca1804-AAXO5XM7Gdg@gridftp-stkendca1804Domain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 12 0 {550:"File exists"}
2021.10.21 18:42:40:556 [door:GFTP-stkendca63a-AAXO5XNDtPg@gridftp-stkendca63aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 13 0 {550:"File exists"}
2021.10.21 18:42:40:878 [door:GFTP-stkendca63a-AAXO5XNJfXA@gridftp-stkendca63aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 11 0 {550:"File exists"}
2021.10.21 18:42:41:285 [door:GFTP-stkendca65a-AAXO5XNPJqg@gridftp-stkendca65aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 18 0 {550:"File exists"}
2021.10.21 18:42:42:031 [door:GFTP-stkendca65a-AAXO5XNUpOg@gridftp-stkendca65aDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 208 0 {550:"File exists"}
2021.10.21 18:42:43:328 [door:GFTP1-fndca4b-AAXO5XNgzjA@gridftp1-fndca4bDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 14 0 {550:"File exists"}
2021.10.21 18:42:43:696 [door:GFTP1-fndca4b-AAXO5XN0C_g@gridftp1-fndca4bDomain:request] ["/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov":dzero::1841:1507:131.225.13.45] [,0] [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00] <Unknown> 13 0 {550:"File exists"}
[root@fndca5b 10]#
door access:
[root@stkendca58a dcache-log]# grep 'GFTP-stkendca58a-AAXO5PaVQ-A' gridftp-stkendca58aDomain.log
21 Oct 2021 18:22:48 (GFTP-stkendca58a-AAXO5PaVQ-A) [door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain] Transfer error: 451 Aborting transfer due to session termination
[root@stkendca58a dcache-log]# grep 'GFTP-stkendca58a-AAXO5PaVQ-A' gridftp-stkendca58aDomain.access
level=INFO ts=2021-10-21T18:07:48.398-0500 event=org.dcache.ftp.hello session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain socket.remote=131.225.13.45:53284 socket.local=131.225.69.61:2811 reply="220 GSI FTP door ready"
level=INFO ts=2021-10-21T18:07:48.399-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="AUTH GSSAPI" reply="334 ADAT must follow"
level=INFO ts=2021-10-21T18:07:48.407-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-21T18:07:48.424-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-21T18:07:48.431-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-21T18:07:48.443-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ADAT ..." reply="235 OK"
level=INFO ts=2021-10-21T18:07:48.460-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{USER :globus-mapping:}" user.dn="/DC=org/DC=incommon/C=US/ST=Illinois/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=dcatest01.fnal.gov" user.mapped=1841:1507 reply="ENC{200 User :globus-mapping: logged in}"
level=INFO ts=2021-10-21T18:07:48.461-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PASS}" reply="ENC{200 PASS command successful}"
level=INFO ts=2021-10-21T18:07:48.462-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{SITE HELP}" reply="ENC{214-The following site-specific commands are supported:\r\nSITE <SP> HELP - Information about SITE commands\r\nSITE <SP> BUFSIZE <SP> <size> - Set network buffer to <size>\r\nSITE <SP> CHKSUM <SP> <value> - Fail upload if ADLER32 checksum isn't <value>\r\nSITE <SP> CHGRP <SP> <group> <SP> <path> - Change group-owner of <path> to group <group>\r\nSITE <SP> CHMOD <SP> <perm> <SP> <path> - Change permission of <path> to octal value <perm>\r\nSITE <SP> CLIENTINFO <SP> <id> - Provide server with information about the client\r\nSITE <SP> SYMLINK <SP> <dest> <SP> <path> - Register symlink at <path>, pointing to <dest>\r\nSITE <SP> SYMLINKFROM <SP> <path> - Register symlink location; SYMLINKTO must follow\r\nSITE <SP> SYMLINKTO <SP> <path> - Create symlink to <path>; SYMLINKFROM must be earlier command\r\nSITE <SP> TASKID <SP> <id> - Provide server with an identifier\r\nSITE <SP> USAGE <SP> [TOKEN <SP> <token> <SP> ] <path>\r\n214 SITE <SP> WHOAMI - Provides the username or uid of the user}"
level=INFO ts=2021-10-21T18:07:48.462-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{FEAT}" reply="ENC{211-OK\r\n EOF\r\n PARALLEL\r\n MODE-E-PERF\r\n SIZE\r\n SBUF\r\n ERET\r\n ESTO\r\n GETPUT\r\n MDTM\r\n CKSUM ADLER32,MD5,MD4,SHA-1,SHA-256,SHA-512\r\n MODEX\r\n TVFS\r\n MFMT\r\n MFCT\r\n MFF Modify;Create;UNIX.mode;\r\n PASV AllowDelayed\r\n MLSC\r\n MLST Size*;Modify*;Create;Type*;Unique*;Perm*;UNIX.owner*;UNIX.group*;UNIX.uid*;UNIX.gid*;UNIX.mode*;UNIX.ctime;UNIX.atime;\r\n211 End}"
level=INFO ts=2021-10-21T18:07:48.463-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{SITE CLIENTINFO scheme=gsiftp;appname=\"globus-url-copy\";appver=\"10.10 (gcc64, 1629915172-0) [Grid Community Toolkit 6.2]\";}" reply="ENC{250 OK}"
level=INFO ts=2021-10-21T18:07:48.464-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{TYPE I}" reply="ENC{200 Type set to I}"
level=INFO ts=2021-10-21T18:07:48.465-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MODE E}" reply="ENC{200 Will use Extended Block mode}"
level=INFO ts=2021-10-21T18:07:48.479-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.501-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //}" reply="ENC{550 Permission denied}"
level=INFO ts=2021-10-21T18:07:48.511-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/}" reply="ENC{550 Permission denied}"
level=INFO ts=2021-10-21T18:07:48.520-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/}" reply="ENC{550 Permission denied}"
level=INFO ts=2021-10-21T18:07:48.531-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/}" reply="ENC{550 Permission denied}"
level=INFO ts=2021-10-21T18:07:48.543-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/}" reply="ENC{550 Permission denied}"
level=INFO ts=2021-10-21T18:07:48.558-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.575-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/db5/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data/db5\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.586-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.598-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.611-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.626-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.641-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.655-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{MKD //pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/}" reply="ENC{550 \"/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000\" directory already exists}"
level=INFO ts=2021-10-21T18:07:48.823-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{ALLO 936499002}" reply="ENC{200 ALLO command successful}"
level=INFO ts=2021-10-21T18:07:48.871-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{127 PORT (131,225,69,106,78,157)}"
level=INFO ts=2021-10-21T18:07:48.871-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{150 Opening BINARY data connection for /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00}"
level=INFO ts=2021-10-21T18:07:48.872-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:08:58.874-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:10:08.874-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:11:18.874-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:12:28.875-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:13:38.875-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:14:48.876-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:15:58.876-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:17:08.884-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:18:18.879-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:19:28.879-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:20:38.879-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:21:48.879-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{112-Perf Marker\r\n Timestamp: 1634857668.8\r\n Stripe Index: 0\r\n Stripe Bytes Transferred: 0\r\n Total Stripe Count: 1\r\n112 End.}"
level=INFO ts=2021-10-21T18:22:48.858-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain command="ENC{PUT path=//pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00;pasv;}" reply="ENC{451 Aborting transfer due to session termination}"
level=INFO ts=2021-10-21T18:22:48.858-0500 event=org.dcache.ftp.disconnect host.remote=131.225.13.45:53284 session=door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain
The source file has normal size:
[root@dcatest01 ~]# ls -alt /pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00
-rw-r--r-- 1 1841 1507 936499002 Nov 23 2010 /pnfs/fnal.gov/usr/d0/data/db5/derivedDetector/csg-p21.20.00-p20.18.02b/dzero/thumbnail/lep/0000/CSskim-NP-20101123-010613-6113518_p21.20.00
[root@dcatest01 ~]#
Pool side log excerpt for the above transfer:
21 Oct 2021 18:22:48 (rw-stkendca1904-2) [door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain GFTP-stkendca58a-AAXO5PaVQ-A PoolAcceptFile 0000A44CDD5E64D44A4F8E0C109F3774431A] Transfer forcefully killed: Active transfer cancelled: killed by door: Aborting transfer due to session termination
21 Oct 2021 18:22:48 (rw-stkendca1904-2) [door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain GFTP-stkendca58a-AAXO5PaVQ-A PoolAcceptFile 0000A44CDD5E64D44A4F8E0C109F3774431A] Failed to read file size: java.nio.file.NoSuchFileException: /storage/data2/rw-pool-2/data/0000A44CDD5E64D44A4F8E0C109F3774431A
(I think in this case actually the file did not end up with 0 size replica on pools, It existed in some "ether", cacheinfoof that file showed empty)
The extend of the problem. Overnight, looking got HSM flush errors:
[root@fndca2b ~]# for id in $ids; do ssh -p 24223 enstore@fndca3b "\sl $id rep ls $id" | grep "> 0 "; done
0000055B005B2A5E4ABF9056C59099B96EBD <-P---------L(0)[1]> 0 si={minerva.mc_minos}
0000139B8DE1FBA545CAA1F9C5C4DA586EB4 <-P---------L(0)[1]> 0 si={uboone.data_raw}
0000156CF3F4C56349AD802FFDEC317AD08E <-P---------L(0)[0]> 0 si={e1039.raw_coda}
00001EFF0D9341C34A5BA4655CE78A4F27B8 <-P---------L(0)[0]> 0 si={mu2e.usr-sim}
000026B4D53F3AB44849A64D4308A52BD53C <-P---------L(0)[0]> 0 si={nova.nova_users}
000026B4D53F3AB44849A64D4308A52BD53C <-P---------L(0)[0]> 0 si={nova.nova_users}
00002B4C5882AB184D22A31EB780B19AB113 <-P---------L(0)[0]> 0 si={nova.nova_production}
00003910C6A5BD90487FABE5AB3D57697066 <-P---------L(0)[0]> 0 si={GM2.GM2}
000073429EAB666A4EFCB738100013F64E32 <-P---------L(0)[0]> 0 si={egp.egp}
000074D0A861DEE243FC94E9DAE97B907396 <-P---------L(0)[0]> 0 si={nova.nova_production}
00008803C13EEF38402AB24B39874351D59F <-P---------L(0)[0]> 0 si={dune.dune}
00009D35E5ABADAE4B57A4446728C8CC2FEF <-P---------L(0)[0]> 0 si={minerva.mc_minos}
0000A692391CDD314652893316D00163D438 <-P---------L(0)[0]> 0 si={uboone.overlay_reco}
0000AFF01EEDF50447C99D3E6FFFE5906D15 <-P---------L(0)[1]> 0 si={uboone.data_raw}
0000BC9DCB0231C849EB9B1524CAA78630A8 <-P---------L(0)[0]> 0 si={nova.nova_users}
0000D9E1091D27294285B4E6FC0037BF7406 <-P---------L(0)[1]> 0 si={minerva.mc_minos}
0000E37309DFB46D4D76B0680FF86AE8069B <-P---------L(0)[0]> 0 si={nova.nova_users}
0000F5CA239F2C5C44F88F99A5CA7D238063 <-P---------L(0)[0]> 0 si={minerva.mc_generation}
Precious 0 size file that fail to go to tape (due to different problem).
Does the door's log file (the log file of the domain hosting the door) contain any entries regarding these failing transfers?
single line:
21 Oct 2021 18:22:48 (GFTP-stkendca58a-AAXO5PaVQ-A) [door:GFTP-stkendca58a-AAXO5PaVQ-A@gridftp-stkendca58aDomain] Transfer error: 451 Aborting transfer due to session termination
(this corresponds to the very first aborted transfer that I started this Issue with)
Two more questions:
Do you see any SRM upload activity (in particular that failed) on this dCache instance?
Do you see any xrootd uploads with the "Persist On Successful Close" (POSC) flag set (again, in particular that failed) ?
SRM activity is rather minimal. By looking in SRM access file I see:
41018 request.method=srmStatusOfBringOnlineRequest
29930 request.method=srmLs
20698 request.method=srmReleaseFiles
14746 request.method=srmPing
14527 request.method=srmPrepareToGet
696 request.method=srmBringOnline
660 request.method=srmAbortFiles
248 request.method=srmPutDone
248 request.method=srmPrepareToPut
233 request.method=srmRm
113 request.method=srmStatusOfGetRequest
3 request.method=srmStatusOfLsRequest
Actually I do not seem to see failing gets or puts. Fails I see have to do with SrmRm or SrmLs fails due to SRM_INVALID_PATH.
How do I know if POSC was set? ("grep -i 'pocs' xroot.access file did not bring up anything)
Do any of the srmAbortFiles
requests match the files with the problematic FTP transfers ... or do they happen at about the same time as the transfers?
Detecting xroot POSC isn't completely trivial. You're looking for a xroot kXR_open
request in the access log file with the options
flag 0x1000
set.
Here's a sed one-liner that pulls out the options flag from the access log file:
sed -n 's/.*xrootd.request.*request=open.*options=\([^ ]*\).*/\1/p' my-door.access
Here's some sample output from running the sed script:
0x468
0x1468
The first entry (0x468)
is the options for a kXR_open
request without POSC set.
The second entry (0x1468
) is the options for a kXR_open
request with POSC set.
SRM is rarely used. Mostly globus-url-copy.
Today I scanned all 0 lenth files accumulated so far:
psql -t -F ' ' -A -U enstore billing -c "select pnfsid from billinginfo where fullsize=0 and errorcode=666 and errormessage like 'Transfer forcefully killed: Active transfer cancelled: killed%' and datestamp> current_date" -o zero_size_files.txt
[root@fndca5b ~]# wc -l zero_size_files.txt
1168 zero_size_files.txt
out of 1168 only 5 actually exist in namespace and on pools. Luckily all on scratch (aka volatile pools).
Taking just one file ....
door access:
level=INFO ts=2021-10-26T03:17:09.690-0500 event=org.dcache.ftp.hello session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain socket.remote=131.225.208.106:58664 socket.local=131.225.69.172:2811 reply="220 GSI FTP door ready"
level=INFO ts=2021-10-26T03:17:09.848-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="AUTH GSSAPI" reply="334 ADAT must follow"
level=INFO ts=2021-10-26T03:17:10.316-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-26T03:17:12.161-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-26T03:17:12.509-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-26T03:17:13.616-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ADAT ..." reply="235 OK"
level=INFO ts=2021-10-26T03:17:31.553-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ENC{USER :globus-mapping:}" user.dn="/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Jeremy Hewes/CN=UID:jhewes15" user.mapped=48604:9553,8674,8678,9767,9960,8844,8623,9553,9937,9010,8828 reply="ENC{200 User :globus-mapping: logged in}"
level=INFO ts=2021-10-26T03:17:31.864-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ENC{PASS}" reply="ENC{200 PASS command successful}"
level=INFO ts=2021-10-26T03:17:32.036-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ENC{SITE HELP}" reply="ENC{214-The following site-specific commands are supported:\r\nSITE <SP> HELP - Information about SITE commands\r\nSITE <SP> BUFSIZE <SP> <size> - Set network buffer to <size>\r\nSITE <SP> CHKSUM <SP> <value> - Fail upload if ADLER32 checksum isn't <value>\r\nSITE <SP> CHGRP <SP> <group> <SP> <path> - Change group-owner of <path> to group <group>\r\nSITE <SP> CHMOD <SP> <perm> <SP> <path> - Change permission of <path> to octal value <perm>\r\nSITE <SP> CLIENTINFO <SP> <id> - Provide server with information about the client\r\nSITE <SP> SYMLINK <SP> <dest> <SP> <path> - Register symlink at <path>, pointing to <dest>\r\nSITE <SP> SYMLINKFROM <SP> <path> - Register symlink location; SYMLINKTO must follow\r\nSITE <SP> SYMLINKTO <SP> <path> - Create symlink to <path>; SYMLINKFROM must be earlier command\r\nSITE <SP> TASKID <SP> <id> - Provide server with an identifier\r\nSITE <SP> USAGE <SP> [TOKEN <SP> <token> <SP> ] <path>\r\n214 SITE <SP> WHOAMI - Provides the username or uid of the user}"
level=INFO ts=2021-10-26T03:17:32.038-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ENC{FEAT}" reply="ENC{211-OK\r\n EOF\r\n PARALLEL\r\n MODE-E-PERF\r\n SIZE\r\n SBUF\r\n ERET\r\n ESTO\r\n GETPUT\r\n MDTM\r\n CKSUM ADLER32,MD5,MD4,SHA-1,SHA-256,SHA-512\r\n MODEX\r\n TVFS\r\n MFMT\r\n MFCT\r\n MFF Modify;Create;UNIX.mode;\r\n PASV AllowDelayed\r\n MLSC\r\n MLST Size*;Modify*;Create;Type*;Unique*;Perm*;UNIX.owner*;UNIX.group*;UNIX.uid*;UNIX.gid*;UNIX.mode*;UNIX.ctime;UNIX.atime;\r\n211 End}"
level=INFO ts=2021-10-26T03:17:32.038-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ENC{SITE CLIENTINFO scheme=gsiftp;appname=\"globus-url-copy\";appver=\"10.10 (gcc64, 1629915172-0) [Grid Community Toolkit 6.2]\";}" reply="ENC{250 OK}"
level=INFO ts=2021-10-26T03:17:32.038-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ENC{TYPE I}" reply="ENC{200 Type set to I}"
level=INFO ts=2021-10-26T03:17:32.426-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca2043-AAXPPRqYMpA@gridftp-stkendca2043Domain command="ENC{ALLO 1757712}" reply="ENC{200 ALLO command successful}"
door log:
can't find anything for that string "
0
but the door crashed about the same time:
26 Oct 2021 03:18:13 (GFTP-stkendca2043-AAXPPRw20xg) [door:GFTP-stkendca2043-AAXPPRw20xg@gridftp-stkendca2043Domain] Login failed for CN=1569213987,CN=1121042410,CN=1773309327,CN=3869595295,CN=UID:bzli,CN=Bingzhi Li,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /home/enstore/dcache-log/gridftp-stkendca2043Domain-oom.hprof ...
Unable to create /home/enstore/dcache-log/gridftp-stkendca2043Domain-oom.hprof: File exists
Terminating due to java.lang.OutOfMemoryError: Java heap space
2021-10-26 03:18:22 Process terminated unexpectedly
2021-10-26 03:18:22 Sleeping 10 seconds
2021-10-26 03:18:32 Restarting /usr/bin/java -server -Xmx1024m -XX:MaxDirectMemorySize=1024m -Dsun.net.inetaddr.ttl=1800 -Dorg.globus.tcp.port.range=20000,25000 -Dorg.dcache.dcap.port=0 -Dorg.dcache.ftp.log-aborted-transfers=true -Dorg.dcache.net.tcp.portrange=25001:35000 -Djava.security.krb5.realm=FNAL.GOV -Djava.security.krb5.kdc=krb-fnal-enstore.fnal.gov:krb-fnal-2.fnal.gov:krb-fnal-3.fnal.gov:krb-fnal-4.fnal.gov:krb-fnal-5.fnal.gov -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/etc/dcache/jgss_host.conf -Dcontent.types.user.table=/etc/dcache/content-types.properties -Dzookeeper.sasl.client=false -Dcurator-dont-log-connection-problems=true -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/enstore/dcache-log/gridftp-stkendca2043Domain-oom.hprof -XX:+ExitOnOutOfMemoryError -XX:+StartAttachListener -XX:+UseCompressedOops -javaagent:/usr/share/dcache/classes/aspectjweaver-1.9.2.jar -Djava.awt.headless=true -DwantLog4jSetup=n -Djava.security.properties=/etc/dcache/max.java.security -Djdk.tls.ephemeralDHKeySize=2048 -Ddcache.home=/usr/share/dcache -Ddcache.paths.defaults=/usr/share/dcache/defaults org.dcache.boot.BootLoader start gridftp-stkendca2043Domain
WARN - dcache.conf:21: Property dcache.broker.host: please remove this assignment; See dcache.zookeeper.connection
The pool : no records.
GFTP crashing with OOM is something new. Happening after upgrade to 7.2 often
[gridftp-${host.name}Domain] dcache.java.memory.heap=1024m dcache.java.memory.direct=1024m [gridftp-${host.name}Domain/ftp] ftp.authn.protocol=gsi
1GB not enough?
Could look at the pool hosting one of these zero-length precious files. Does the pool's log file contain anything significant?
Could you pull out the billing log entries for the failed upload for the file with PNFS-ID 000026B4D53F3AB44849A64D4308A52BD53C
?
Also, could you confirm which pool(s) have a zero-length replica of this file?
what makes this file special?
In a previous comment, you ran rep ls
on all pools with problematic PNFS-IDs. This particular file appears to be stored on two pools.
This gives me a hint on what might be going on here.
I haven't figured out all the details, but I'd like to (try to) double-check my ideas, by looking at the billing entries for this PNFS-ID and which pools have a (zero-length) replica of this file.
Ah, I see. But I should not have. I scanned the pool log foe "HSM flush" errors. Some of these files are not GFTP files and some files are old. This particular file is actually NFS.
I will be more careful and filter on protocol and creation date.
I have a theory that these 0 length file may correlate with GFTP dooes running out memory.
I will keep posting here once I found out more.
In the migration program that invoked globus-url-copy I added:
if rc != 0: if os.path.exists(destination) : os.unlink(destination)
I will remove that to see if I start accumulating 0 file size again on fails.
As a suggestion: perhaps we should focus on one specific transfer that resulted in a precious zero-length file and try to understand what went wrong.
Currently, it seems like this issue is drifting and attracting many problems in the process, which might or might not be related.
Therefore, in order to make any progress, I think we should focus on one specific transfer. Any other issues (e.g., running out of memory) should be documented in separate issues.
I started with very specific GFTP transfer. You can ignore comment starting with "The extend of the problem. Overnight, looking got HSM flush errors:". I think that one caught unrelated cases.
psql -t -F ' ' -A -U enstore billing -c "select pnfsid from billinginfo where fullsize=0 and protocol in ('GFtp-2.0', 'GFtp-1.0') and errorcode=666 and isnew is true errormessage like 'Transfer forcefully killed: Active transfer cancelled: killed%' and datestamp> current_date" -o zero_size_files.txt
wc -l zero_size_files.txt
2843 zero_size_files.txt
out of these 2843 25 exist in namespace and all 25 have 0 size replica. none of these files on tape-backed pools. Lets look at one
00001B7C7918FA2848B89A684E4377310C65
p-ldrd-stkendca70a-21
billing
billing=# select * from doorinfo where pnfsid = '00001B7C7918FA2848B89A684E4377310C65';
client | mappedgid | mappeduid | owner | path | queuedtime | connectiontime | action | cellname | datestamp | errorcode | errormessage | pnfsid | transaction | fqan
--------+-----------+-----------+-------+------+------------+----------------+--------+----------+-----------+-----------+--------------+--------+-------------+------
(0 rows)
billing=# select datestamp, errorcode, errormessage, connectiontime, initiator from billinginfo where pnfsid = '00001B7C7918FA2848B89A684E4377310C65';
datestamp | errorcode | errormessage | connectiontime | initiator
----------------------------+-----------+----------------------------------------------------------------------+----------------+------------------------------------------------------------------------------
2021-10-27 01:08:27.526-05 | 666 | Transfer forcefully killed: Active transfer cancelled: killed by JTM | 0 | door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain:1635311284924000
(1 row)
door access:
[root@stkendca71a ~]# grep 'GFTP-stkendca71a-AAXPTpQyz_A' ~enstore/dcache-log/gridftp-stkendca71aDomain.access
level=INFO ts=2021-10-27T00:08:04.310-0500 event=org.dcache.ftp.hello session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain socket.remote=128.230.11.9:62143 socket.local=131.225.69.79:2811 reply="220 GSI FTP door ready"
level=INFO ts=2021-10-27T00:08:04.338-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="AUTH GSSAPI" reply="334 ADAT must follow"
level=INFO ts=2021-10-27T00:08:04.373-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-27T00:08:04.421-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-27T00:08:04.452-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-10-27T00:08:04.487-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ADAT ..." reply="235 OK"
level=INFO ts=2021-10-27T00:08:04.537-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{USER :globus-mapping:}" user.dn="/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Jacob Zettlemoyer/CN=UID:jzettle" user.mapped=56911:8644,9314,8644,9767,8844,8623,9429,5468,8636,9470 reply="ENC{200 User :globus-mapping: logged in}"
level=INFO ts=2021-10-27T00:08:04.566-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{PASS}" reply="ENC{200 PASS command successful}"
level=INFO ts=2021-10-27T00:08:04.595-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{SITE HELP}" reply="ENC{214-The following site-specific commands are supported:\r\nSITE <SP> HELP - Information about SITE commands\r\nSITE <SP> BUFSIZE <SP> <size> - Set network buffer to <size>\r\nSITE <SP> CHKSUM <SP> <value> - Fail upload if ADLER32 checksum isn't <value>\r\nSITE <SP> CHGRP <SP> <group> <SP> <path> - Change group-owner of <path> to group <group>\r\nSITE <SP> CHMOD <SP> <perm> <SP> <path> - Change permission of <path> to octal value <perm>\r\nSITE <SP> CLIENTINFO <SP> <id> - Provide server with information about the client\r\nSITE <SP> SYMLINK <SP> <dest> <SP> <path> - Register symlink at <path>, pointing to <dest>\r\nSITE <SP> SYMLINKFROM <SP> <path> - Register symlink location; SYMLINKTO must follow\r\nSITE <SP> SYMLINKTO <SP> <path> - Create symlink to <path>; SYMLINKFROM must be earlier command\r\nSITE <SP> TASKID <SP> <id> - Provide server with an identifier\r\nSITE <SP> USAGE <SP> [TOKEN <SP> <token> <SP> ] <path>\r\n214 SITE <SP> WHOAMI - Provides the username or uid of the user}"
level=INFO ts=2021-10-27T00:08:04.625-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{FEAT}" reply="ENC{211-OK\r\n EOF\r\n PARALLEL\r\n MODE-E-PERF\r\n SIZE\r\n SBUF\r\n ERET\r\n ESTO\r\n GETPUT\r\n MDTM\r\n CKSUM ADLER32,MD5,MD4,SHA-1,SHA-256,SHA-512\r\n MODEX\r\n TVFS\r\n MFMT\r\n MFCT\r\n MFF Modify;Create;UNIX.mode;\r\n PASV AllowDelayed\r\n MLSC\r\n MLST Size*;Modify*;Create;Type*;Unique*;Perm*;UNIX.owner*;UNIX.group*;UNIX.uid*;UNIX.gid*;UNIX.mode*;UNIX.ctime;UNIX.atime;\r\n211 End}"
level=INFO ts=2021-10-27T00:08:04.654-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{SITE CLIENTINFO scheme=gsiftp;appname=\"globus-url-copy\";appver=\"10.10 (gcc64, 1629915172-0) [Grid Community Toolkit 6.2]\";}" reply="ENC{250 OK}"
level=INFO ts=2021-10-27T00:08:04.866-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{TYPE I}" reply="ENC{200 Type set to I}"
level=INFO ts=2021-10-27T00:08:04.894-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{ALLO 2540543}" reply="ENC{200 ALLO command successful}"
level=INFO ts=2021-10-27T00:08:04.968-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{PUT path=/pnfs/fnal.gov/usr/ldrd/darksectorldrd/persistent/users/jzettle/darksectorsim/DM_815//darksectorsim_run_124.root;pasv;}" reply="ENC{127 PORT (131,225,69,78,85,215)}"
level=INFO ts=2021-10-27T00:08:04.968-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain command="ENC{PUT path=/pnfs/fnal.gov/usr/ldrd/darksectorldrd/persistent/users/jzettle/darksectorsim/DM_815//darksectorsim_run_124.root;pasv;}" reply="ENC{150 Opening BINARY data connection for /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/ldrd/darksectorldrd/persistent/users/jzettle/darksectorsim/DM_815/darksectorsim_run_124.root}"
door log:
Nothing....
BUT... door failed 40 minutes later:
27 Oct 2021 00:49:51 (GFTP-stkendca71a-AAXPTydRWVA) [door:GFTP-stkendca71a-AAXPTydRWVA@gridftp-stkendca71aDomain] Login failed for CN=1441949455,CN=404089824,CN=131724431,CN=3450858237,CN=UID:yinrui,CN=Yinrui Liu,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
27 Oct 2021 00:49:52 (GFTP-stkendca71a-AAXPTydW06g) [door:GFTP-stkendca71a-AAXPTydW06g@gridftp-stkendca71aDomain] Login failed for CN=665513507,CN=1834188273,CN=1539352787,CN=943063367,CN=uboonepro-uboonegpvm01.fnal.gov,OU=Fermilab,O=Fermi Research Alliance,L=Batavia,ST=Illinois,C=US,DC=incommon,DC=org: Request to [>gPlazma@local] timed out.
27 Oct 2021 00:49:53 (GFTP-stkendca71a-AAXPTydhpTA) [door:GFTP-stkendca71a-AAXPTydhpTA@gridftp-stkendca71aDomain] Login failed for CN=1047975828,CN=1783680616,CN=1296741990,CN=1499981063,CN=UID:reitzner,CN=Diane Reitzner,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
27 Oct 2021 00:49:57 (GFTP-stkendca71a-AAXPTyfi_ng) [door:GFTP-stkendca71a-AAXPTyfi_ng@gridftp-stkendca71aDomain] Login failed for CN=2020185082,CN=404089824,CN=131724431,CN=3450858237,CN=UID:yinrui,CN=Yinrui Liu,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
27 Oct 2021 00:49:55 (GFTP-stkendca71a-AAXPTyfGGAg) [door:GFTP-stkendca71a-AAXPTyfGGAg@gridftp-stkendca71aDomain] Login failed for CN=114956643,CN=404089824,CN=131724431,CN=3450858237,CN=UID:yinrui,CN=Yinrui Liu,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
27 Oct 2021 00:49:55 (GFTP-stkendca71a-AAXPTyfGGAk) [door:GFTP-stkendca71a-AAXPTyfGGAk@gridftp-stkendca71aDomain] Login failed for CN=599196195,CN=404089824,CN=131724431,CN=3450858237,CN=UID:yinrui,CN=Yinrui Liu,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
27 Oct 2021 00:49:55 (GFTP-stkendca71a-AAXPTyfGGAo) [door:GFTP-stkendca71a-AAXPTyfGGAo@gridftp-stkendca71aDomain] Login failed for CN=422559779,CN=2032225097,CN=849638223,CN=1499981063,CN=UID:reitzner,CN=Diane Reitzner,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
27 Oct 2021 00:49:59 (GFTP-stkendca71a-AAXPTyfXD8g) [door:GFTP-stkendca71a-AAXPTyfXD8g@gridftp-stkendca71aDomain] Login failed for CN=590566002,CN=404089824,CN=131724431,CN=3450858237,CN=UID:yinrui,CN=Yinrui Liu,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
27 Oct 2021 00:50:04 (GFTP-stkendca71a-AAXPTygozZA) [door:GFTP-stkendca71a-AAXPTygozZA@gridftp-stkendca71aDomain] Login failed for CN=1170340006,CN=1546171595,CN=1359815235,CN=770238401,CN=UID:jzettle,CN=Jacob Zettlemoyer,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org: Request to [>gPlazma@local] timed out.
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /home/enstore/dcache-log/gridftp-stkendca71aDomain-oom.hprof ...
Heap dump file created [1402871611 bytes in 15.171 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
2021-10-27 00:50:31 Process terminated unexpectedly
2021-10-27 00:50:31 Sleeping 10 seconds
pool log :
27 Oct 2021 01:08:27 (p-ldrd-stkendca70a-21) [door:GFTP-stkendca71a-AAXPTpQyz_A@gridftp-stkendca71aDomain GFTP-stkendca71a-AAXPTpQyz_A PoolAcceptFile 00001B7C7918FA2848B89A684E4377310C65] Transfer forcefully killed: Active transfer cancelled: killed by JTM
so, I will try to gather some stat. But does it look like this is correlated to whatever events lead up to door running out of memory?
before dying GFTP door spat this:
29 Oct 2021 13:27:32 (GFTP-stkendca58a-AAXPgLDNbpA) [door:GFTP-stkendca58a-AAXPgLDNbpA@gridftp-stkendca58aDomain] Transfer error: 451 Failed to deliver PnfsGetFileAttributes message <-527409
367631836793:-7271213560554978864> to [>PnfsManager@local]: c-dCacheDomain-AAXPfxnKo2A-AAXPfxnKswA@gridftp-stkendca58aDomain is busy (its estimated response time of 310654 ms is longer than
the message TTL of 300000 ms).
29 Oct 2021 13:29:06 (GFTP-stkendca58a-AAXPgbYq_jg) [door:GFTP-stkendca58a-AAXPgbYq_jg@gridftp-stkendca58aDomain] Failed to delete file after failed upload: /pnfs/fnal.gov/usr/pnfs/fnal.gov/
usr/GM2/scratch/users/gm2pro/preprod/QDatabase/5302P/Run4U-v9_70/2021-10-29-12-05-45/json/gm2offline_ana_49742599_43996.00021.json (0000D590F73856A34C3493CD52C488729C86): Request to [>PnfsMa
nager@local] timed out.
29 Oct 2021 13:29:53 (GFTP-stkendca58a-AAXPgbYq_jg) [door:GFTP-stkendca58a-AAXPgbYq_jg@gridftp-stkendca58aDomain] Transfer error: 451 Request to [>v-stkendca2012-2@v-stkendca2012-2Domain] ti
med out.
29 Oct 2021 13:30:31 (GFTP-stkendca58a-AAXPf-ptGYA) [door:GFTP-stkendca58a-AAXPf-ptGYA@gridftp-stkendca58aDomain v-stkendca2011-2 DoorTransferFinished 00002E266C62F2144E34A9631F80FBF72A06] F
ailed to delete file after failed upload: /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/GM2/scratch/daq/2021-10-29-10-06-13/data/gm2preproduction_full_25360075_44088.00302.root (00002E266C62F2144E34A
9631F80FBF72A06): Request to [>PnfsManager@local] timed out.
Guess this is how the zero size file get left behind.
Ha, the file above not 0 size:
[root@stkendca58a ~]# ls -al /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/GM2/scratch/daq/2021-10-29-10-06-13/data/gm2preproduction_full_25360075_44088.00302.root
-rw-rw-r-- 1 45651 9950 23779020 Oct 29 13:11 /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/GM2/scratch/daq/2021-10-29-10-06-13/data/gm2preproduction_full_25360075_44088.00302.root
[root@stkendca58a ~]# cat /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/GM2/scratch/daq/2021-10-29-10-06-13/data/".(id)(gm2preproduction_full_25360075_44088.00302.root)"
00002E266C62F2144E34A9631F80FBF72A06
Billing:
billing=# select datestamp, protocol, isnew, errorcode, errormessage from billinginfo where pnfsid='00002E266C62F2144E34A9631F80FBF72A06';
datestamp | protocol | isnew | errorcode | errormessage
----------------------------+----------+-------+-----------+---------------------------------------------------------------------------------------
2021-10-29 13:25:18.929-05 | NFS4-4.1 | f | 0 |
2021-10-29 13:11:56.059-05 | GFtp-2.0 | t | 666 | General problem: Problem while connected to 137.99.174.15:41558: Connection timed out
(2 rows)
We may be having incomplete, and thus corrupted files left behind!
This issue could be the effect of doors crashing with OOM. Issue #6195 Will need to verify.
dCache 7.2
Aborted GFTP uploads leave 0 length files behind. Additionally these files are "P"recious on pools and perpetually tried to be written to tape:
176 seconds, data not moved: