DUNE / data-mgmt-ops

FNAL reports bad files: #665

Closed StevenCTimm closed 2 months ago

StevenCTimm commented 3 months ago

Hi Ken, Mike, Heidi or Steve,

Could you please have a look when you get some time? These 18 dune files

-rw-r--r-- 1 50762 9010 865796096 Jun 24 13:15 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/08/np04hd_raw_run027408_0026_dataflow1_datawriter_0_20240624T155016.hdf5
-rw-r--r-- 1 50762 9010 761462784 Jun 23 20:16 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/73/94/np04hd_raw_run027394_0761_dataflow1_datawriter_0_20240623T225242.hdf5
-rw-r--r-- 1 50762 9010 110231552 Jun 24 09:24 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/04/np04hd_raw_run027404_0213_dataflow3_datawriter_0_20240624T120313.hdf5
-rw-r--r-- 1 50762 9010 204668928 Jun 24 10:18 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/04/np04hd_raw_run027404_0324_dataflow3_datawriter_0_20240624T125811.hdf5
-rw-r--r-- 1 50762 9010 1239416832 Jun 24 13:55 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/cosmics/None/00/02/74/06/np04hd_raw_run027406_0071_dataflow3_datawriter_0_20240624T140341.hdf5_1719247206
-rw-r--r-- 1 50762 9010 195100672 Jun 24 10:32 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/04/np04hd_raw_run027404_0355_dataflow0_datawriter_0_20240624T131320.hdf5
-rw-r--r-- 1 50762 9010 263258112 Jun 24 11:15 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/cosmics/None/00/02/74/06/np04hd_raw_run027406_0016_dataflow1_datawriter_0_20240624T134724.hdf5
-rw-r--r-- 1 50762 9010 870842368 Jun 23 19:44 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/73/94/np04hd_raw_run027394_0700_dataflow1_datawriter_0_20240623T222353.hdf5
-rw-r--r-- 1 50762 9010 331087872 Jun 24 11:25 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/cosmics/None/00/02/74/06/np04hd_raw_run027406_0071_dataflow3_datawriter_0_20240624T140341.hdf5
-rw-r--r-- 1 50762 9010 865796096 Jun 24 13:15 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/08/np04hd_raw_run027408_0026_dataflow1_datawriter_0_20240624T155016.hdf5
-rw-r--r-- 1 50762 9010 761462784 Jun 23 20:16 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/73/94/np04hd_raw_run027394_0761_dataflow1_datawriter_0_20240623T225242.hdf5
-rw-r--r-- 1 50762 9010 110231552 Jun 24 09:24 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/04/np04hd_raw_run027404_0213_dataflow3_datawriter_0_20240624T120313.hdf5
-rw-r--r-- 1 50762 9010 204668928 Jun 24 10:18 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/04/np04hd_raw_run027404_0324_dataflow3_datawriter_0_20240624T125811.hdf5
-rw-r--r-- 1 50762 9010 1239416832 Jun 24 13:55 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/cosmics/None/00/02/74/06/np04hd_raw_run027406_0071_dataflow3_datawriter_0_20240624T140341.hdf5_1719247206
-rw-r--r-- 1 50762 9010 195100672 Jun 24 10:32 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/04/np04hd_raw_run027404_0355_dataflow0_datawriter_0_20240624T131320.hdf5
-rw-r--r-- 1 50762 9010 263258112 Jun 24 11:15 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/cosmics/None/00/02/74/06/np04hd_raw_run027406_0016_dataflow1_datawriter_0_20240624T134724.hdf5
-rw-r--r-- 1 50762 9010 870842368 Jun 23 19:44 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/73/94/np04hd_raw_run027394_0700_dataflow1_datawriter_0_20240623T222353.hdf5
-rw-r--r-- 1 50762 9010 331087872 Jun 24 11:25 /pnfs/fs/usr/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/cosmics/None/00/02/74/06/np04hd_raw_run027406_0071_dataflow3_datawriter_0_20240624T140341.hdf5

were not uploaded properly to dCache, with error messages like this:

2001:1458:301:c7:0:0:100:13 | door:WebDAV-fndca4b-1@webdavDomain:AAYbpnVNduA:1719251606856000 | t | Https-1.1 | 865796096 | 865796096 | dune.protodune-hd-2024-rawdata-physics@enstore | 1327390 | transfer | rw-protodune-stkendca2001-4@rw-protodune-stkendca2001-4Domain | 2024-06-24 13:15:34.373-05 | 10004 | Connection lost before end of file. | 0000FEF34B0B73464CAD8FC66AF8D153517E | pool:rw-protodune-stkendca2001-4@rw-protodune-stkendca2001-4Domain:1719252934373-1417 | f | | 50762 | 9010 | dunepro
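For triage, a record like the one quoted above can be split on its `|` separators. The sketch below is an illustration only: the field positions are read off this single example rather than from any dCache schema, the line-wrap breaks in the quoted record are joined, and the names `RECORD` and `summarize_record` are hypothetical.

```python
# Sketch only: field positions are inferred from the one record quoted in this
# issue, not from dCache documentation. Names here are hypothetical.
RECORD = (
    "2001:1458:301:c7:0:0:100:13 | "
    "door:WebDAV-fndca4b-1@webdavDomain:AAYbpnVNduA:1719251606856000 | t | "
    "Https-1.1 | 865796096 | 865796096 | "
    "dune.protodune-hd-2024-rawdata-physics@enstore | 1327390 | transfer | "
    "rw-protodune-stkendca2001-4@rw-protodune-stkendca2001-4Domain | "
    "2024-06-24 13:15:34.373-05 | 10004 | "
    "Connection lost before end of file. | "
    "0000FEF34B0B73464CAD8FC66AF8D153517E | "
    "pool:rw-protodune-stkendca2001-4@rw-protodune-stkendca2001-4Domain:1719252934373-1417 | "
    "f | | 50762 | 9010 | dunepro"
)


def summarize_record(line: str) -> dict:
    """Pull out the fields that matter for a failed-transfer report."""
    fields = [field.strip() for field in line.split("|")]
    return {
        "when": fields[10],              # timestamp as printed in the record
        "error_code": int(fields[11]),   # 10004 in the quoted example
        "error_text": fields[12],
        "owner": fields[-1],             # last field, "dunepro" here
    }


if __name__ == "__main__":
    print(summarize_record(RECORD))
```

Indexing by position is brittle; if the record layout differs between dCache versions or record types, the indices would need to be adjusted.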

You may want to delete the copies in dCache and retransfer the files if you have copies somewhere else.

Thanks in advance for your help on this.

Regards, Yujun

StevenCTimm commented 3 months ago

So it appears that Rucio had already detected those failures automatically and copied the files to tape again, with a timestamp appended to the file name, as is its custom.

ls -lrt /pnfs/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/08/np04hd_raw_run027408_0026_dataflow1_datawriter_0_20240624T155016.hdf5*
-rw-r--r-- 1 dunepro dune 865796096 Jun 24 13:15 /pnfs/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/08/np04hd_raw_run027408_0026_dataflow1_datawriter_0_20240624T155016.hdf5
-rw-r--r-- 1 dunepro dune 4243602368 Jun 24 13:30 /pnfs/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/physics/None/00/02/74/08/np04hd_raw_run027408_0026_dataflow1_datawriter_0_20240624T155016.hdf5_1719253795
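The numeric suffix on the retried copy looks like a Unix epoch in seconds. A minimal sketch for splitting it off and decoding it follows, assuming the suffix is always ten digits after `.hdf5`; that assumption and the helper name `split_retry_suffix` are mine and are not confirmed anywhere in this thread.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def split_retry_suffix(name: str) -> Tuple[str, Optional[datetime]]:
    """Return (base_name, retry_time_utc); retry_time_utc is None if no suffix.

    Assumption: a retried copy is named <file>.hdf5_<10-digit epoch seconds>.
    """
    base, sep, tail = name.rpartition("_")
    if sep and base.endswith(".hdf5") and tail.isdigit() and len(tail) == 10:
        return base, datetime.fromtimestamp(int(tail), tz=timezone.utc)
    return name, None


if __name__ == "__main__":
    retried = (
        "np04hd_raw_run027408_0026_dataflow1_datawriter_0_"
        "20240624T155016.hdf5_1719253795"
    )
    base, when = split_retry_suffix(retried)
    print(base)
    print(when.isoformat())
```

For the suffix `1719253795` this gives 2024-06-24 18:29:55 UTC, which is 13:29:55 at the UTC-5 offset shown in the dCache record, in line with the 13:30 modification time in the listing.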

We'll go through and remove the smaller aborted files, but we have good copies of all of those files already on tape.
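The "remove the smaller aborted files" step could be scripted roughly as below. This is a sketch under stated assumptions, not the procedure actually used here: it groups paths by everything up to and including `.hdf5` and flags any copy smaller than the largest one in its group. The names `LISTING` and `removal_candidates` are hypothetical, and size is only the heuristic mentioned above; a real cleanup would presumably check each copy against the catalogued size or checksum before deleting anything.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# The two copies shown in the listing above: (path, size in bytes).
_DIR = (
    "/pnfs/dune/tape_backed/dunepro/hd-protodune/raw/2024/detector/"
    "physics/None/00/02/74/08/"
)
_FILE = "np04hd_raw_run027408_0026_dataflow1_datawriter_0_20240624T155016.hdf5"
LISTING = [
    (_DIR + _FILE, 865796096),
    (_DIR + _FILE + "_1719253795", 4243602368),
]


def removal_candidates(listing: Iterable[Tuple[str, int]]) -> List[str]:
    """Return paths that are smaller than the largest copy of the same file."""
    groups = defaultdict(list)
    for path, size in listing:
        # Group on the name up to ".hdf5" so suffixed retries join the original.
        stem, marker, _ = path.partition(".hdf5")
        groups[stem + marker].append((path, size))

    candidates = []
    for copies in groups.values():
        largest = max(size for _, size in copies)
        candidates.extend(path for path, size in copies if size < largest)
    return candidates


if __name__ == "__main__":
    # Only print candidates; deletion is deliberately left to the operator.
    for path in removal_candidates(LISTING):
        print(path)
```

The function only reports candidates and never deletes, so its output can be reviewed before any files are removed.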

yuyiguo commented 2 months ago

The failed files were all removed.

StevenCTimm commented 2 months ago

Thanks Yuyi.