dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

{192:"Failed to select pool: PANIC : File not present in any reasonable pool"} #7137

Open cfgamboa opened 1 year ago

cfgamboa commented 1 year ago

This transfer is reported as failed with {192:"Failed to select pool: PANIC : File not present in any reasonable pool"}; however, the files are actually stored.

FTS transfer log https://fts306.usatlas.bnl.gov:8449/var/log/fts3/transfers/2023-05-01/dcsrm.usatlas.bnl.gov__dcgftp.usatlas.bnl.gov/2023-05-01-0741__dcsrm.usatlas.bnl.gov__dcgftp.usatlas.bnl.gov__1708477775__e749228e-e7d3-11ed-8c96-00163e1051a0

Source

[root@dcmaint02 05]# grep 0000DDC38CAA6D7C4E869C6EC3D4FBA8A0EF billing-2023.05.01 
04.30 23:55:20 [pool:dc223_38@dc223thirtyeightDomain:restore] [0000DDC38CAA6D7C4E869C6EC3D4FBA8A0EF,8785125592] [Unknown] MCTAPE:MC@osm 13173810 0 {0:""}
05.01 03:41:31 [door:RemoteTransferManager@srm-dcsrm03Domain:request] ["usatlas1":6435:31152:unknown] [0000DDC38CAA6D7C4E869C6EC3D4FBA8A0EF,0] [/pnfs/usatlas.bnl.gov/MCTAPE/mc20_13TeV/AOD/e8382_e7400_s3681_r13145_r13146/mc20_13TeV.509756.MGPy8EG_FxFx_Wtaunu_H_3jets_HT2bias_CVetoBVeto.merge.AOD.e8382_e7400_s3681_r13145_r13146_tid28864194_00/AOD.28864194._000061.pool.root.1] MCTAPE:MC@osm 12 12 {192:"Failed to select pool: PANIC : File not present in any reasonable pool"}
05.01 05:20:08 [pool:dc223_38:transfer] [0000DDC38CAA6D7C4E869C6EC3D4FBA8A0EF,8785125592] [/pnfs/usatlas.bnl.gov/MCTAPE/mc20_13TeV/AOD/e8382_e7400_s3681_r13145_r13146/mc20_13TeV.509756.MGPy8EG_FxFx_Wtaunu_H_3jets_HT2bias_CVetoBVeto.merge.AOD.e8382_e7400_s3681_r13145_r13146_tid28864194_00/AOD.28864194._000061.pool.root.1] MCTAPE:MC@osm 8785125592 143656 false {Http-1.1:10.42.38.57:0:WebDAV2-dcdoor20-internalipv6:webdav2-dcdoor20_httpsDomain:/pnfs/usatlas.bnl.gov/MCTAPE/mc20_13TeV/AOD/e8382_e7400_s3681_r13145_r13146/mc20_13TeV.509756.MGPy8EG_FxFx_Wtaunu_H_3jets_HT2bias_CVetoBVeto.merge.AOD.e8382_e7400_s3681_r13145_r13146_tid28864194_00/AOD.28864194._000061.pool.root.1} [door:WebDAV2-dcdoor20-internalipv6@webdav2-dcdoor20_httpsDomain:AAX6nkuts8g:1682932665012000] {0:""}
05.01 05:20:08 [door:WebDAV2-dcdoor20-internalipv6@webdav2-dcdoor20_httpsDomain:request] ["usatlas1":6435:31152:2620:0:210:8803:0:0:0:60] [0000DDC38CAA6D7C4E869C6EC3D4FBA8A0EF,8785125592] [/pnfs/usatlas.bnl.gov/MCTAPE/mc20_13TeV/AOD/e8382_e7400_s3681_r13145_r13146/mc20_13TeV.509756.MGPy8EG_FxFx_Wtaunu_H_3jets_HT2bias_CVetoBVeto.merge.AOD.e8382_e7400_s3681_r13145_r13146_tid28864194_00/AOD.28864194._000061.pool.root.1] MCTAPE:MC@osm 143684 0 {0:""}

Destination

 grep 00001959F92CD71F44ED84C01F6849E8533E billing-2023.05.01 
05.01 05:20:10 [pool:dcdoor05_1:transfer] [00001959F92CD71F44ED84C01F6849E8533E,8785125592] [/pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc20_13TeV/3a/96/AOD.28864194._000061.pool.root.1] bnlt0d1:BNLT0D1@osm 8785125592 143734 true {RemoteHttpsDataTransfer-1.1:https://dcdoor20.usatlas.bnl.gov:443/pnfs/usatlas.bnl.gov/MCTAPE/mc20_13TeV/AOD/e8382_e7400_s3681_r13145_r13146/mc20_13TeV.509756.MGPy8EG_FxFx_Wtaunu_H_3jets_HT2bias_CVetoBVeto.merge.AOD.e8382_e7400_s3681_r13145_r13146_tid28864194_00/AOD.28864194._000061.pool.root.1} [door:RemoteTransferManager@srm-dcsrm03Domain:1682932664965-9232667] {0:""}
05.01 05:20:45 [pool:dcdoor05_1:transfer] [00001959F92CD71F44ED84C01F6849E8533E,8785125592] [Unknown] bnlt0d1:BNLT0D1@osm 8785125592 35128 false {Http-1.1:2620:0:210:8803:0:0:0:97:0:dc214_1:dc214oneDomain:/00001959F92CD71F44ED84C01F6849E8533E} [pool:dc214_1@dc214oneDomain] {0:""}
05.01 05:20:54 [pool:dcdoor05_1@dcdoor05oneDomain:remove] [00001959F92CD71F44ED84C01F6849E8533E,8785125592] [Unknown] bnlt0d1:BNLT0D1@osm {0:"migration job deleting source"}
05.01 07:10:00 [pool:dc214_1:transfer] [00001959F92CD71F44ED84C01F6849E8533E,8785125592] [/pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc20_13TeV/3a/96/AOD.28864194._000061.pool.root.1] bnlt0d1:BNLT0D1@osm 8785125592 200294 false {Http-1.1:10.42.38.61:0:WebDAV2-dcdoor06-external:webdav2-dcdoor06_httpsDomain:/pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc20_13TeV/3a/96/AOD.28864194._000061.pool.root.1} [door:WebDAV2-dcdoor06-external@webdav2-dcdoor06_httpsDomain:AAX6n9E1F9g:1682939200224000] {0:""}
05.01 07:10:00 [door:WebDAV2-dcdoor06-external@webdav2-dcdoor06_httpsDomain:request] ["usatlas1":6435:31152:81.180.86.48] [00001959F92CD71F44ED84C01F6849E8533E,8785125592] [/pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc20_13TeV/3a/96/AOD.28864194._000061.pool.root.1] bnlt0d1:BNLT0D1@osm 200410 0 {0:""}
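
For anyone repeating this check, the grep output above can be narrowed to the entries that carry a non-zero result code. The following is only a minimal sketch: the line layout (timestamp, bracketed cell name, trailing {code:"message"}) is an assumption inferred from the entries quoted in this issue, not from a documented billing format, and the names ENTRY, SAMPLE and failed_entries are hypothetical.

```python
import re

# Assumed layout, inferred only from the billing entries quoted in this issue:
#   MM.DD HH:MM:SS [cell:type] ... {code:"message"}
ENTRY = re.compile(
    r'^(\d{2}\.\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]+)\].*\{(\d+):"([^"]*)"\}\s*$'
)

# Two source-side entries copied from the grep output above.
SAMPLE = [
    '04.30 23:55:20 [pool:dc223_38@dc223thirtyeightDomain:restore] '
    '[0000DDC38CAA6D7C4E869C6EC3D4FBA8A0EF,8785125592] [Unknown] '
    'MCTAPE:MC@osm 13173810 0 {0:""}',
    '05.01 03:41:31 [door:RemoteTransferManager@srm-dcsrm03Domain:request] '
    '["usatlas1":6435:31152:unknown] [0000DDC38CAA6D7C4E869C6EC3D4FBA8A0EF,0] '
    '[/pnfs/usatlas.bnl.gov/MCTAPE/mc20_13TeV/AOD/e8382_e7400_s3681_r13145_r13146/'
    'mc20_13TeV.509756.MGPy8EG_FxFx_Wtaunu_H_3jets_HT2bias_CVetoBVeto.merge.AOD.'
    'e8382_e7400_s3681_r13145_r13146_tid28864194_00/AOD.28864194._000061.pool.root.1] '
    'MCTAPE:MC@osm 12 12 '
    '{192:"Failed to select pool: PANIC : File not present in any reasonable pool"}',
]


def failed_entries(lines, pnfsid):
    """Return (timestamp, cell, code, message) for entries of pnfsid with a non-zero code."""
    failures = []
    for line in lines:
        # Same filter as the grep above: keep only lines mentioning this PNFS ID.
        if pnfsid not in line:
            continue
        match = ENTRY.match(line)
        if match is None:
            # Line does not follow the assumed layout; skip it rather than guess.
            continue
        timestamp, cell, code, message = match.groups()
        if int(code) != 0:
            failures.append((timestamp, cell, int(code), message))
    return failures


if __name__ == "__main__":
    for entry in failed_entries(SAMPLE, "0000DDC38CAA6D7C4E869C6EC3D4FBA8A0EF"):
        print(entry)
```

Run against the two sample entries, this reports only the 03:41:31 RemoteTransferManager request with code 192; the restore entry has code 0 and is skipped. To scan a whole billing file, pass the open file object as lines.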

cfgamboa commented 1 year ago

The destination file was copied successfully. So why is the error reported in this case?

paulmillar commented 1 year ago

At the very least, this indicates that the error message is insufficient to understand what went wrong.

There may be some underlying issue triggering this problem, but without being able to understand why pool-manager failed to identify a suitable pool it is impossible to understand why the transfer failed.

cfgamboa commented 1 year ago

> At the very least, this indicates that the error message is insufficient to understand what went wrong.
>
> There may be some underlying issue triggering this problem, but without being able to understand why pool-manager failed to identify a suitable pool it is impossible to understand why the transfer failed.

It was observed that, at the time of the error, the DMZ space used for the TPC transfer was under constraint.