dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

Pool replicas with Storage Class si={<Unknown>:<Unknown>} #6796

Open cfgamboa opened 1 year ago

cfgamboa commented 1 year ago

Dear all,

We have observed that some files on DISK are shown with storage class si={<Unknown>:<Unknown>}.

For example

dc211_2:
    0000EF4CF609D39F4066958D3FCD6ADC4507 <C-------X--L(0)[0]> 27895219 si={<Unknown>:<Unknown>}
dc258_9:
    0000EF4CF609D39F4066958D3FCD6ADC4507 <C-------X--L(0)[0]> 27895219 si={bnlt0d1:BNLT0D1} 

Has anybody reported a similar issue? It does not appear to be isolated to one pool but rather scattered across pools.

We are using dCache 7.2
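
To see how scattered the affected replicas are, the `rep ls` output above can be scanned automatically. The following is a minimal sketch, assuming the output has exactly the layout shown in this comment (a `pool:` header line followed by indented replica lines); the names `Replica`, `parse_rep_ls` and `unknown_replicas` are hypothetical helpers, not part of dCache.

```python
import re
from typing import List, NamedTuple, Optional

UNKNOWN_SI = "<Unknown>:<Unknown>"


class Replica(NamedTuple):
    pool: Optional[str]   # pool header the line appeared under, if any
    pnfsid: str
    state: str            # flags between the angle brackets, e.g. C-------X--L(0)[0]
    size: int
    storage_class: str    # content of si={...}


# Replica line: <pnfsid> <state flags> <size> si={<storage class>}
REPLICA_RE = re.compile(
    r"^\s*(?P<pnfsid>[0-9A-Fa-f]+)\s+"
    r"<(?P<state>[^>]*)>\s+"
    r"(?P<size>\d+)\s+"
    r"si=\{(?P<si>.*)\}\s*$"
)
# Pool header line such as "dc211_2:"
POOL_RE = re.compile(r"^(?P<pool>\S+):\s*$")


def parse_rep_ls(text: str) -> List[Replica]:
    """Parse 'rep ls' output, remembering the most recent pool header."""
    replicas = []
    pool = None
    for line in text.splitlines():
        m = REPLICA_RE.match(line)
        if m:
            replicas.append(
                Replica(pool, m["pnfsid"], m["state"], int(m["size"]), m["si"])
            )
            continue
        m = POOL_RE.match(line)
        if m:
            pool = m["pool"]
    return replicas


def unknown_replicas(replicas: List[Replica]) -> List[Replica]:
    """Return the replicas whose storage class is <Unknown>:<Unknown>."""
    return [r for r in replicas if r.storage_class == UNKNOWN_SI]


# Sample copied from the output quoted in this comment.
SAMPLE = """\
dc211_2:
    0000EF4CF609D39F4066958D3FCD6ADC4507 <C-------X--L(0)[0]> 27895219 si={<Unknown>:<Unknown>}
dc258_9:
    0000EF4CF609D39F4066958D3FCD6ADC4507 <C-------X--L(0)[0]> 27895219 si={bnlt0d1:BNLT0D1}
"""

if __name__ == "__main__":
    for replica in unknown_replicas(parse_rep_ls(SAMPLE)):
        print(replica.pool, replica.pnfsid)
```

Saving the admin-shell output of several pools to a file and passing it to `parse_rep_ls` would give a per-pool list of affected replicas.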

kofemann commented 1 year ago

Hi Carlos,

This is the default value if no storage class is configured in the target directory (no OSMTemplate and no sGroup tags).
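
The rule described here can be sketched as a small function. This is only an illustration, assuming that the OSMTemplate tag body contains a `StoreName <name>` line and that the sGroup tag body is the group name; the function names and the per-field fallback to `<Unknown>` are assumptions for the example, not dCache's actual code.

```python
from typing import Mapping, Optional

UNKNOWN = "<Unknown>"


def store_name_from_template(osm_template: str) -> Optional[str]:
    """Return the value following 'StoreName' in an OSMTemplate tag body.

    Assumption: the tag body holds a line of the form 'StoreName <name>'.
    """
    for line in osm_template.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "StoreName":
            return parts[1]
    return None


def storage_class_from_tags(tags: Mapping[str, str]) -> str:
    """Build '<store>:<group>' from directory tags, or fall back to <Unknown>.

    `tags` maps tag names to tag bodies. A missing or empty tag yields
    the <Unknown> placeholder for that half (assumed behaviour).
    """
    store = store_name_from_template(tags.get("OSMTemplate", "")) or UNKNOWN
    group = tags.get("sGroup", "").strip() or UNKNOWN
    return f"{store}:{group}"


if __name__ == "__main__":
    tagged = {"OSMTemplate": "StoreName bnlt0d1\n", "sGroup": "BNLT0D1\n"}
    print(storage_class_from_tags(tagged))
    print(storage_class_from_tags({}))
```

With both tags present, the result has the `bnlt0d1:BNLT0D1` form seen on the healthy replica; with no tags it is the `<Unknown>:<Unknown>` value reported in this issue.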

cfgamboa commented 1 year ago

For some reason the repository on the pool does not get the information written for the target directory. The replica does have the right tags.

kofemann commented 1 year ago

Hi Carlos, can you check the billing logs to find matching transfers to those files?

cfgamboa commented 1 year ago

Hello @kofemann

There are two examples here.

1. File 0000EF4CF609D39F4066958D3FCD6ADC4507

The billing records do not show the change of storage class on the pool.

File stored 
02.01 09:43:30 [door:GFTP-dcdoor18-186subnet-AAW6R2CULCg@gridftp-dcdoor18Domain:request] ["/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1":6435:31152:130.199.159.106] [0000EF4CF609D39F4066958D3FCD6ADC4507,27895219] [/pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc16_13TeV/48/fc/log.23744689._008438.job.log.tgz.1.rucio.upload] bnlt0d1:BNLT0D1@osm 743 0 {0:""}

02.01 09:43:30 [pool:dc051_10:transfer] [0000EF4CF609D39F4066958D3FCD6ADC4507,27895219] [/pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc16_13TeV/48/fc/log.23744689._008438.job.log.tgz.1.rucio.upload] bnlt0d1:BNLT0D1@osm 27895219 246 true {GFtp-2.0 10.42.38.51 38085} [door:GFTP-dcdoor18-186subnet-AAW6R2CULCg@gridftp-dcdoor18Domain:1612190609419000] {0:""}

**File replicated to other pool**

02.01 09:43:30 [pool:dc051_10:transfer] [0000EF4CF609D39F4066958D3FCD6ADC4507,27895219] [Unknown] bnlt0d1:BNLT0D1@osm 27895219 212 false {Http-1.1:10.42.38.94:0:dc211_2:dc211twoDomain:/0000EF4CF609D39F4066958D3FCD6ADC4507} [pool:dc211_2@dc211twoDomain] {0:""}

At the pool

[dcadmin02] (dc211_2@dc211twoDomain) admin > rep ls -l 0000EF4CF609D39F4066958D3FCD6ADC4507
0000EF4CF609D39F4066958D3FCD6ADC4507 <C-------X--L(0)[0]> 27895219 si={<Unknown>:<Unknown>}

[root@dc211 data]# stat 0000EF4CF609D39F4066958D3FCD6ADC4507
  File: ‘0000EF4CF609D39F4066958D3FCD6ADC4507’
  Size: 27895219    Blocks: 54488      IO Block: 4096   regular file
Device: 901h/2305d  Inode: 17661544453  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-02-01 09:43:30.497048240 -0500
Modify: 2021-02-01 09:43:30.402044789 -0500
Change: 2021-02-01 09:43:30.402044789 -0500

2. This file 0000FF4DE24A73B14901A1F230DAF437CD1E had only one replica

File transfer

02.24 20:27:53 [door:GFTP-dcdoor11-15ipv6subnet-AAW8Hw1A2Pg@gridftp-dcdoor11Domain:request] ["/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data Management":6435:31152:2620:0:210:1:0:0:0:1b] [0000FF4DE24A73B14901A1F230DAF437CD1E,1000233694] [/pnfs/usatlas.bnl.gov/LOCALGROUPDISK/rucio/user/jhaley/dd/74/user.jhaley.24112049._000021.output_Loose_v1_MC16e_PFlow.root] GROUPDISK:LOCAL@osm 37790 0 {0:""}
02.24 20:27:53 [pool:dc046_3:transfer] [0000FF4DE24A73B14901A1F230DAF437CD1E,1000233694] [/pnfs/usatlas.bnl.gov/LOCALGROUPDISK/rucio/user/jhaley/dd/74/user.jhaley.24112049._000021.output_Loose_v1_MC16e_PFlow.root] GROUPDISK:LOCAL@osm 1000233694 35419 true {GFtp-2.0 10.42.38.64 34280} [door:GFTP-dcdoor11-15ipv6subnet-AAW8Hw1A2Pg@gridftp-dcdoor11Domain:1614216435981000] {0:""}

File transfer to other pool

02.24 20:27:57 [pool:dc046_3:transfer] [0000FF4DE24A73B14901A1F230DAF437CD1E,1000233694] [Unknown] GROUPDISK:LOCAL@osm 1000233694 3457 false {Http-1.1:10.42.38.111:0:dc225_3:dc225threeDomain:/0000FF4DE24A73B14901A1F230DAF437CD1E} [pool:dc225_3@dc225threeDomain] {0:""}

First record of the file with the <Unknown>:<Unknown>@osm class

03.28 00:29:19 [pool:dc225_3:transfer] [0000FF4DE24A73B14901A1F230DAF437CD1E,1000233694] [/pnfs/usatlas.bnl.gov/LOCALGROUPDISK/rucio/user/jhaley/dd/74/user.jhaley.24112049._000021.output_Loose_v1_MC16e_PFlow.root] <Unknown>:<Unknown>@osm 52562 725 false {Xrootd-5.0:10.42.38.67:58280} [door:Xrootd2-dcdoor18@xrootd2-dcdoor18Domain:AAW+kTRH9mA:1616905758376000] {0:""}

To fix the storage class, the file was manually replicated to another pool; this seems to set the right storage class.

09.27 10:09:03 [pool:dc225_3:transfer] [0000FF4DE24A73B14901A1F230DAF437CD1E,1000233694] [Unknown] <Unknown>:<Unknown>@osm 1000233694 5664 false {Http-1.1:10.42.64.32:0:dc255_1:dc255oneDomain:/0000FF4DE24A73B14901A1F230DAF437CD1E} [pool:dc255_1@dc255oneDomain] {0:""}

The replica with the wrong class was later removed

09.27 10:51:32 [pool:dc225_3@dc225threeDomain:remove] [0000FF4DE24A73B14901A1F230DAF437CD1E,1000233694] [Unknown] <Unknown>:<Unknown>@osm {0:"'rep rm' command"}

Carlos
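
The manual search through billing for the first record with the wrong class can be automated. Below is a minimal sketch, assuming billing lines have the layout of the records quoted in this comment; the regular expression is derived only from those lines, not from the site's billing format configuration, and `BillingRecord`, `parse_billing_line` and `first_unknown_record` are hypothetical names.

```python
import re
from typing import Iterable, NamedTuple, Optional

UNKNOWN_CLASS = "<Unknown>:<Unknown>"


class BillingRecord(NamedTuple):
    timestamp: str       # "MM.dd HH:mm:ss" as logged; the line carries no year
    cell: str            # e.g. "pool:dc225_3:transfer"
    pnfsid: str
    size: int
    path: str            # "Unknown" for pool-to-pool records
    storage_class: str   # part before "@", e.g. "GROUPDISK:LOCAL"
    hsm: str             # part after "@", e.g. "osm"


BILLING_RE = re.compile(
    r"^(?P<ts>\d\d\.\d\d \d\d:\d\d:\d\d) "
    r"\[(?P<cell>[^\]]+)\] "
    r".*?"                                   # door records carry an extra [subject] field
    r"\[(?P<pnfsid>[0-9A-F]+),(?P<size>\d+)\] "
    r"\[(?P<path>[^\]]*)\] "
    r"(?P<sclass>\S+)@(?P<hsm>\w+)"
)


def parse_billing_line(line: str) -> Optional[BillingRecord]:
    """Parse one billing line; return None if it does not match the layout."""
    m = BILLING_RE.match(line.strip())
    if not m:
        return None
    return BillingRecord(
        m["ts"], m["cell"], m["pnfsid"], int(m["size"]),
        m["path"], m["sclass"], m["hsm"],
    )


def first_unknown_record(
    lines: Iterable[str], pnfsid: str
) -> Optional[BillingRecord]:
    """Return the first record for `pnfsid` whose class is <Unknown>:<Unknown>.

    Lines must be supplied in chronological (file) order, because the
    timestamps alone cannot be ordered across years.
    """
    for line in lines:
        record = parse_billing_line(line)
        if (
            record is not None
            and record.pnfsid == pnfsid
            and record.storage_class == UNKNOWN_CLASS
        ):
            return record
    return None
```

Given the records for 0000FF4DE24A73B14901A1F230DAF437CD1E quoted above in file order, `first_unknown_record` would return the 03.28 00:29:19 transfer record from pool dc225_3, which is the first line carrying the <Unknown>:<Unknown>@osm class.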

cfgamboa commented 1 year ago

Is there any tool or command that allows setting the corresponding storage class?

cfgamboa commented 1 year ago

For example this file:

[dcadmin02] (dc255_1@dc255oneDomain) admin > \sl 000040178281650A426FA54E545CC05F4E96 rep ls 000040178281650A426FA54E545CC05F4E96
dc221_16:
    000040178281650A426FA54E545CC05F4E96 <C-------X--L(0)[0]> 20278934 si={bnlt0d1:BNLT0D1}
dc252_4:
    000040178281650A426FA54E545CC05F4E96 <C----------L(0)[0]> 20278934 si={<Unknown>:<Unknown>}

Replicating the file with si={<Unknown>:<Unknown>} via pp get file

[dcadmin02] (dc255_1@dc255oneDomain) admin > \c dc252_1
[dcadmin02] (dc252_1@dc252oneDomain) admin > pp get file 000040178281650A426FA54E545CC05F4E96 dc252_4
Transfer Initiated

The file's SI now displays si={bnlt0d1:BNLT0D1}

[dcadmin02] (dc252_1@dc252oneDomain) admin > rep ls -l 000040178281650A426FA54E545CC05F4E96
000040178281650A426FA54E545CC05F4E96 <C-------X--L(0)[0]> 20278934 si={bnlt0d1:BNLT0D1}

[dcadmin02] (dc252_1@dc252oneDomain) admin > \sn cacheinfoof 000040178281650A426FA54E545CC05F4E96
 dc221_16 dc252_1 dc252_4

Migration move does not appear to fix the issue

[dcadmin02] (dc252_4@dc252fourDomain) admin > migration move -pnfsid=000040178281650A426FA54E545CC05F4E96 -target=pool dc251_1 -verify
[1] INITIALIZING migration move -pnfsid=000040178281650A426FA54E545CC05F4E96 -target=pool -verify -- dc251_1

The file gets copied, but the SI is still si={<Unknown>:<Unknown>}

[dcadmin02] (dc252_4@dc252fourDomain) admin > \s dc251_1 rep ls -l 000040178281650A426FA54E545CC05F4E96
000040178281650A426FA54E545CC05F4E96 <C-------X--L(0)[0]> 20278934 si={<Unknown>:<Unknown>}

Carlos

kofemann commented 1 year ago

Hi @cfgamboa ,

This means that you have a directory without storage tags. The easiest way to find out is to check the billing info for the given pnfsid.

cfgamboa commented 1 year ago

Hello,

Thank you, the directory has a tag. The source file had the right one; it seems that the replica did not get that attribute from the original one.

All the best, Carlos


kofemann commented 1 year ago

Do you have corresponding billing records?

cfgamboa commented 1 year ago

Yes, for the first example I have added the billing records. I could try to find more if you need that.

All the best, Carlos
