Open XMol opened 2 years ago
Hi Xavier,
your observations are correct. So, here is the flow:
For stored files, the `t_storageinfo` table is used. This makes it possible to restore files from tape even if the storage info in the directory changes, for example because you want to use a different tape library, media, and so on.
For files on disk, directory tags are used. Any change of tags affects only newly written files. Existing files stay as-is, even if they are disk-only files.
dCache pools are not aware of any namespace changes. They only handle existing files. Thus, changes to directory tags don't create any flush queues, trigger data movement, and so on.
This concept allows users to create weird situations in which files are allowed to be read from or written to a single pool but, due to a directory tag change, are no longer accessible. However, we take the pragmatic view that such configurations are artificial and not used in real production environments.
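The namespace-side lookup described in this thread can be sketched roughly as follows. This is only an illustrative model, not dCache code: `StorageInfo`, `resolve_storage_info` and the two dictionaries standing in for the `t_storageinfo` table and the directory tags are hypothetical names, and the tag values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StorageInfo:
    hsm: str            # hypothetical: name of the HSM instance
    storage_class: str  # hypothetical: "store:group" style label


def resolve_storage_info(
    file_id: str,
    parent_dir: str,
    t_storageinfo: Dict[str, StorageInfo],
    directory_tags: Dict[str, StorageInfo],
) -> Optional[StorageInfo]:
    """Model of the lookup order discussed in the thread (an assumption)."""
    # A file that was flushed to tape has a persistent record, which wins.
    stored = t_storageinfo.get(file_id)
    if stored is not None:
        return stored
    # Otherwise the current tags of the parent directory apply.
    return directory_tags.get(parent_dir)


# One file already on tape, recorded with the old tag.
t_storageinfo = {"file-on-tape": StorageInfo("osm", "old:tag")}
directory_tags = {"/data": StorageInfo("osm", "old:tag")}

# Retag the directory afterwards.
directory_tags["/data"] = StorageInfo("osm", "new:tag")

on_tape = resolve_storage_info("file-on-tape", "/data", t_storageinfo, directory_tags)
on_disk = resolve_storage_info("file-on-disk", "/data", t_storageinfo, directory_tags)
print(on_tape.storage_class)
print(on_disk.storage_class)
```

In this model the file with a stored record keeps its old storage class, while the disk-only file picks up the new directory tag. The copy of the storage class that a pool keeps for its replica is deliberately left out here.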
Hi Tigran,
yeah, I had not even considered that the pools keep their own metadata on each file, which includes the storage class. That information is not kept up to date with PnfsManager all the time, so it stays unchanged for the lifetime of a replica on a pool.
So if we want to know whether any file still carries the tag we want to discontinue, we have to check in two places:
- the `t_storageinfo` table in the Chimera database and
- the metadata the pools keep for each replica.

Removing the tag from any directory only ensures that no new files will be tagged with it.
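A check across both Chimera and the pools could look roughly like the sketch below. It assumes the relevant records have already been exported as plain `(file id, storage class)` pairs; the function name `files_carrying_tag`, the input shapes and the sample values are all hypothetical, and nothing here talks to a real Chimera database or pool.

```python
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str]  # (file id, storage class), an assumed export format


def files_carrying_tag(
    tag: str,
    chimera_records: Iterable[Record],
    pool_replicas: Dict[str, Iterable[Record]],
) -> Dict[str, Set[str]]:
    """Map each file id carrying `tag` to the places where it was found."""
    found: Dict[str, Set[str]] = {}
    # Place 1: records exported from the t_storageinfo table.
    for file_id, storage_class in chimera_records:
        if storage_class == tag:
            found.setdefault(file_id, set()).add("t_storageinfo")
    # Place 2: per-replica metadata exported from each pool.
    for pool, replicas in pool_replicas.items():
        for file_id, storage_class in replicas:
            if storage_class == tag:
                found.setdefault(file_id, set()).add(f"pool:{pool}")
    return found


chimera_records = [("A", "old:tag"), ("B", "other:tag")]
pool_replicas = {
    "pool1": [("A", "old:tag"), ("C", "old:tag")],
    "pool2": [("B", "other:tag")],
}
hits = files_carrying_tag("old:tag", chimera_records, pool_replicas)
for file_id in sorted(hits):
    print(file_id, sorted(hits[file_id]))
```

File `C` in this made-up data shows why one source is not enough: it has no stored record but still carries the tag on a pool.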
Ciao,
Xavier.
Hi Xavier,
for now, we have no way to propagate new storage info to existing files on pools. Probably an explicit file migration must be performed to create replicas with new storage classes.
In general, `t_storageinfo` is there for legacy reasons. I think that table can be dropped, as the HSM URI should be sufficient to restore existing files from tape. This, however, should be discussed on the user forum.
Hi Lea, dCache.org,
a couple of weeks ago, we were looking for files in our dCache SE that carry a specific tag (because we wanted to get rid of said tag). In the belief that all files in dCache are tagged for life at the time of creation, i.e. once a file is tagged, that will never change, we searched the Chimera database for the tags each file was assigned. We found the `t_storageinfo` table, which sounded quite promising. It became obvious, though, that a substantial number of files are simply not found in that table at all. Where does dCache then get the (current) storageinfo from?

With some experimentation, I learned that, in fact, only those files that have been flushed to tape are tagged persistently in the `t_storageinfo` table. That is, upon successful archival on tape, dCache adds a record of the file to the `t_storageinfo` table, which then fixes the tags of that replica.

My guess is that whenever `storageinfoof` is run for PnfsManager, it first checks whether a matching record is found in the `t_storageinfo` table. If not, the file is not stored on HSM yet and dCache proceeds to look up the current tags of the parent directory.

During my experiments I also noticed that the pools won't start a new queue when the tags of a precious file change (which is understandable, since the pool doesn't want to bother Chimera constantly with inquiries for precious files). I have no clue what happens if a file is flushed to HSM after the directory tags are changed. Will the storageinfo be saved with the current directory tags, or with the tags the pool knew at the time?
Anyway, the reason for this GitHub issue is that Lea asked me to create it, so that the documentation may be improved on this specific topic. 🙂 As of now, the tags are touched upon in the sections The dCache Tertiary Storage System Interface/Storage Information, Chimera/Directory Tags and PoolManager/Reserving Pools for Storage and Cache Classes. However, they don't go into detail on the persistence of the tags. The inheritance of directory tags is explained well enough.
Hope this is helpful to you, Xavier.
Here I show some evidence of my experiments.
Two files are given, one with ONLINE and the other with NEARLINE access latency.
Neither file is found in the `t_storageinfo` table at this point.
Changing the directory tags is reflected in the storageinfo.
The `t_storageinfo` table contains a match, and the tags for that replica cannot be altered with a new directory tag anymore.