Open haozturk opened 3 months ago
Hi @haozturk. We don't know the size of a dataset a priori. We can estimate the ratio between different datasets from previous runs, but the total size of a given dataset will be affected by many variables, from detector performance to the timing of the next era change.
We distribute our output among the T1s so that each receives data in roughly the same ratio as its share of free tape space. However, we can only do this at era changes, and we typically do it only once a year (as the dataset size ratios remain mostly stable within the same year).
Enhancement Description
Currently, there are two factors that affect the calculation of the dm_weight attribute which is used by MSOutput while choosing tape destinations: pledge of the RSE w.r.t. other tapes and relative free space of the RSE [1]. We need to start taking into account WAITING_APPROVAL rules as well.
[1] https://github.com/dmwm/CMSRucio/blob/d38e1671bc329903447dee68930e964a48fc4f17/docker/rucio_client/scripts/updateDDMQuota#L82
Use Case
It might take weeks if not months for some tapes to consume WAITING_APPROVAL rules. If we don't take them into account while calculating dm_weight, we'll end up sending more data to those sites than necessary, which might lead to uneven distribution of tape data.
Possible Solution
Calculate the total volume of WAITING_APPROVAL rules per tape RSE and add it to the occupancy of the RSE while calculating the relative free space of the RSE, which is an input to the dm_weight calculation. T0 datasets might be growing and their size might change. @germanfgv what's the best way to get the total size of a dataset for a given T0 tape rule?
Related Issues
No response
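For reference, the idea under Possible Solution could be sketched roughly as below. This is only an illustration, not the actual updateDDMQuota logic: the TapeRSE class, its field names, the RSE names, and the normalization step are all hypothetical, and how the pending volume would really be obtained (and how relative free space is combined with the pledge into dm_weight) is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TapeRSE:
    # All fields are hypothetical placeholders for values that would
    # come from Rucio in a real script.
    name: str
    limit_bytes: int                  # usable tape quota of the RSE
    used_bytes: int                   # current occupancy
    waiting_approval_bytes: int = 0   # total size of WAITING_APPROVAL rules


def relative_free_space(rse: TapeRSE) -> float:
    """Fraction of the quota still free, counting pending rules as used."""
    if rse.limit_bytes <= 0:
        return 0.0
    effective_used = rse.used_bytes + rse.waiting_approval_bytes
    # Clamp at zero so an over-committed RSE does not get a negative value.
    free = max(rse.limit_bytes - effective_used, 0)
    return free / rse.limit_bytes


def normalized_shares(rses: List[TapeRSE]) -> Dict[str, float]:
    """Normalize relative free space across RSEs so the values sum to 1."""
    free = {r.name: relative_free_space(r) for r in rses}
    total = sum(free.values())
    if total == 0:
        return {name: 0.0 for name in free}
    return {name: value / total for name, value in free.items()}


if __name__ == "__main__":
    # Made-up RSE names and numbers, for illustration only.
    rses = [
        TapeRSE("T1_A_Tape", limit_bytes=100, used_bytes=50),
        TapeRSE("T1_B_Tape", limit_bytes=100, used_bytes=50,
                waiting_approval_bytes=25),
    ]
    for name, share in normalized_shares(rses).items():
        print(name, round(share, 3))
```

In this toy example both RSEs have the same occupancy, but the one with pending WAITING_APPROVAL volume ends up with a smaller share, which is the effect the proposal is after.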