Closed dciangot closed 2 years ago
By the way, the following example may be more useful than me guessing at the code workflow.
This shows the T2_BR_UERJ site with some blocks present, which is consistent with calling Rucio with deep=False (the default):
$ rucio list-dataset-replicas cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca
DATASET: cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca
+-----------------+---------+---------+
| RSE | FOUND | TOTAL |
|-----------------+---------+---------|
| T1_UK_RAL_Tape | 64 | 64 |
| T2_BR_UERJ | 0 | 64 |
| T2_UK_London_IC | 59 | 64 |
+-----------------+---------+---------+
But with deep=True we do get the correct locations, and T2_BR_UERJ is not among the sites:
$ rucio list-dataset-replicas cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca --deep
DATASET: cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca
+-----------------+---------+---------+
| RSE | FOUND | TOTAL |
|-----------------+---------+---------|
| T1_UK_RAL_Tape | 64 | 64 |
| T2_UK_London_IC | 59 | 64 |
+-----------------+---------+---------+
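To make the difference between the two listings explicit, here is a minimal Python sketch that compares them. The rows are copied by hand from the two outputs above; the helper names `stale_sites` and `presence` are hypothetical and are not part of Rucio or DAS.

```python
# Rows copied from the two `rucio list-dataset-replicas` outputs above:
# (RSE, files found, files total).
shallow = [
    ("T1_UK_RAL_Tape", 64, 64),
    ("T2_BR_UERJ", 0, 64),
    ("T2_UK_London_IC", 59, 64),
]
deep = [
    ("T1_UK_RAL_Tape", 64, 64),
    ("T2_UK_London_IC", 59, 64),
]


def stale_sites(shallow_rows, deep_rows):
    """Return RSEs reported by the default listing but absent from the --deep one."""
    deep_names = {rse for rse, _, _ in deep_rows}
    return sorted(rse for rse, _, _ in shallow_rows if rse not in deep_names)


def presence(rows):
    """Map each RSE to the percentage of files found there."""
    return {rse: 100.0 * found / total for rse, found, total in rows}


print(stale_sites(shallow, deep))  # ['T2_BR_UERJ']
print(presence(deep))
```

The only site that disappears with `--deep` is the one reporting 0 of 64 files, which matches the stale location described in this ticket.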
@vkuznet
FYI: @belforte @ericvaandering @nsmith- please correct me if I'm reporting this wrong
I thought DAS already switched to using deep=True. Nevertheless, I think that's the correct approach.
maybe a better candidate for the fix is this one: https://github.com/dmwm/das2go/blob/6a5e0e36d495a0a936851f8e175ed7d1e5c9250a/services/combined.go#L237
Hi, I applied deep=True to the /replicas/cms/<block>/datasets API, and a new server is available on cmsweb-testbed. Please inspect the results and let me know if it is OK now. For instance, here is your dataset on testbed: https://cmsweb-testbed.cern.ch/das/request?view=list&limit=50&instance=int%2Fglobal&input=site+dataset%3D%2FDoubleMuon%2FRun2017D-17Nov2017-v1%2FAOD
If you confirm, then I can apply it to production server.
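For readers who want to see what such a request could look like, here is a rough Python sketch that builds the /replicas/cms/<block>/datasets URL with the deep parameter. It is not the DAS code (which is written in Go). The host `rucio.example.org`, the function name `dataset_replicas_url`, and the choice to percent-encode the block name are assumptions made for illustration; the value `True` is taken from this thread.

```python
from urllib.parse import quote, urlencode


def dataset_replicas_url(base_url, scope, name, deep=True):
    """Build a /replicas/<scope>/<name>/datasets URL, optionally with deep=True.

    Assumption: the block name is percent-encoded, since it contains
    '/' and '#', which would otherwise be read as path and fragment separators.
    """
    path = "/".join(
        ["replicas", quote(scope, safe=""), quote(name, safe=""), "datasets"]
    )
    url = f"{base_url.rstrip('/')}/{path}"
    if deep:
        url += "?" + urlencode({"deep": "True"})
    return url


# Block name taken from the example earlier in this thread.
block = "/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca"

# Hypothetical host, used only for illustration.
print(dataset_replicas_url("https://rucio.example.org", "cms", block))
```

Keeping `deep` as an argument makes it easy to request both variants and compare the returned sites, as was done by hand with the CLI.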
Yes, at least the behavior is consistent with --deep.
Thanks Valentin, looks good. Here is another example, on a larger dataset:
old: https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=site+dataset%3D%2FEGamma%2FRun2018C-12Nov2019_UL2018-v2%2FAOD
new: https://cmsweb-testbed.cern.ch/das/request?instance=int/global&input=site+dataset%3D%2FEGamma%2FRun2018C-12Nov2019_UL2018-v2%2FAOD
The number of sites where the dataset is present changed from 36 to 33, but most relevant is that in the new view all disk sites have full blocks (file replica presence is always 100%), which is how we like it to be. If file presence were not 100%, it would mean that data is in transfer or that Rucio did not work as it should, so we expect that to be very rare.
FWIW I feel much better about our data placement now!
From my side you can close and move to production. Thanks again for the super fast fix.
The new DAS server version is now in production; I still need to update dasgoclient, though.
New dasgoclient version v02.04.48 is in cmsdist pipeline, see https://github.com/cms-sw/cmsdist/pull/7834
I'm closing this ticket.
The Rucio API that lists dataset replica locations has a known issue (*) that makes it return inconsistent/outdated locations. The correct response is provided by the very same API, but with the deep=True parameter (**). If I follow the DAS code correctly (big if), the only point where this API is used is here (***) (not sure though; the name is not what I would expect for this call). It should then be enough to add the deep=True parameter to this call in order to get the correct set of dataset locations.
(*) https://github.com/dmwm/CMSRucio/issues/257
(**) https://rucio.readthedocs.io/en/old-doc/restapi/replica.html#get--replicas-(path-scope_name)-datasets
(***) https://github.com/dmwm/das2go/blob/12589cef23f2ced2d985b6ce3d92aed2a602c4d8/das/das.go#L641-L644