dmwm / das2go

Go implementation of Data Aggregation System (DAS) for CMS experiment
MIT License

Fix rucio list dataset replicas #43

Closed dciangot closed 2 years ago

dciangot commented 2 years ago

The Rucio API that lists dataset replica locations has a known issue (*) that makes it return inconsistent or outdated locations. The correct response is provided by the very same API, but with the deep=True parameter (**).

If I follow the DAS code correctly (big if), the only place where this API is used is here (***) (not sure though, the name is not what I would expect for this call). It should then be enough to add the deep=True parameter to this call to get the correct set of dataset locations.

(*) https://github.com/dmwm/CMSRucio/issues/257

(**) https://rucio.readthedocs.io/en/old-doc/restapi/replica.html#get--replicas-(path-scope_name)-datasets

(***) https://github.com/dmwm/das2go/blob/12589cef23f2ced2d985b6ce3d92aed2a602c4d8/das/das.go#L641-L644
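
A minimal sketch of the proposed change, assuming the replicas URL is assembled from a base URL, a scope and a block name. The function name `datasetReplicasURL`, the host `rucio.example.org` and the choice of `url.PathEscape` are illustrative assumptions, not taken from the das2go code:

```go
package main

import (
	"fmt"
	"net/url"
)

// datasetReplicasURL builds the URL of the Rucio "list dataset replicas" call
// for one block. The function name and the escaping choice are illustrative
// assumptions; they are not copied from das2go.
func datasetReplicasURL(baseURL, scope, block string, deep bool) string {
	// Block names contain "/" and "#", so escape each part as one path segment.
	u := baseURL + "/replicas/" + url.PathEscape(scope) + "/" + url.PathEscape(block) + "/datasets"
	if deep {
		// deep=True is the parameter discussed in this issue.
		u += "?deep=True"
	}
	return u
}

func main() {
	// rucio.example.org is a placeholder host, not a real endpoint.
	block := "/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca"
	fmt.Println(datasetReplicasURL("https://rucio.example.org", "cms", block, true))
}
```

Keeping `deep` as an explicit argument makes it easy to compare the two behaviors side by side while testing.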

dciangot commented 2 years ago

By the way, the following example may be more useful than my guesses about the code workflow:

https://cmsweb.cern.ch/das/request?instance=prod/global&input=site+dataset%3D%2FDoubleMuon%2FRun2017D-17Nov2017-v1%2FAOD

This shows the T2_BR_UERJ site with some blocks present, which is consistent with calling Rucio with deep=False (the default):

$ rucio list-dataset-replicas cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca

DATASET: cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca
+-----------------+---------+---------+
| RSE             |   FOUND |   TOTAL |
|-----------------+---------+---------|
| T1_UK_RAL_Tape  |      64 |      64 |
| T2_BR_UERJ      |       0 |      64 |
| T2_UK_London_IC |      59 |      64 |
+-----------------+---------+---------+

With deep=True, however, we get the correct locations, and T2_BR_UERJ is no longer among the sites:

$ rucio list-dataset-replicas cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca --deep

DATASET: cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca
+-----------------+---------+---------+
| RSE             |   FOUND |   TOTAL |
|-----------------+---------+---------|
| T1_UK_RAL_Tape  |      64 |      64 |
| T2_UK_London_IC |      59 |      64 |
+-----------------+---------+---------+
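
The difference between the two listings above can be computed mechanically. The following is a sketch only: the `Replica` type and the `onlyInShallow` helper are hypothetical names, and the rows are typed in by hand from the two tables rather than fetched from Rucio:

```go
package main

import "fmt"

// Replica mirrors one row of the `rucio list-dataset-replicas` table.
// The type is hypothetical and exists only for this illustration.
type Replica struct {
	RSE   string
	Found int
	Total int
}

// onlyInShallow returns the RSEs that appear in the default (deep=False)
// result but are missing from the deep=True result, in input order.
func onlyInShallow(shallow, deep []Replica) []string {
	seen := make(map[string]bool, len(deep))
	for _, r := range deep {
		seen[r.RSE] = true
	}
	var stale []string
	for _, r := range shallow {
		if !seen[r.RSE] {
			stale = append(stale, r.RSE)
		}
	}
	return stale
}

func main() {
	// Rows copied from the two tables above.
	shallow := []Replica{
		{"T1_UK_RAL_Tape", 64, 64},
		{"T2_BR_UERJ", 0, 64},
		{"T2_UK_London_IC", 59, 64},
	}
	deep := []Replica{
		{"T1_UK_RAL_Tape", 64, 64},
		{"T2_UK_London_IC", 59, 64},
	}
	fmt.Println(onlyInShallow(shallow, deep)) // prints [T2_BR_UERJ]
}
```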

dciangot commented 2 years ago

@vkuznet

FYI: @belforte @ericvaandering @nsmith- please correct me if I'm reporting this wrong

ericvaandering commented 2 years ago

I thought DAS already switched to using deep=True. Nevertheless, I think that's the correct approach.

dciangot commented 2 years ago

maybe a better candidate for the fix is this one: https://github.com/dmwm/das2go/blob/6a5e0e36d495a0a936851f8e175ed7d1e5c9250a/services/combined.go#L237

vkuznet commented 2 years ago

Hi, I applied deep=True to the /replicas/cms/<block>/datasets API, and the new server is available on cmsweb-testbed. Please inspect the results and let me know if it is OK now. For instance, here is your dataset on testbed: https://cmsweb-testbed.cern.ch/das/request?view=list&limit=50&instance=int%2Fglobal&input=site+dataset%3D%2FDoubleMuon%2FRun2017D-17Nov2017-v1%2FAOD

If you confirm, then I can apply it to production server.

dciangot commented 2 years ago

Yes, at least the behavior is consistent with --deep.

belforte commented 2 years ago

Thanks Valentin, looks good. Here is another example, on a larger dataset:

old: https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=site+dataset%3D%2FEGamma%2FRun2018C-12Nov2019_UL2018-v2%2FAOD

new: https://cmsweb-testbed.cern.ch/das/request?instance=int/global&input=site+dataset%3D%2FEGamma%2FRun2018C-12Nov2019_UL2018-v2%2FAOD

The number of sites where the dataset is present changed from 36 to 33, but most relevant is that in the new view all disk sites have full blocks (file replica presence is always 100%), which is how we want it to be. If file presence were not 100%, it would mean that data is in transfer or that Rucio did not work as it should, so we expect that to be very rare.
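
The "file presence is 100% on every disk site" check can be sketched as follows. The `SiteReplica` type and both helpers are hypothetical, and treating RSE names that end in `_Tape` as tape endpoints is an assumption based on the names seen in this thread. The sample rows come from the DoubleMuon block listed earlier, where one disk site holds 59 of 64 files:

```go
package main

import (
	"fmt"
	"strings"
)

// SiteReplica holds the per-RSE file counts for one block (hypothetical type).
type SiteReplica struct {
	RSE          string
	Found, Total int
}

// presence returns the file presence at an RSE as a percentage.
func presence(r SiteReplica) float64 {
	if r.Total == 0 {
		return 0
	}
	return 100 * float64(r.Found) / float64(r.Total)
}

// incompleteDiskSites lists the disk RSEs that hold fewer files than the block
// contains. RSEs whose name ends in "_Tape" are skipped (an assumption).
func incompleteDiskSites(replicas []SiteReplica) []string {
	var out []string
	for _, r := range replicas {
		if strings.HasSuffix(r.RSE, "_Tape") {
			continue
		}
		if r.Found < r.Total {
			out = append(out, r.RSE)
		}
	}
	return out
}

func main() {
	// Rows from the deep=True listing of the DoubleMuon block in this thread.
	rows := []SiteReplica{
		{"T1_UK_RAL_Tape", 64, 64},
		{"T2_UK_London_IC", 59, 64},
	}
	for _, r := range rows {
		fmt.Printf("%s: %.1f%%\n", r.RSE, presence(r))
	}
	fmt.Println("incomplete disk sites:", incompleteDiskSites(rows))
}
```

An empty result from `incompleteDiskSites` corresponds to the situation described above, where every disk site has full blocks.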

FWIW, I feel much better about our data placement now!

From my side you can close this and move to production. Thanks again for the super fast fix.

vkuznet commented 2 years ago

The new DAS server version is now in production. I still need to update dasgoclient, though.

vkuznet commented 2 years ago

The new dasgoclient version v02.04.48 is in the cmsdist pipeline, see https://github.com/cms-sw/cmsdist/pull/7834

I'm closing this ticket.