Closed vkuznet closed 3 years ago
Almost. I think what you want to do is what this does: https://github.com/rucio/rucio/blob/0246888ceeb8cc12387c6aaffd398921b31da10e/lib/rucio/client/replicaclient.py#L117
You can pass either a container or a block and get all the file replicas, or if you pass an RSE it will give just data at that RSE.
Then you probably need to filter what Rucio returns down to the files that match the run in your example. Of course, you could query file by file or provide a list of files, but that may be less efficient or involve transferring more data.
The code shows you how to build the REST query.
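The filtering step described above can be sketched as follows. This is only an illustration, not DAS or Rucio code: `filter_replicas_by_run` and both input lists are hypothetical, and it assumes the file names for the run have already been obtained separately (for example from DBS).

```python
def filter_replicas_by_run(replica_names, run_file_names):
    """Keep only the replica file names that also belong to the requested run.

    Both arguments are hypothetical inputs: replica_names would come from the
    Rucio replica listing, run_file_names from a separate lookup of the files
    for the run.
    """
    run_files = set(run_file_names)  # set membership keeps the filter linear
    return [name for name in replica_names if name in run_files]


if __name__ == "__main__":
    # Made-up file names, used only to show the call.
    at_site = ["/store/data/a.root", "/store/data/b.root", "/store/data/c.root"]
    in_run = ["/store/data/b.root", "/store/data/d.root"]
    print(filter_replicas_by_run(at_site, in_run))
```

The order of the Rucio listing is preserved, so the result can be printed directly.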
Eric, I still need your assistance with this, as I'm getting different errors from the Rucio server. If I read the replicaclient.py code you pointed to correctly, the request translates into the following plain curl call:
#!/bin/bash
opt="-s -L -k --key $HOME/.globus/userkey.pem --cert $HOME/.globus/usercert.pem"
token=`curl $opt -v https://cms-rucio-auth.cern.ch/auth/x509 2>&1 | grep "X-Rucio-Auth-Token:" | sed -e "s,< X-Rucio-Auth-Token: ,,g"`
echo "$token"
dataset=/JetHT/Run2018A-TkAlMinBias-12Nov2019_UL2018-v2/ALCARECO
curl $opt -H "X-Rucio-Auth-Token: $token" -X POST -d '{"dids": ["scope":"cms", "name":"$dataset"], "domain": "all"}' "http://cms-rucio.cern.ch/replicas/cms/list"
Here I tried two URLs. The first, http://cms-rucio.cern.ch/replicas/cms/list, returns an internal server error, but I'm not sure whether /cms should be part of the URL, since that does not look like the case in the replicaclient.py code. So I tried without it, i.e. http://cms-rucio.cern.ch/replicas/list, which gives me a different error: {"ExceptionMessage": "Cannot decode json parameter list", "ExceptionClass": "ValueError"}.
So, as you know, I really need a plain URL example in order to proceed with this request. Please guide me as necessary.
It’s replicas/list, not replicas/cms/list
I don’t know why exactly, but your JSON is throwing an error here:
https://github.com/rucio/rucio/blob/2ff6f17c7fda45524be8e644cb85c1ed568b0bcd/lib/rucio/web/rest/webpy/v1/replica.py#L345
In fact if I check your JSON, I get:
>>> a = loads('{"dids": ["scope":"cms", "name":"/JetHT/Run2018A-TkAlMinBias-12Nov2019_UL2018-v2/ALCARECO"], "domain": "all"}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 18 (char 17)
>>> a = loads('{"dids": ["scope": "cms", "name": "/JetHT/Run2018A-TkAlMinBias-12Nov2019_UL2018-v2/ALCARECO"], "domain": "all"}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 18 (char 17)
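The decoder stops at character 17 because key/value pairs appear directly inside a list; each DID has to be a JSON object (in braces) within the "dids" list. One way to avoid this class of error is to generate the body instead of writing it by hand, which also sidesteps the shell issue that $dataset is not expanded inside single quotes. The sketch below is an assumption-laden illustration: `build_replica_query` is a hypothetical helper, and the field names are simply those used in the curl calls in this thread.

```python
import json


def build_replica_query(scope, name, rse_expression=None, domain="all"):
    """Return the JSON body for a replica listing request.

    Hypothetical helper; the field names mirror the curl calls in this thread.
    """
    body = {"dids": [{"scope": scope, "name": name}], "domain": domain}
    if rse_expression is not None:
        body["rse_expression"] = rse_expression
    return json.dumps(body)


if __name__ == "__main__":
    dataset = "/JetHT/Run2018A-TkAlMinBias-12Nov2019_UL2018-v2/ALCARECO"

    # Hand-written body from the failing call: key/value pairs sit directly
    # inside a list, which is not valid JSON.
    broken = '{"dids": ["scope":"cms", "name":"%s"], "domain": "all"}' % dataset
    try:
        json.loads(broken)
    except ValueError as err:
        print("rejected:", err)

    # Generated body: each DID is an object inside the list.
    print(build_replica_query("cms", dataset))
```

The generated string can be passed to curl with -d or sent by any HTTP client.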
Eric, thanks for spotting the JSON problem. I managed to get the output with the following sequence of steps:
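#!/bin/bash
# Common curl options: silent, follow redirects, skip TLS verification, authenticate with the user's grid certificate
opt="-s -L -k --key $HOME/.globus/userkey.pem --cert $HOME/.globus/usercert.pem"
# Obtain a token from the Rucio auth server; it is returned in the X-Rucio-Auth-Token response header
token=`curl $opt -v https://cms-rucio-auth.cern.ch/auth/x509 2>&1 | grep "X-Rucio-Auth-Token:" | sed -e "s,< X-Rucio-Auth-Token: ,,g"`
echo "$token"
dataset=/JetHT/Run2018A-TkAlMinBias-12Nov2019_UL2018-v2/ALCARECO
# List the file replicas of the dataset at one RSE; each entry in "dids" must be a JSON object
curl $opt -H "X-Rucio-Auth-Token: $token" -X POST -d '{"dids": [{"scope":"cms", "name":"/JetHT/Run2018A-TkAlMinBias-12Nov2019_UL2018-v2/ALCARECO"}], "domain": "all", "rse_expression": "T2_DE_DESY"}' "http://cms-rucio.cern.ch/replicas/list"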
The output looks like this now:
{"adler32": "df6675e0", "name": "/store/data/Run2018A/JetHT/ALCARECO/TkAlMinBias-12Nov2019_UL2018-v2/270001/BF52B44F-51A0-3248-B13A-9052DF7B03CA.root", "rses": {"T2_DE_DESY": []}, "bytes": 3736739040, "states": {"T2_DE_DESY": "AVAILABLE"}, "pfns": {}, "scope": "cms", "md5": null}
{"adler32": "07531d4b", "name": "/store/data/Run2018A/JetHT/ALCARECO/TkAlMinBias-12Nov2019_UL2018-v2/270001/BFBAA739-795D-AF49-ACFB-1B53033E7121.root", "rses": {"T2_DE_DESY": []}, "bytes": 3755039930, "states": {"T2_DE_DESY": "AVAILABLE"}, "pfns": {}, "scope": "cms", "md5": null}
...
which I hope is sufficient for this use case. I'll proceed with implementing the necessary bits in the DAS codebase.
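As a rough illustration of how such output could be consumed, the sketch below reads one JSON object per line and keeps the names whose state at the requested site is AVAILABLE. It assumes the response is newline-delimited JSON, as the sample output suggests; `files_available_at` is a hypothetical name, and this is not the DAS implementation.

```python
import json


def files_available_at(response_text, rse):
    """Return file names whose state at `rse` is AVAILABLE.

    Assumes one JSON object per line, as in the output shown in this thread.
    """
    names = []
    for line in response_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("states", {}).get(rse) == "AVAILABLE":
            names.append(record["name"])
    return names


if __name__ == "__main__":
    # First record is taken from the output above; the second is made up
    # to show a replica that is filtered out.
    sample = "\n".join([
        '{"adler32": "df6675e0", "name": "/store/data/Run2018A/JetHT/ALCARECO/TkAlMinBias-12Nov2019_UL2018-v2/270001/BF52B44F-51A0-3248-B13A-9052DF7B03CA.root", "rses": {"T2_DE_DESY": []}, "bytes": 3736739040, "states": {"T2_DE_DESY": "AVAILABLE"}, "pfns": {}, "scope": "cms", "md5": null}',
        '{"name": "/store/hypothetical/example.root", "rses": {"T2_DE_DESY": []}, "states": {"T2_DE_DESY": "UNAVAILABLE"}, "scope": "cms"}',
    ])
    for name in files_available_at(sample, "T2_DE_DESY"):
        print(name)
```

The resulting list can then be intersected with the files that belong to the requested run.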
Done. The new release has been deployed on cmsweb, and the new dasgoclient PR is here: https://github.com/cms-sw/cmsdist/pull/6584
If you need a binary version of dasgoclient before it is updated on cvmfs, please take it from here:
/afs/cern.ch/user/v/valya/public/dasgoclient/dasgoclient
The new version is
Build: git=v02.04.23 go=go1.15.6 date=2021-01-21 21:15:20.46625747 +0100 CET m=+0.006210747
and your query looks like this:
./dasgoclient -query="file dataset=/JetHT/Run2018A-TkAlMinBias-12Nov2019_UL2018-v2/ALCARECO site=T2_DE_DESY run=316723"
/store/data/Run2018A/JetHT/ALCARECO/TkAlMinBias-12Nov2019_UL2018-v2/280000/25B4C3D5-03C1-F24E-9D35-E08860CBC145.root
/store/data/Run2018A/JetHT/ALCARECO/TkAlMinBias-12Nov2019_UL2018-v2/280000/4E29E31D-AA0E-8744-B558-98B35D8320E3.root
/store/data/Run2018A/JetHT/ALCARECO/TkAlMinBias-12Nov2019_UL2018-v2/280000/BAE93AF7-30F2-FC49-95FF-E584E4BE6773.root
/store/data/Run2018A/JetHT/ALCARECO/TkAlMinBias-12Nov2019_UL2018-v2/280000/FAF43300-D19D-E24E-9175-B800DBD5083C.root
/store/data/Run2018A/JetHT/ALCARECO/TkAlMinBias-12Nov2019_UL2018-v2/70001/67CF1160-478F-5E4E-9F7D-57E8E09C1E25.root
Closing the issue.
Originally, support for the file query by dataset, site, and run was implemented through the DBS and PhEDEx APIs. First, we resolved the list of blocks for a given dataset. Then we found the files for the given set of blocks and run number, and finally we filtered those files using the PhEDEx fileReplicas API to select the files on a given site.
Now we need to implement the same logic using the DBS and Rucio APIs. The question is whether Rucio has an API similar to fileReplicas that selects files only for a given site, or whether we should find another route in Rucio to accommodate this workflow.
@ericvaandering could you please comment on this?