dmwm / das2go

Go implementation of Data Aggregation System (DAS) for CMS experiment
MIT License

question about run,lumis in valid files #58

Closed: belforte closed this issue 1 year ago

belforte commented 1 year ago

I am not sure if it is better to ask here or in cms-talk; please advise if I should move it, thanks!

when I issue

dasgoclient --query 'run,lumi dataset=...'

does it list runs and lumis from all files, or only valid ones (is_file_valid=1) ?

If it lists all files, is there a dasgoclient syntax that restricts the output to valid files, or is the only way to make a list of valid files first and then run a ton of dasgoclient queries?

I tried

dasgoclient --query  'file,run,lumi,is_file_valid dataset=/PPRefZeroBias0/Run2023F-v1/RAW'

with and without --json at the end, but the output is the same as when is_file_valid is omitted.

Also, I can't use | grep file.xxx, I presume because the file dictionary in the output of

dasgoclient --query  'file,run,lumi dataset=/PPRefZeroBias0/Run2023F-v1/RAW' --json

contains only the file name, unlike the output of the query file dataset=...

vkuznet commented 1 year ago

Stefano, this query is complicated because it runs through a series of DBS APIs. As far as I can tell from the code, it does the following:

file,run,lumi dataset=/a/b/c

resolves into finding the blocks, and then for every block we look up the file,run,lumi triplets. That said, it seems that by default it queries blocks with all files. To pass the valid file status, you should change the query to

file,run,lumi dataset=/a/b/c status=valid

Here status is the DAS keyword that specifies the file status. Remember that a DAS query is a composition of <select keys> <conditions>; therefore you select file, run, lumi and apply the conditions dataset=/a/b/c and status=valid.

I'm on a break now and will not spend time on it until I am back. You may try it with status to see the difference; if it produces the same results, we'll need to resolve all the API calls to DBS to see how it is done. You can do that too by adding the -verbose=2 argument to dasgoclient, and you'll see all the URL calls it makes.
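For illustration, here is a minimal Python sketch of the `<select keys> <conditions>` composition described above, together with the matching command line. The helper names `build_das_query` and `dasgoclient_argv` are hypothetical and not part of dasgoclient; only the `--query` and `-verbose=2` arguments come from this thread.

```python
import shlex
from typing import Dict, List, Sequence


def build_das_query(select_keys: Sequence[str], conditions: Dict[str, str]) -> str:
    """Compose a DAS query string as '<select keys> <conditions>'."""
    select = ",".join(select_keys)
    conds = " ".join(f"{key}={value}" for key, value in conditions.items())
    return f"{select} {conds}".strip()


def dasgoclient_argv(query: str, verbose: int = 0) -> List[str]:
    """Build the argument list for a dasgoclient call (hypothetical helper)."""
    argv = ["dasgoclient", "--query", query]
    if verbose:
        # -verbose=2 is the argument mentioned above for printing the URL calls
        argv.append(f"-verbose={verbose}")
    return argv


query = build_das_query(
    ["file", "run", "lumi"],
    {"dataset": "/a/b/c", "status": "valid"},
)
# Print a copy-pasteable shell command
print(shlex.join(dasgoclient_argv(query, verbose=2)))
# dasgoclient --query 'file,run,lumi dataset=/a/b/c status=valid' -verbose=2
```

Running the printed command once with and once without `status=valid` should make the difference in the underlying URL calls visible.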

belforte commented 1 year ago

THANKS VERY MUCH. I will try. Sorry to have bugged you at the wrong time.

belforte commented 1 year ago

file,run,lumi dataset=/a/b/c status=valid

works like a charm !!!! I tested on a dataset with invalid files (of course).

belforte commented 1 year ago

Thank you, Valentin :bowing_man:

belforte commented 1 year ago

I found a more fundamental problem with

file,run,lumi dataset=/a/b/c

The output has one entry per file (OK), but for each file there is one list of run numbers and one uncorrelated list of lumis, whereas one of course needs the list of proper (run,lumi) pairs in whatever format. There is no such problem with run,lumi dataset=/a/b/c, since it produces one entry per run with one list of lumis in each, but I can't filter on file status in that query.
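To make the difference concrete, here is a small Python sketch with made-up data. The dictionaries are simplified stand-ins, not the real dasgoclient JSON layout, and the field names `run`, `runs`, `lumis` and `file` are assumptions chosen for readability.

```python
from typing import Dict, List, Tuple


def pairs_from_per_run(entries: List[Dict]) -> List[Tuple[int, int]]:
    """Expand one-entry-per-run records into explicit (run, lumi) pairs."""
    return [(entry["run"], lumi) for entry in entries for lumi in entry["lumis"]]


# Shape described for `run,lumi dataset=...`: one entry per run,
# each with its own list of lumis, so pairing is unambiguous.
per_run = [
    {"run": 1, "lumis": [10, 11]},
    {"run": 2, "lumis": [5]},
]
print(pairs_from_per_run(per_run))  # [(1, 10), (1, 11), (2, 5)]

# Shape described for `file,run,lumi dataset=...`: one entry per file
# with a list of runs and a separate list of lumis. Nothing in this
# record says which lumi belongs to which run, so the pairs above
# cannot be reconstructed from it.
per_file = {"file": "f.root", "runs": [1, 2], "lumis": [10, 11, 5]}
```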

I guess I am sticking with listing (valid) files first, and lumis in each second.

The use case for this is marginal (a little-used CRABClient functionality), so I think there is no point in "fixing" it.
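A rough Python sketch of that two-step approach (valid files first, then one query per file) could look as follows. The two query strings are assumptions: this thread only confirms `status=valid` for the `file,run,lumi` selection, and a per-file `run,lumi file=...` query has not been checked here. The function names are hypothetical, and the per-file output is returned as raw text lines, since its exact format is not shown in this thread.

```python
import subprocess
from typing import Callable, Dict, List

QueryRunner = Callable[[str], List[str]]


def run_dasgoclient(query: str) -> List[str]:
    """Run dasgoclient and return its non-empty output lines."""
    result = subprocess.run(
        ["dasgoclient", "--query", query],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def lumis_by_valid_file(
    dataset: str, run_query: QueryRunner = run_dasgoclient
) -> Dict[str, List[str]]:
    """Map each valid file of a dataset to the raw output of a per-file query."""
    # Step 1: list valid files.
    # ASSUMPTION: status=valid is also accepted with the plain `file` key.
    files = run_query(f"file dataset={dataset} status=valid")
    # Step 2: one query per file.
    # ASSUMPTION: `run,lumi file=<LFN>` is an accepted DAS query.
    return {lfn: run_query(f"run,lumi file={lfn}") for lfn in files}
```

Passing a custom `run_query` makes it possible to try the logic without a DAS connection, and to swap in a `--json` based runner once the record layout is known.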