dmwm / dasgoclient

Data Aggregation System (DAS) Go client
https://cmsweb.cern.ch/das/
MIT License

site queries don't work for instance=prod/phys03 #15

Closed mtonjes closed 6 years ago

mtonjes commented 6 years ago

dasgoclient queries with "site dataset=" return nothing for datasets in the prod/phys03 DBS instance, but the same queries work on the web interface using the DBS pulldown menu.

One I'm trying to implement (I will have 71 of these to check and want to script it):

[tonjes@cmslpc36 ~]$ dasgoclient -query="site dataset=/MinBias/across-CMSDAS2018_CRAB3_MC_generation_test0-44649de4a3d0682847715fac1e517148/USER instance=prod/phys03"
[tonjes@cmslpc36 ~]$

[tonjes@cmslpc36 ~]$ dasgoclient -query="site dataset=/DisplacedJet/Run2017F-PromptReco-v1/MINIAOD"
T1_FR_CCIN2P3_MSS
T2_CN_Beijing
T1_FR_CCIN2P3_Disk
T1_FR_CCIN2P3_Buffer

Suggestion: I had tried the example and it returned nothing, so at first I assumed that dasgoclient didn't work at all for "site". Perhaps a SAM test dataset would make a better example, since it would stay around longer.

[tonjes@cmslpc36 ~]$ dasgoclient -query="site dataset=/WJets_matchingup_7TeV-madgraph/Summer10-START36_V10_FastSim-v3/DQM"
[tonjes@cmslpc36 ~]$

vkuznet commented 6 years ago

Marguerite, this is not really a bug in dasgoclient; it is a feature. By default the tool is optimized for speed and queries only a single, pre-defined system. For site information I chose PhEDEx. But some datasets, as you have shown, are not known to PhEDEx, so their original location is still recorded in DBS. So, if you query:

site dataset=/MinBias/across-CMSDAS2018_CRAB3_MC_generation_test0-44649de4a3d0682847715fac1e517148/USER instance=prod/phys03 system=dbs3 detail=true

you'll get the proper site(s).

Here I instruct dasgoclient to query the dbs3 system (instead of whatever the default is) and I explicitly ask for all details with the detail=true option. The latter is also part of the optimization: details take more time to fetch from DBS, and therefore they are off by default.

So my suggestion is: if you query regular (production) datasets like AOD, RECO, etc., use the site dataset=/a/b/c query pattern. For USER datasets that pattern will most likely return nothing, since user datasets are not usually registered or transferred by PhEDEx. In that case use the site dataset=/a/b/c system=dbs3 detail=true query pattern.
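For scripting many datasets, the choice between the two patterns above could be made from the data tier of the dataset name. The following is only a minimal sketch under that assumption: the helper names build_site_query and sites_for are hypothetical, and the only dasgoclient option used is the -query option shown in this thread.

```shell
#!/bin/sh
# Sketch only: build_site_query and sites_for are hypothetical helper names.

# Print the DAS query string for one dataset.
# $1 = dataset path (/primary/processed/TIER), $2 = optional DBS instance.
build_site_query() {
    dataset=$1
    instance=$2
    query="site dataset=$dataset"
    if [ -n "$instance" ]; then
        query="$query instance=$instance"
    fi
    case $dataset in
        # USER datasets are usually unknown to PhEDEx, so ask DBS directly.
        */USER) query="$query system=dbs3 detail=true" ;;
    esac
    printf '%s\n' "$query"
}

# Run the query; dasgoclient must be on PATH (e.g. inside a CMSSW environment).
sites_for() {
    dasgoclient -query="$(build_site_query "$1" "$2")"
}
```

With these definitions loaded, sites_for /a/b/USER prod/phys03 would send the DBS form of the query, while sites_for /a/b/AOD would send the plain form.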

The web interface doesn't have such restrictions: it queries all available sources, but that is costly and therefore it is (much) slower than dasgoclient.

Best, Valentin.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/dmwm/dasgoclient/issues/15

vkuznet commented 6 years ago

It took me longer than I expected, but I finally applied a solution which does not require an additional option, i.e. the site for user datasets will be looked up transparently from the user query.

The fix is applied in the master head of the tool and will appear starting with version v01.01.01, once it becomes available in CMSSW.

vkuznet commented 6 years ago

Submitted for inclusion in the CMSSW build, see https://github.com/cms-sw/cmsdist/pull/3641

vkuznet commented 6 years ago

I tweaked this further, see https://github.com/cms-sw/cmsdist/pull/3665

vkuznet commented 6 years ago

The new behavior is the following:

For example, here is how the new output will look:

# dataset which is registered in PhEDEx
dasgoclient -query="site dataset=/TT_TuneCUETP8M1_13TeV-powheg-pythia8/RunIIFall15DR76-PU25nsData2015v1_76X_mcRun2_asymptotic_v12_ext3-v1/AODSIM"
T1_US_FNAL_Buffer
T1_US_FNAL_MSS
T2_IN_TIFR

# user based dataset which is not registered in PhEDEx
dasgoclient -query="site dataset=/MinBias/across-CMSDAS2018_CRAB3_MC_generation_test0-44649de4a3d0682847715fac1e517148/USER instance=prod/phys03"
WARNING: No site records found in PhEDEx, will look-up original sites in DBS
T3_US_FNALLPC
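
When a script collects sites for a list of datasets with the new behavior, the WARNING line may need to be separated from the site names. The sketch below makes several assumptions: the helper names only_sites and list_sites are hypothetical, site names are taken to follow the T<tier>_<country>_<name> shape seen in the outputs above, and the warning is filtered by pattern because the thread does not say which output stream it is written to.

```shell
#!/bin/sh
# Sketch only: only_sites and list_sites are hypothetical helper names.

# Keep only lines that look like CMS site names, e.g. T3_US_FNALLPC.
only_sites() {
    grep -E '^T[0-9]+_[A-Z]{2}_[A-Za-z0-9_]+$'
}

# Print "dataset<TAB>site" for every dataset listed one per line in file $1.
# $2 = optional DBS instance, e.g. prod/phys03.
list_sites() {
    while IFS= read -r dataset; do
        # Skip empty lines in the input file.
        [ -n "$dataset" ] || continue
        # </dev/null keeps the client from consuming the dataset list on stdin.
        dasgoclient -query="site dataset=$dataset${2:+ instance=$2}" </dev/null 2>/dev/null |
            only_sites |
            while IFS= read -r site; do
                printf '%s\t%s\n' "$dataset" "$site"
            done
    done < "$1"
}
```

A call such as list_sites datasets.txt prod/phys03 would then print one tab-separated line per dataset and site, where datasets.txt is a placeholder file name.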

vkuznet commented 6 years ago

Fixed and propagated to all CMSSW releases now.