dmwm / das2go

Go implementation of Data Aggregation System (DAS) for CMS experiment
MIT License

Error when looking for a specific lumi in MC #26

Closed jrueb closed 4 years ago

jrueb commented 4 years ago

Description

Running the command file dataset=/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v2/MINIAODSIM run=1 lumi=123

(on the web interface) gives me a long error message saying things like

error=json: cannot unmarshal object into Go value of type []mongo.DASRecord

Details

I am trying to find the miniAOD file that holds a specific event I have found in nanoAOD. For this I have looked up the value of the luminosityBlock of the event in nanoAOD. The value is 123 and the dataset is /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18NanoAODv5-Nano1June2019_102X_upgrade2018_realistic_v19-v1/NANOAODSIM.

DAS tells me the parent dataset is /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v2/MINIAODSIM. Thus to find the file I'm looking for, I use the previously mentioned query.

vkuznet commented 4 years ago

Jonas, the issue you're facing is related to run=1. In DBS it covers ALL MC samples, so DBS will time out for these queries. Instead, you should use a different DAS query like this:

file,run,lumi dataset=/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v2/MINIAODSIM

which returns the (file, run, lumi) triplets; you can then select the triplet(s) you're looking for.

And remember that the run condition in DAS queries only works for run!=1.

jrueb commented 4 years ago

Thanks for the reply. I see the problem.

The command you have provided gives me so many results that it says "Showing 1–50 records out of 991." I think this is a very impractical solution, and not usable without writing an additional script that processes the output.

vkuznet commented 4 years ago

Please use the CLI tool. Of course the web interface is impractical when you have plenty of results, but you can do this in one line on lxplus:

# obtain your proxy
voms-proxy-init -voms cms -rfc

# run dasgoclient for your query and use UNIX tools to parse it
dasgoclient -query="file,run,lumi dataset=/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v2/MINIAODSIM" | grep 123

Use the CLI for everything that requires more than 50 results and you'll understand its power.
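
A plain grep for 123 also matches lines where 123 is only part of another number (1123, 1230) or of a file name, so it can return more lines than expected. Below is a minimal sketch of an exact match on the lumi value. It assumes each plain-text output line has the form "<file> <run> [<lumi>, <lumi>, ...]"; that layout, the function name files_with_lumi and the sample file names are illustrative assumptions, not something confirmed in this thread.

```python
import re


def files_with_lumi(lines, lumi, run=1):
    """Return file names whose lumi list contains exactly `lumi`.

    Assumes (not confirmed here) that every line looks like
    "<file> <run> [<lumi>, <lumi>, ...]".
    """
    matches = []
    for line in lines:
        parts = line.split(None, 2)  # file name, run number, rest of the line
        if len(parts) != 3:
            continue  # skip lines that do not fit the assumed layout
        name, run_str, lumi_str = parts
        if not run_str.isdigit() or int(run_str) != run:
            continue
        # Compare whole integers, so 1123 or 1230 do not count as 123.
        lumis = {int(n) for n in re.findall(r"\d+", lumi_str)}
        if lumi in lumis:
            matches.append(name)
    return matches


# Hypothetical sample lines, only to illustrate the assumed layout.
sample = [
    "/store/mc/a.root 1 [12, 123, 124]",
    "/store/mc/b.root 1 [1123, 1230]",
    "/store/mc/c123.root 1 [5, 6]",
]
print(files_with_lumi(sample, 123))
```

To use it on real output, the lines printed by dasgoclient could be passed in place of sample, for example by iterating over sys.stdin.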

jrueb commented 4 years ago

I tried grep before, but that gives me 109 lines (as one can see when piping into wc). Is there maybe a single CLI command that narrows it down to one result?

vkuznet commented 4 years ago

So, 109 is about 10 times smaller than the 991 on the web, right?

There is no single query and/or tool; you need an additional toolkit to get through the results. This can be done either with UNIX tools or easily with Python, e.g. if you use the -json option of dasgoclient you'll get JSON that you can load into Python and parse.
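
As a rough illustration of the Python route, the sketch below loads the JSON and keeps the files whose lumi list contains the requested value. The record layout is an assumption, not taken from this thread: a list of records, each with "file" entries carrying a "name" and "lumi" entries carrying a "number" that is a flat list of integers or a single integer. The function name files_with_lumi_json is hypothetical; check the keys against the actual -json output before relying on it.

```python
import json


def files_with_lumi_json(text, lumi):
    """Return file names from records whose lumi numbers include `lumi`.

    Assumed record layout (verify against real dasgoclient -json output):
      {"file": [{"name": ...}], "lumi": [{"number": [..]}], ...}
    """
    matches = []
    for rec in json.loads(text):
        lumis = set()
        for entry in rec.get("lumi", []):
            number = entry.get("number", [])
            if isinstance(number, int):
                number = [number]  # a single lumi stored as a bare integer
            lumis.update(number)   # assumes a flat list of integers
        if lumi in lumis:
            matches.extend(f["name"] for f in rec.get("file", []) if "name" in f)
    return matches


# Hypothetical records, only to illustrate the assumed layout.
example = json.dumps([
    {"file": [{"name": "/store/mc/a.root"}], "lumi": [{"number": [12, 123]}]},
    {"file": [{"name": "/store/mc/b.root"}], "lumi": [{"number": [1123]}]},
])
print(files_with_lumi_json(example, 123))
```

The text argument could come from a file written by redirecting the output of dasgoclient with the -json option.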

Please understand that the amount of data we deal with can't always fit into a single query, and we always have to balance responsiveness against concise output. Triplets are the best combination for dealing with the 100M files x all runs x all lumis we have in the CMS universe.

vkuznet commented 4 years ago

No more work needs to be done here; closing the ticket.