dmwm / DBS

CMS Dataset Bookkeeping Service
Apache License 2.0

filelumis API reported Error 500 instead of input error when run=1 #587

Closed yuyiguo closed 5 years ago

yuyiguo commented 6 years ago

With run_num=1, I got a 500 error. The request https://cmsweb.cern.ch/dbs/prod/global/DBSReader/filelumis?block_name=%2FZGToLLG_01J_5f_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8%2FRunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1%2FMINIAODSIM%23a712e608-0775-11e8-a129-02163e01877e&run_num=1 returns

{"exception": 500, "type": "dbsException", "message": "Server Error"}

Without run_num, it works, so we need to check the input and report a proper error message. The request https://cmsweb.cern.ch/dbs/prod/global/DBSReader/filelumis?block_name=%2FZGToLLG_01J_5f_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8%2FRunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1%2FMINIAODSIM%23a712e608-0775-11e8-a129-02163e01877e returns

[{"lumi_section_num": [48, 517, 6171, 6548, 6792, 46, 519, 1179, 6031, 6032, 6033, 6319, 6547, 6790, 6791, 47, 6277, 6430, 544, 6302, 545, 6170, 6303, 6321, 6431, 6432, 6169, 6279, 6301, 52, 1178, 53, 54, 518, 546, 1177, 6278, 6320, 6549], "logical_file_name": "/store/mc/RunIISummer16MiniAODv2/ZGToLLG_01J_5f_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/00000/8A7CD5D2-7307-E811-A20D-FA163EBE1F9E.root", "run_num": 1}]

vkuznet commented 6 years ago

Yuyi, I suggest that you return a meaningful message. DAS is capable of fetching error records and presenting them. If your JSON contains something like {"message":"filelumis API does not accept run=1 parameter"}, it would be much clearer. A plain "Server Error" is not enough in such cases.

yuyiguo commented 6 years ago

Yes, Valentin. A message like the one you suggested is what the server is supposed to return. Returning a 500 error is a bug in the server. It will be fixed in the next release.

AndrewLevin commented 6 years ago

Hi Yuyi and Valentin,

But is it really too intensive to perform this query with run = 1? I think being able to do this search on Monte Carlo is very useful and important. This type of query definitely used to work with run = 1. When was this feature disabled? Was there some discussion about this somewhere?

Andrew

vkuznet commented 6 years ago

Andrew, my understanding is that Yuyi cut off the use of run=1 because it spans all MC datasets and leads to a huge table scan across the dataset, run, and lumi tables, which are the most populated ones. In other words, such a join takes a lot of time. This does not happen when run!=1, since only a small number of entries are present in the joined tables.

In the past I got a request from data-ops to implement the following queries:

file,run,lumi dataset=/a/b/c
file,lumi dataset=/a/b/c
run,lumi dataset=/a/b/c

They return a list of triplets/pairs, from which you then select your runs/lumis, which avoids the aforementioned problem. I think this is applicable to your use case.

I'll let Yuyi comment further from the DBS point of view.

Valentin.

AndrewLevin commented 6 years ago

Hi Valentin,

Yes, I can get the information I want using

dasgoclient --query "file,run,lumi dataset=/WGToLNuG_01J_5f_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM" | grep -n0 ",35436,\|\[35436,\|35436\]" | awk '{print $1}' | awk -F: '{print $2}'

but it would be much more convenient if dbs/das could do this.
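The grep/awk pipeline above can also be done client-side in a few lines of Python. This is only a sketch: it assumes each output line of the `file,run,lumi` query has the form `<file> <run> [lumi,lumi,...]`, which is inferred from the grep pattern rather than from any documented format, and the function name and sample lines are hypothetical.

```python
import re


def files_with_lumi(lines, lumi):
    """Return the files whose lumi list contains `lumi`.

    Assumed line format (not confirmed by the thread):
        <logical_file_name> <run> [l1,l2,...]
    """
    matches = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip lines that do not fit the assumed format
        lfn, _run, lumi_field = parts
        # Compare whole integers so 35436 does not match 135436.
        lumis = {int(tok) for tok in re.findall(r"\d+", lumi_field)}
        if lumi in lumis:
            matches.append(lfn)
    return matches


# Made-up sample lines in the assumed format.
sample = [
    "/store/mc/example/A.root 1 [35435,35436,35437]",
    "/store/mc/example/B.root 1 [135436,40000]",
]
print(files_with_lumi(sample, 35436))
```

Comparing parsed integers avoids having to enumerate the bracket and comma variants in the regular expression, but it still requires fetching the full listing for the dataset.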

I am not sure why

file dataset=/ZGToLLG_01J_5f_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM run=1 lumi=1

should spawn more processes or be more difficult than

file,run,lumi dataset=/WGToLNuG_01J_5f_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

Andrew

vkuznet commented 6 years ago

Andrew, Yuyi should provide the proper answer since she knows the details of the DB, but my understanding is that providing the run=1 and lumi=1 conditions forces the DBS server to join almost all rows in the dataset, run, and lumi tables and only then extract the rows for the given dataset. Without these conditions, a smaller portion of the DB tables is joined, and therefore the latency of the API is significantly reduced. Best, Valentin.

yuyiguo commented 5 years ago

@AndrewLevin @vkuznet I just got time to work on this issue. The problem with run_num=1 without dataset/block/file info is that it does a full table scan on the biggest table in DBS. Before the holiday, the DBA caught a query that had been running for 50 hours after the client had already timed out. So there is no reason for us to allow this kind of query, because the user will not get the result within 300 seconds, which is the timeout for cmsweb. If DBS allowed run_num=1 with dataset/block/file, the run_num would not help at all: CMS does not mix MC runs with regular runs in a dataset/block/file anyway. So as soon as you know the dataset/block/file, the entire dataset is either all run_num=1 or not, and there is no need to specify run_num=1.

I hope this makes things a bit clearer.
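A minimal sketch of the input check discussed in this thread, so that the client gets an explanatory message instead of a bare "Server Error". It is not the actual DBS code: `InputError`, `check_run_num`, `error_body`, and the choice of status 400 are assumptions made for illustration, and the JSON keys simply mirror the error record quoted in the first comment.

```python
import json


class InputError(ValueError):
    """Hypothetical exception for rejected client input."""


def check_run_num(api_name, run_num):
    """Reject run_num=1, given alone or inside a list of runs."""
    runs = run_num if isinstance(run_num, (list, tuple)) else [run_num]
    if any(str(r) == "1" for r in runs):
        raise InputError(
            f"{api_name} API does not accept run_num=1; "
            "for MC data the dataset/block/file already selects the files"
        )


def error_body(exc, status=400):
    """Build an error record shaped like the one DBS returned for the 500."""
    return json.dumps(
        {"exception": status, "type": "dbsException", "message": str(exc)}
    )


try:
    check_run_num("filelumis", 1)
except InputError as exc:
    print(error_body(exc))
```

With a check like this running before the query is built, the request never reaches the database and the record carries a message that DAS can show to the user.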

AndrewLevin commented 5 years ago

@yuyiguo, you are right that the run number will not help at all for an MC dataset. But DAS also does not find the file when I specify only the lumi and not the run:

https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=file+dataset%3D%2FZGToLLG_01J_5f_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8%2FRunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1%2FMINIAODSIM+lumi%3D1

Is there another way to get the file for a given lumi in an MC dataset (other than getting the full list of file,run,lumi for the entire dataset) that I am missing? For an MC dataset there should be one file that contains each lumi number, so I think the search should not be very difficult as long as there are no wildcards. Maybe there should be a separate API for MC datasets versus non-MC datasets, or maybe you can just return an error message if the user searches for run_num != 1 in an MC dataset.

vkuznet commented 5 years ago

To rephrase Andrew's request: do we have an API to find the file for a given dataset/lumi pair? If it exists, I can add it to DAS.

yuyiguo commented 5 years ago

@AndrewLevin @vkuznet A lumi section number has to be associated with a run_num; otherwise it has no meaning to DBS. Is there only one file per dataset for MC? Is there any other info besides the dataset name that can help to find the files?

DBS does not have an API that uses lumi without run_num.