raggleton closed this issue 6 years ago.
Robin, there is nothing wrong with the output, since DAS is an "aggregation system": it queries all available APIs of the different CMS data services and aggregates their results into a single output. In your particular case DAS queried the DBS datasets and datasetlist APIs (you can see them under the services key in the JSON output). For a file query it will query the DBS and PhEDEx APIs, for a run query DBS, RunRegistry, ConditionDB, etc. It is hard to decide which API is the "main" one, and different APIs serve different use cases, e.g. they may provide different details or different pieces of information. This is by design of DAS.
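If you want a single record per dataset from the JSON output, you can merge the per-service records on the client side. The following is a minimal Python sketch and is not part of DAS or dasgoclient. `merge_by_primary_key` is a hypothetical helper. The record layout is an assumption based only on the output in this thread: a JSON list of records, each with `das.primary_key` such as `dataset.name`, plus `das.services`.

```python
import json

# Full dataset name taken from the output in this thread.
NAME = ("/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/"
        "RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM")

# Trimmed copy of the two records returned by `dasgoclient -json` (most fields dropped).
SAMPLE = json.dumps([
    {"das": {"primary_key": "dataset.name", "services": ["dbs3:datasets"]},
     "dataset": [{"name": NAME, "dataset_id": 13294650, "status": "VALID"}]},
    {"das": {"primary_key": "dataset.name", "services": ["dbs3:datasetlist"]},
     "dataset": [{"name": NAME, "dataset_id": 13294650,
                  "create_by": "wmagent@vocms0308.cern.ch"}]},
])


def merge_by_primary_key(raw):
    """Merge rows that share a primary-key value (hypothetical helper, not a DAS API).

    Assumes das.primary_key has the form "<key>.<field>" and that the rows live
    in a list under <key>. When services disagree on a field, the first value wins.
    """
    merged = {}
    for record in json.loads(raw):
        key, field = record["das"]["primary_key"].split(".", 1)
        for row in record.get(key, []):
            value = row.get(field)
            if value is None or "error" in row:
                continue  # skip failed lookups such as {"error": ..., "name": null}
            entry = merged.setdefault(value, {"services": [], "fields": {}})
            entry["services"].extend(record["das"]["services"])
            for col, val in row.items():
                entry["fields"].setdefault(col, val)  # keep the first value seen
    return merged


if __name__ == "__main__":
    for value, entry in merge_by_primary_key(SAMPLE).items():
        print(value)
        print("  services:", entry["services"])
        print("  fields:  ", sorted(entry["fields"]))
```

In practice `raw` would be the stdout of `dasgoclient -json -query="..."`. Letting the first value win is an arbitrary choice for this sketch. A stricter merge could report conflicting values instead.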
The plain output format does not show duplicates, though, to make it convenient for end users to cut and paste.
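You can get a plain-style list of unique names from the JSON output by collapsing the records on their primary key. The Python sketch below only imitates the behavior described here; it is not how dasgoclient produces its plain output. `unique_primary_values` is a hypothetical name, and the `"<key>.<field>"` form of `das.primary_key` is an assumption based on this thread's output.

```python
import json

# Full dataset name taken from the output in this thread.
NAME = ("/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/"
        "RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM")

# Trimmed copy of the two records from this thread: one per DBS API.
SAMPLE = json.dumps([
    {"das": {"primary_key": "dataset.name", "services": ["dbs3:datasets"]},
     "dataset": [{"name": NAME}]},
    {"das": {"primary_key": "dataset.name", "services": ["dbs3:datasetlist"]},
     "dataset": [{"name": NAME}]},
])


def unique_primary_values(raw):
    """Return each primary-key value once, in first-seen order (hypothetical helper)."""
    values = []
    for record in json.loads(raw):
        key, field = record["das"]["primary_key"].split(".", 1)
        values.extend(row[field] for row in record.get(key, [])
                      if row.get(field) is not None)
    # dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


if __name__ == "__main__":
    print("\n".join(unique_primary_values(SAMPLE)))
```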
Does it answer your concern?
On Fri, May 18, 2018 at 1:06 PM, Robin notifications@github.com wrote:
Dear developers,
I am noticing duplicate entries when performing a simple dataset query with das_client, but only when asking for the JSON output format. For example, if I do:
dasgoclient -query="dataset=/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016TrancheIV*/MINIAODSIM"
/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
but if I do
dasgoclient -json -query="dataset=/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016TrancheIV*/MINIAODSIM"
[ {"das":{"expire":1526662227,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:datasets"]},"dataset":[{"acquisition_era_name":"RunIISummer16MiniAODv2","create_by":"wmagent@vocms0308.cern.ch","created_by":"wmagent@vocms0308.cern.ch","creation_date":1480970022,"creation_time":1480970022,"data_tier_name":"MINIAODSIM","dataset_access_type":"VALID","dataset_id":13294650,"datatype":"mc","last_modification_date":1481196236,"last_modified_by":"vlimant","modification_time":1481196236,"modified_by":"vlimant","name":"/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM","physics_group_name":"NoGroup","prep_id":"BTV-RunIISummer16MiniAODv2-00029","primary_dataset.name":"QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8","primary_ds_name":"QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8","primary_ds_type":"mc","processed_ds_name":"RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1","processing_version":1,"status":"VALID","xtcrosssection":null}],"qhash":"04bf17ae46d481b865e64beb664fafa9"} , 
{"das":{"expire":1526662228,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:datasetlist"]},"dataset":[{"acquisition_era_name":"RunIISummer16MiniAODv2","create_by":"wmagent@vocms0308.cern.ch","creation_date":1480970022,"data_tier_name":"MINIAODSIM","dataset_access_type":"VALID","dataset_id":13294650,"last_modification_date":1481196236,"last_modified_by":"vlimant","name":"/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM","physics_group_name":"NoGroup","prep_id":"BTV-RunIISummer16MiniAODv2-00029","primary_ds_name":"QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8","primary_ds_type":"mc","processed_ds_name":"RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1","processing_version":1,"xtcrosssection":null}],"qhash":"04bf17ae46d481b865e64beb664fafa9"} ]
Both entries have the same dataset name, dataset_id, prep_id, etc. Diffing the two entries, I get the following:
 {
   "das":{
-    "expire":1526662228,
+    "expire":1526662227,
     "instance":"prod/global",
     "primary_key":"dataset.name",
     "record":1,
     "services":[
-      "dbs3:datasetlist"
+      "dbs3:datasets"
     ]
   },
   "dataset":[
     {
       "acquisition_era_name":"RunIISummer16MiniAODv2",
       "create_by":"wmagent@vocms0308.cern.ch",
+      "created_by":"wmagent@vocms0308.cern.ch",
       "creation_date":1480970022,
+      "creation_time":1480970022,
       "data_tier_name":"MINIAODSIM",
       "dataset_access_type":"VALID",
       "dataset_id":13294650,
+      "datatype":"mc",
       "last_modification_date":1481196236,
       "last_modified_by":"vlimant",
+      "modification_time":1481196236,
+      "modified_by":"vlimant",
       "name":"/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM",
       "physics_group_name":"NoGroup",
       "prep_id":"BTV-RunIISummer16MiniAODv2-00029",
+      "primary_dataset.name":"QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8",
       "primary_ds_name":"QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8",
       "primary_ds_type":"mc",
       "processed_ds_name":"RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1",
       "processing_version":1,
+      "status":"VALID",
       "xtcrosssection":null
     }
   ],
so it looks like some fields have been renamed or added (e.g. created_by next to create_by, modified_by next to last_modified_by), but otherwise it's the same dataset.
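You can also compare the fields programmatically. The Python sketch below uses two hypothetical helpers, `diff_rows` and `likely_renames`. The rows are a trimmed subset of the two `dataset` entries shown above. Pairing renamed fields by equal values is only a heuristic, so it can also pair unrelated fields that happen to hold the same value.

```python
# Trimmed subset of the dbs3:datasetlist "dataset" row shown above.
DATASETLIST_ROW = {
    "create_by": "wmagent@vocms0308.cern.ch",
    "creation_date": 1480970022,
    "dataset_id": 13294650,
    "last_modified_by": "vlimant",
}
# The dbs3:datasets row repeats those fields and adds a few more.
DATASETS_ROW = dict(
    DATASETLIST_ROW,
    created_by="wmagent@vocms0308.cern.ch",
    creation_time=1480970022,
    modified_by="vlimant",
    status="VALID",
)


def diff_rows(a, b):
    """Compare two rows by key: keys only in a, only in b, and shared keys whose values differ."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


def likely_renames(a, b):
    """Heuristic: pair each key only in b with keys of a that hold an equal value."""
    return [(ka, kb)
            for kb in sorted(b.keys() - a.keys())
            for ka in sorted(a)
            if a[ka] == b[kb]]


if __name__ == "__main__":
    print(diff_rows(DATASETLIST_ROW, DATASETS_ROW))
    print(likely_renames(DATASETLIST_ROW, DATASETS_ROW))
```

On these trimmed rows the heuristic pairs `created_by`, `creation_time` and `modified_by` with their datasetlist counterparts, but it finds no match for `status`. On the full rows it would also pair, for example, `modification_time` with `last_modification_date`.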
I also tried adding status=VALID to my query as that is one of the differences, but it returned an error:
das_client -query="dataset dataset=/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016TrancheIV*/MINIAODSIM status=VALID" -json
[ {"das":{"expire":1526663071,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:datasetlist"]},"dataset":[{"error":"DBS unable to unmarshal the data into DAS record, api=datasetlist, data={\"exception\": 400, \"message\": \"Invalid Input Key status...\", \"type\": \"HTTPError\"}, error=json: cannot unmarshal object into Go value of type []mongo.DASRecord","name":null}],"qhash":"51862fd0a82574188f7a74ef70c978de"} , {"das":{"expire":1526663071,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:datasets"]},"dataset":[{"acquisition_era_name":"RunIISummer16MiniAODv2","create_by":"wmagent@vocms0308.cern.ch","created_by":"wmagent@vocms0308.cern.ch","creation_date":1480970022,"creation_time":1480970022,"data_tier_name":"MINIAODSIM","dataset_access_type":"VALID","dataset_id":13294650,"datatype":"mc","last_modification_date":1481196236,"last_modified_by":"vlimant","modification_time":1481196236,"modified_by":"vlimant","name":"/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM","physics_group_name":"NoGroup","prep_id":"BTV-RunIISummer16MiniAODv2-00029","primary_dataset.name":"QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8","primary_ds_name":"QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8","primary_ds_type":"mc","processed_ds_name":"RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1","processing_version":1,"status":"VALID","xtcrosssection":null}],"qhash":"51862fd0a82574188f7a74ef70c978de"} ]
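In that output the error did not abort the whole query. The `dbs3:datasetlist` record carries an `error` row (DBS answered HTTP 400, "Invalid Input Key status..."), while `dbs3:datasets` still returned the dataset. The Python sketch below separates such error rows from good ones. `split_error_rows` is a hypothetical helper. Treating any row with an `error` key as a failure is an assumption drawn from this single output.

```python
import json

# Full dataset name and error text taken from the output in this thread.
NAME = ("/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/"
        "RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM")
ERROR = ('DBS unable to unmarshal the data into DAS record, api=datasetlist, '
         'data={"exception": 400, "message": "Invalid Input Key status...", '
         '"type": "HTTPError"}, error=json: cannot unmarshal object into Go '
         'value of type []mongo.DASRecord')

# Trimmed copy of the two records returned for the status=VALID query.
SAMPLE = json.dumps([
    {"das": {"primary_key": "dataset.name", "services": ["dbs3:datasetlist"]},
     "dataset": [{"error": ERROR, "name": None}]},
    {"das": {"primary_key": "dataset.name", "services": ["dbs3:datasets"]},
     "dataset": [{"name": NAME, "status": "VALID"}]},
])


def split_error_rows(raw):
    """Return (good, failed) lists of (services, row) pairs (hypothetical helper)."""
    good, failed = [], []
    for record in json.loads(raw):
        services = record["das"]["services"]
        key = record["das"]["primary_key"].split(".", 1)[0]
        for row in record.get(key, []):
            # Assumption: a row is a failure exactly when it has an "error" key.
            (failed if "error" in row else good).append((services, row))
    return good, failed


if __name__ == "__main__":
    ok_rows, failed_rows = split_error_rows(SAMPLE)
    for services, row in failed_rows:
        print("FAILED", services, row["error"][:60])
    for services, row in ok_rows:
        print("OK    ", services, row["name"])
```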
Please let me know if there's any other info you need. For reference I'm using:
dasgoclient -version Build: git=v01.01.09 go=go1.9.2 date=2018-05-18 18:59:20.970684859 +0200 CEST m=+0.244268001
Thanks, Robin
View it on GitHub: https://github.com/dmwm/DAS/issues/4287
OK, I understand - it must be tricky to choose a "default" one given so many different use cases :) Thanks for the quick response!