dmwm / dasgoclient

Data Aggregation System (DAS) Go client
https://cmsweb.cern.ch/das/
MIT License

JSON output is not valid #3

Closed blinkseb closed 7 years ago

blinkseb commented 7 years ago

Hi!

I tried the new client in one of our scripts, but it looks like the JSON output is not valid. A simple query returns:

{"das":{"expire":1484673177,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:dataset_info"]},"dataset":[{"acquisition_era_name":"RunIISummer16MiniAODv2","create_by":"wmagent@vocms0308.cern.ch","created_by":"wmagent@vocms0308.cern.ch","creation_date":1.483750024e+09,"creation_time":1.483750024e+09,"data_tier_name":"MINIAODSIM","dataset_access_type":"VALID","dataset_id":1.3319561e+07,"datatype":"mc","last_modification_date":1.483868135e+09,"last_modified_by":"vlimant","modification_time":1.483868135e+09,"modified_by":"vlimant","name":"/DYToLL_2J_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2/MINIAODSIM","physics_group_name":"NoGroup","prep_id":"SMP-RunIISummer16MiniAODv2-00074","primary_dataset.name":"DYToLL_2J_13TeV-amcatnloFXFX-pythia8","primary_ds_name":"DYToLL_2J_13TeV-amcatnloFXFX-pythia8","primary_ds_type":"mc","processed_ds_name":"RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2","processing_version":2,"status":"VALID","xtcrosssection":null}],"qhash":"5d40ca73f424ab6715af0e00e23da11c"}
{"das":{"expire":1484673177,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:dataset_info"]},"dataset":[{"acquisition_era_name":"RunIISummer16MiniAODv2","create_by":"wmagent@vocms0308.cern.ch","created_by":"wmagent@vocms0308.cern.ch","creation_date":1.483750024e+09,"creation_time":1.483750024e+09,"data_tier_name":"MINIAODSIM","dataset_access_type":"VALID","dataset_id":1.3319561e+07,"datatype":"mc","last_modification_date":1.483868135e+09,"last_modified_by":"vlimant","modification_time":1.483868135e+09,"modified_by":"vlimant","name":"/DYToLL_2J_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2/MINIAODSIM","physics_group_name":"NoGroup","prep_id":"SMP-RunIISummer16MiniAODv2-00074","primary_dataset.name":"DYToLL_2J_13TeV-amcatnloFXFX-pythia8","primary_ds_name":"DYToLL_2J_13TeV-amcatnloFXFX-pythia8","primary_ds_type":"mc","processed_ds_name":"RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2","processing_version":2,"status":"VALID","xtcrosssection":null}],"qhash":"5d40ca73f424ab6715af0e00e23da11c"}

Note that the same block is printed twice. Python's JSON parser complains with:

ValueError: Extra data: line 2 column 1 - line 3 column 312 (char 1127 - 2565)
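
For readers who want to reproduce the error without running the client, here is a minimal sketch. It assumes only that the output consists of two JSON documents separated by a newline; the record content is a shortened, made-up stand-in, and `try_parse` is a hypothetical helper name.

```python
import json

# Two JSON documents separated by a newline, mimicking the shape of the
# output above (the record itself is a shortened, made-up stand-in).
output = '{"dataset": [{"name": "/a/b/c"}]}\n{"dataset": [{"name": "/a/b/c"}]}\n'


def try_parse(text):
    """Return (parsed, error_message); exactly one of the two is None."""
    try:
        return json.loads(text), None
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return None, str(exc)


parsed, error = try_parse(output)
print(error)  # the message begins with "Extra data"
```

`json.loads` accepts exactly one JSON document per call, so the second record is reported as extra data even though each line is valid JSON on its own.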

Request is:

./dasgoclient_linux --query dataset=/DYToLL_2J_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2/MINIAODSIM --json

Anyway, thanks for the new tool :+1:

vkuznet commented 7 years ago

Sebastien,

I think the problem is at your end. Here is what I did:

# run your query
./dasgoclient -query="dataset=/DYToLL_2J_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2/MINIAODSIM" -json
# put first record manually into das.json file
shell > cat das.json
{"das":{"expire":1484673775,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:dataset_info"]},"dataset":[{"acquisition_era_name":"RunIISummer16MiniAODv2","create_by":"wmagent@vocms0308.cern.ch","created_by":"wmagent@vocms0308.cern.ch","creation_date":1.483750024e+09,"creation_time":1.483750024e+09,"data_tier_name":"MINIAODSIM","dataset_access_type":"VALID","dataset_id":1.3319561e+07,"datatype":"mc","last_modification_date":1.483868135e+09,"last_modified_by":"vlimant","modification_time":1.483868135e+09,"modified_by":"vlimant","name":"/DYToLL_2J_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2/MINIAODSIM","physics_group_name":"NoGroup","prep_id":"SMP-RunIISummer16MiniAODv2-00074","primary_dataset.name":"DYToLL_2J_13TeV-amcatnloFXFX-pythia8","primary_ds_name":"DYToLL_2J_13TeV-amcatnloFXFX-pythia8","primary_ds_type":"mc","processed_ds_name":"RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2","processing_version":2,"status":"VALID","xtcrosssection":null}],"qhash":"5d40ca73f424ab6715af0e00e23da11c"}
# load python and data from das.json
shell > python
Python 2.7.13 (default, Dec 18 2016, 05:36:31)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> data=json.load(open("das.json"))
>>> print data
{u'dataset': [{u'acquisition_era_name': u'RunIISummer16MiniAODv2',
u'creation_time': 1483750024.0, u'creation_date': 1483750024.0,
u'last_modification_date': 1483868135.0, u'processing_version': 2,
u'dataset_id': 13319561.0, u'primary_ds_name':
u'DYToLL_2J_13TeV-amcatnloFXFX-pythia8', u'modified_by': u'vlimant',
u'create_by': u'wmagent@vocms0308.cern.ch', u'created_by':
u'wmagent@vocms0308.cern.ch', u'processed_ds_name':
u'RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2',
u'primary_dataset.name': u'DYToLL_2J_13TeV-amcatnloFXFX-pythia8',
u'xtcrosssection': None, u'data_tier_name': u'MINIAODSIM', u'status': u'VALID',
u'physics_group_name': u'NoGroup', u'dataset_access_type': u'VALID',
u'modification_time': 1483868135.0, u'name':
u'/DYToLL_2J_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2/MINIAODSIM',
u'datatype': u'mc', u'last_modified_by': u'vlimant', u'primary_ds_type': u'mc',
u'prep_id': u'SMP-RunIISummer16MiniAODv2-00074'}], u'qhash':
u'5d40ca73f424ab6715af0e00e23da11c', u'das': {u'services':
[u'dbs3:dataset_info'], u'instance': u'prod/global', u'expire': 1484673775,
u'primary_key': u'dataset.name', u'record': 1}}

So, it loads just fine.

If you need to read all data from the output, here is the way:

./dasgoclient -query="dataset=/DYToLL_2J_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2/MINIAODSIM" -json > data.txt

Write your Python code as follows:

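import json
# The client prints one JSON record per line, so parse each line separately.
with open("data.txt") as istream:
    for line in istream:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        data = json.loads(line)
        print(data)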

Run it; it works just fine.

Please fix your code!
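
If the records cannot be relied on to sit one per line, a variant of the line-by-line approach is to decode the documents one after another from a single string. The sketch below uses the standard library's `json.JSONDecoder.raw_decode`; the function name `iter_json_documents` and the sample input are hypothetical and are not part of dasgoclient.

```python
import json


def iter_json_documents(text):
    """Yield each JSON document found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed object and the index where it stopped.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Made-up sample input: two records separated by a newline.
sample = '{"record": 1}\n{"record": 2}\n'
records = list(iter_json_documents(sample))
print(len(records))
```

This still pre-processes the output, so it is a workaround on the consumer side rather than a change to what the client emits.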

blinkseb commented 7 years ago

Well, my code is correct, sorry. I shouldn't have to pre-process the output of a tool that outputs JSON; that's nonsense. If there's more than one answer, you should wrap everything inside an array...

vkuznet commented 7 years ago

Ahh, I see what you mean: you want the results as an array rather than as individual JSON records. I can do that.
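
To make the difference between the two output shapes concrete, here is a rough Python sketch that wraps per-line records in a single JSON array. It is an illustration only: the client itself is written in Go, this is not the change that went into it, and `records_to_array` is a hypothetical name.

```python
import json


def records_to_array(lines):
    """Combine one-record-per-line JSON into a single JSON array string."""
    records = [json.loads(line) for line in lines if line.strip()]
    return json.dumps(records)


# Made-up sample: two records, one per line, with a stray blank line.
lines = ['{"record": 1}\n', '\n', '{"record": 2}\n']
array_text = records_to_array(lines)

# The combined text is one JSON document, so a single json.loads call works.
combined = json.loads(array_text)
print(len(combined))
```

With the array form, the whole output can be handed to one `json.loads` call, which is the behaviour requested in this issue.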

blinkseb commented 7 years ago

That's the idea, yes. The output of the script as a whole should be valid JSON. This allows doing things like

import json, subprocess
result = json.loads(subprocess.check_output(['dasgoclient', '-query=/DYToLL_2J_13TeV-amcatnloFXFX-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v2/MINIAODSIM', '-json']).strip())

without having to pre-process the output of the script.

Thanks a lot :+1:

vkuznet commented 7 years ago

Try again, I uploaded the fixed version to my AFS area.

vkuznet commented 7 years ago

Please close the ticket once you verify that it works. For completeness, code changes went into this commit: e8e1f2f..88d3829

vkuznet commented 7 years ago

The issue is fixed; closing the ticket.