dmwm / DAS

Data Aggregation System

Consistent data types of output entities #2570

Closed DMWMBot closed 12 years ago

DMWMBot commented 13 years ago

Dear DAS developers, today I tried to use the results of the DAS command line in my scripts.

I get a JSON result from a query such as "site dataset=xxxxx". I noticed that the format of the JSON seems to be inconsistent: when accessing results["data"], I see a different format for sites that have more than one IP/server than for sites that have only one. In particular, in the first case I get an "array" of names/IPs/servers/etc., while in the second case I get only a single element.

I think consistent behaviour would be to return an array with only one element also for sites that have only one server. Don't you agree?
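The behaviour proposed above can be sketched as a small normalization step in which the entity key is always returned as a list, whatever shape the merged record had. The helper name `normalize_entity` and its handling of a missing key (turned into an empty list) are illustrative assumptions, not part of DAS.

```python
from typing import Any, Dict


def normalize_entity(record: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Return a shallow copy of `record` where record[key] is always a list.

    Hypothetical helper: a dict (single server) becomes a one-element list,
    a list (several servers) is kept as is, a missing key becomes [].
    """
    value = record.get(key)
    normalized = dict(record)  # leave the caller's record untouched
    if value is None:
        normalized[key] = []
    elif isinstance(value, list):
        normalized[key] = value
    else:
        normalized[key] = [value]
    return normalized


# Two records shaped like the ones in the DAS output (fields trimmed).
single = {"site": {"ip": "193.40.150.99", "name": "T2_EE_Estonia"}}
multi = {"site": [{"ip": "130.246.180.84", "name": "T1_UK_RAL_MSS"},
                  {"ip": "130.246.180.85", "name": "T1_UK_RAL_MSS"}]}

for rec in (single, multi):
    print(len(normalize_entity(rec, "site")["site"]))  # prints 1, then 2
```

With this step applied, a script can always iterate over record["site"] without checking its type first.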

vkuznet commented 13 years ago

valya: Could you please provide a concrete example of two queries (with two different dataset names) that differ in output format?

DMWMBot commented 13 years ago

arizzi: Hi, it happens even for the same dataset, e.g. running:

cli --query="site dataset=/METBTag/Run2011A-May10ReReco-v1/AOD"

I get: {"status":"ok","mongo_query":{"fields":["site"],"spec":{"dataset.name":"/SingleElectron/Run2011A-May10ReReco-v1/AOD"},"instance":"cms_dbs_prod_global"},"hostname":"","ctime":19.737,"nresults":17,"ip":"193.205.76.4","args":{"input":"site dataset=/SingleElectron/Run2011A-May10ReReco-v1/AOD","pid":"94d4468395bf2372b2580f7b12c92bb0","limit":"10","idx":"0"},"method":"GET","headers":{"Remote-Addr":"128.142.16.214","X-Forwarded-For":"193.205.76.4","Accept-Encoding":"identity","X-Forwarded-Host":"cmsweb.cern.ch","Ssl-Client-Cert":"","Ssl-Client-S-Dn":"(null)","Host":"vocms140.cern.ch:8212","Accept":"application/json","User-Agent":"Python-urllib/2.6","Connection":"Keep-Alive","Cms-Request-Uri":"/das/cache","Https":"on","X-Forwarded-Server":"cmsweb.cern.ch","Ssl-Client-Verify":"NONE"},"timestamp":1.31791e+09,"path":"/cache","port":36643, "data": [{"das_id": ["4e8db12c0ec3dc11e3534275"], "_id": "4e8db1360ec3dc11e35342b5", "site": [{"ip": "117.103.100.84", "file_fraction": "100.00%", "name": "T2_TW_Taiwan", "se": "f-dpm001.grid.sinica.edu.tw"}, {"ip": "117.103.103.94", "file_fraction": "100.00%", "name": "T2_TW_Taiwan", "se": "f-dpm001.grid.sinica.edu.tw"}, {"ip": "117.103.103.44", "file_fraction": "100.00%", "name": "T2_TW_Taiwan", "se": "f-dpm001.grid.sinica.edu.tw"}], "cache_id": ["4e8db1360ec3dc11e35342ab", "4e8db1360ec3dc11e35342ac", "4e8db1360ec3dc11e35342a2"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909080, "system": ["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db12c0ec3dc11e3534275"], "_id": "4e8db1360ec3dc11e35342be", "site": [{"ip": "130.246.180.84", "file_fraction": "100.00%", "name": "T1_UK_RAL_MSS", "se": "srm-cms.gridpp.rl.ac.uk"}, {"ip": "130.246.180.85", "file_fraction": "100.00%", "name": "T1_UK_RAL_MSS", "se": "srm-cms.gridpp.rl.ac.uk"}], "cache_id": ["4e8db1360ec3dc11e35342aa", "4e8db1360ec3dc11e35342ae"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909080, "system": 
["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db12c0ec3dc11e3534275"], "_id": "4e8db1360ec3dc11e35342bf", "site": [{"ip": "130.246.180.85", "file_fraction": "100.00%", "name": "T1_UK_RAL_Buffer", "se": "srm-cms.gridpp.rl.ac.uk"}, {"ip": "130.246.180.84", "file_fraction": "100.00%", "name": "T1_UK_RAL_Buffer", "se": "srm-cms.gridpp.rl.ac.uk"}], "cache_id": ["4e8db1350ec3dc11e353429d", "4e8db1360ec3dc11e35342ad"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909080, "system": ["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db18c0ec3dc11e3534389"], "_id": "4e8db1960ec3dc11e35343b9", "site": [{"ip": "117.103.103.44", "file_fraction": "100.00%", "name": "T2_TW_Taiwan", "se": "f-dpm001.grid.sinica.edu.tw"}, {"ip": "117.103.100.84", "file_fraction": "100.00%", "name": "T2_TW_Taiwan", "se": "f-dpm001.grid.sinica.edu.tw"}, {"ip": "117.103.103.94", "file_fraction": "100.00%", "name": "T2_TW_Taiwan", "se": "f-dpm001.grid.sinica.edu.tw"}], "cache_id": ["4e8db1960ec3dc11e35343b0", "4e8db1960ec3dc11e35343a6", "4e8db1960ec3dc11e35343af"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909173, "system": ["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db18c0ec3dc11e3534389"], "_id": "4e8db1960ec3dc11e35343c2", "site": [{"ip": "130.246.180.85", "file_fraction": "100.00%", "name": "T1_UK_RAL_MSS", "se": "srm-cms.gridpp.rl.ac.uk"}, {"ip": "130.246.180.84", "file_fraction": "100.00%", "name": "T1_UK_RAL_MSS", "se": "srm-cms.gridpp.rl.ac.uk"}], "cache_id": ["4e8db1960ec3dc11e35343b2", "4e8db1960ec3dc11e35343ae"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909173, "system": ["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db18c0ec3dc11e3534389"], "_id": "4e8db1960ec3dc11e35343c3", "site": [{"ip": "130.246.180.85", "file_fraction": "100.00%", "name": "T1_UK_RAL_Buffer", "se": "srm-cms.gridpp.rl.ac.uk"}, {"ip": "130.246.180.84", "file_fraction": 
"100.00%", "name": "T1_UK_RAL_Buffer", "se": "srm-cms.gridpp.rl.ac.uk"}], "cache_id": ["4e8db1960ec3dc11e35343a1", "4e8db1960ec3dc11e35343b1"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909173, "system": ["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db18c0ec3dc11e3534389"], "_id": "4e8db1960ec3dc11e35343c0", "site": {"ip": "131.225.206.126", "file_fraction": "100.00%", "name": "T1_US_FNAL_MSS", "se": "cmssrm.fnal.gov"}, "cache_id": ["4e8db1960ec3dc11e35343ad"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909173, "system": ["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db18c0ec3dc11e3534389"], "_id": "4e8db1960ec3dc11e35343bc", "site": {"ip": "193.40.150.99", "file_fraction": "100.00%", "name": "T2_EE_Estonia", "se": "ganymede.hep.kbfi.ee"}, "cache_id": ["4e8db1960ec3dc11e353439e"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909173, "system": ["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db18c0ec3dc11e3534389"], "_id": "4e8db1960ec3dc11e35343b3", "site": {"ip": "144.92.180.15", "file_fraction": "100.00%", "name": "T2_US_Wisconsin", "se": "cmssrm.hep.wisc.edu"}, "cache_id": ["4e8db1960ec3dc11e35343a2"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909173, "system": ["phedex"], "primary_key": "site.name"}}, {"das_id": ["4e8db18c0ec3dc11e3534389"], "_id": "4e8db1960ec3dc11e35343bb", "site": {"ip": "134.158.132.123", "file_fraction": "100.00%", "name": "T2_FR_GRIF_LLR", "se": "polgrid4.in2p3.fr"}, "cache_id": ["4e8db1960ec3dc11e35343ab"], "das": {"condition_keys": ["dataset.name"], "empty_record": 0, "expire": 1317909173, "system": ["phedex"], "primary_key": "site.name"}}]}

You can see that while the first sites start with "site": [{"ip":

the later ones start with "site": {"ip":
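The difference can be checked programmatically by looking at the type of each record's "site" value. The sample below is an abbreviated excerpt of the output above, keeping only "_id" and a few "site" fields from two records; `site_shapes` is a hypothetical helper written for this illustration.

```python
import json

# Abbreviated excerpt of the DAS output above (only "_id" and "site" kept).
raw = '''{"data": [
  {"_id": "4e8db1360ec3dc11e35342be",
   "site": [{"ip": "130.246.180.84", "name": "T1_UK_RAL_MSS"},
            {"ip": "130.246.180.85", "name": "T1_UK_RAL_MSS"}]},
  {"_id": "4e8db1960ec3dc11e35343c0",
   "site": {"ip": "131.225.206.126", "name": "T1_US_FNAL_MSS"}}
]}'''


def site_shapes(results):
    """Map each record _id to the Python type name of its "site" value.

    "list" corresponds to a JSON array, "dict" to a JSON object.
    """
    return {rec["_id"]: type(rec["site"]).__name__ for rec in results["data"]}


results = json.loads(raw)
shapes = site_shapes(results)
for rec_id, shape in shapes.items():
    print(rec_id, shape)
```

On this excerpt, the RAL record reports "list" and the FNAL record reports "dict", matching the two prefixes quoted above.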

cheers, andrea

vkuznet commented 13 years ago

valya: Andrea, I looked into the problem you reported, and it is rooted in some fundamentals of DAS. Before making a change (if any), I need to understand why it matters to you.

Let me explain why I care. DAS is data-agnostic; its purpose is to deal with data as is. Basically, it gets a stream of records, and if there are matches on the primary key (e.g. site.name) it merges them together; otherwise it leaves the records intact. That brings up another use case. Say a user asks for dataset=/a/b/c: there is no ambiguity in the request, so (logically) you would expect to get a dict for the dataset record, not a list. This is how such requests are represented in DAS now. Making the change you request would break this use case, although records would indeed be more consistent in their structure. So I need to understand why it matters to you, since the returned object is valid JSON and it's up to the client to interpret the data. I don't see it as a big deal for the client to perform a data type check on the data it's looking for. Bottom line: there is a huge benefit in keeping DAS data-agnostic and not putting data knowledge into the code that deals with general records.
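The merge step described above can be sketched roughly as follows: records in the stream are grouped by the value of the primary key, a group with several records yields a list, and a group with a single record keeps its dict. This is a simplified model of the described behaviour, not the actual DAS code; the function name `merge_on_primary_key` and the flat input records are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, List


def merge_on_primary_key(records: Iterable[dict], primary_key: str) -> List[dict]:
    """Group records whose primary key (e.g. "site.name") matches.

    Simplified sketch: one matching record keeps its entity as a dict,
    several matching records are merged into a list of dicts.
    """
    entity, attr = primary_key.split(".", 1)  # "site.name" -> ("site", "name")
    groups: Dict[Any, List[dict]] = {}  # dicts keep insertion order (3.7+)
    for rec in records:
        groups.setdefault(rec[entity][attr], []).append(rec[entity])
    return [{entity: values[0] if len(values) == 1 else values}
            for values in groups.values()]


# A stream shaped like the PhEDEx records behind the output above (trimmed).
stream = [
    {"site": {"ip": "130.246.180.84", "name": "T1_UK_RAL_MSS"}},
    {"site": {"ip": "130.246.180.85", "name": "T1_UK_RAL_MSS"}},
    {"site": {"ip": "131.225.206.126", "name": "T1_US_FNAL_MSS"}},
]
merged = merge_on_primary_key(stream, "site.name")
for rec in merged:
    print(type(rec["site"]).__name__)  # prints list, then dict
```

Under this model, the shape of "site" depends only on how many input records share a site name, which is consistent with the pattern in the reported output.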

But for completeness, I attach the patch that would be needed to fulfill this request.