dmwm / dasgoclient

Data Aggregation System (DAS) Go client
https://cmsweb.cern.ch/das/
MIT License
9 stars 4 forks source link

Filters are broken #13

Closed matz-e closed 6 years ago

matz-e commented 6 years ago

Filtering seems to be broken, and requesting JSON output after filtering gives invalid JSON:

(master U:1 ?:5 ✗) » dasgoclient -query "file dataset=/JetHT/Run2017C-v1/RAW | grep file.nevents"|head
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/60E64405-E073-E711-BEC4-02163E01A4B1.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/2AF23E03-E073-E711-B613-02163E014106.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/6ADB581F-E073-E711-898B-02163E0134D6.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/E0457FCC-E073-E711-ACA2-02163E01A2DA.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/2CCEFBD0-E073-E711-8020-02163E011884.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/9653BC7A-E173-E711-8929-02163E0118E2.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/62D99022-F073-E711-A3CA-02163E019C12.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/D2D6A5D9-F273-E711-89F7-02163E01A208.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/92276AFA-F273-E711-9291-02163E013504.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/244FFCFE-F273-E711-8057-02163E019BC9.root"}]
(master U:1 ?:5 ✗) » dasgoclient -query "file dataset=/JetHT/Run2017C-v1/RAW | grep file.name" -json|head
[
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/60E64405-E073-E711-BEC4-02163E01A4B1.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/2AF23E03-E073-E711-B613-02163E014106.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/6ADB581F-E073-E711-898B-02163E0134D6.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/E0457FCC-E073-E711-ACA2-02163E01A2DA.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/2CCEFBD0-E073-E711-8020-02163E011884.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/9653BC7A-E173-E711-8929-02163E0118E2.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/62D99022-F073-E711-A3CA-02163E019C12.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/D2D6A5D9-F273-E711-89F7-02163E01A208.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/92276AFA-F273-E711-9291-02163E013504.root   ,
vkuznet commented 6 years ago

Matthias, with a switch from das_client (python) to dasgoclient (Go-based) I targeted that tool should provide maximum speed of the queries. As such the detailed output of queries is disabled by default. What you need is to enable it. This can be done as following:

file dataset=/JetHT/Run2017C-v1/RAW detail=True | grep file.nevents

Basically you add to a query detail=True which trigger to fetch detailed output from DBS (it more costly and therefore takes longer to fetch and process).

With detail=True option I found a compromise between various user use cases. For those who most of the time only care about speed of queries it is disabled by default, for other it is there if you'll adjust your query.

Please try it out and either provide feedback or close the ticket.

Best, Valentin.

On 0, Matthias Wolf notifications@github.com wrote:

Filtering seems to be broken, and requesting JSON output after filtering gives invalid JSON:

(master U:1 ?:5 ✗) » dasgoclient -query "file dataset=/JetHT/Run2017C-v1/RAW | grep file.nevents"|head
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/60E64405-E073-E711-BEC4-02163E01A4B1.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/2AF23E03-E073-E711-B613-02163E014106.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/6ADB581F-E073-E711-898B-02163E0134D6.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/E0457FCC-E073-E711-ACA2-02163E01A2DA.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/2CCEFBD0-E073-E711-8020-02163E011884.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/9653BC7A-E173-E711-8929-02163E0118E2.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/62D99022-F073-E711-A3CA-02163E019C12.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/D2D6A5D9-F273-E711-89F7-02163E01A208.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/92276AFA-F273-E711-9291-02163E013504.root"}]
[{"name":"/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/244FFCFE-F273-E711-8057-02163E019BC9.root"}]
(master U:1 ?:5 ✗) » dasgoclient -query "file dataset=/JetHT/Run2017C-v1/RAW | grep file.name" -json|head
[
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/60E64405-E073-E711-BEC4-02163E01A4B1.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/2AF23E03-E073-E711-B613-02163E014106.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/6ADB581F-E073-E711-898B-02163E0134D6.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/E0457FCC-E073-E711-ACA2-02163E01A2DA.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/2CCEFBD0-E073-E711-8020-02163E011884.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/9653BC7A-E173-E711-8929-02163E0118E2.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/62D99022-F073-E711-A3CA-02163E019C12.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/D2D6A5D9-F273-E711-89F7-02163E01A208.root   ,
/store/data/Run2017C/JetHT/RAW/v1/000/300/079/00000/92276AFA-F273-E711-9291-02163E013504.root   ,

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/dmwm/dasgoclient/issues/13

matz-e commented 6 years ago

Valentin,

Thanks for the feedback! I've tried the option out (I didn't notice it before, sorry), and it works for non-json output. I still think the following changes could be added:

Thanks, Matthias

vkuznet commented 6 years ago

Matthias, thanks for good suggestion. I implemented your first bullet, the auto-detection of detailed output when user provide a filter, see https://github.com/dmwm/das2go/commit/114fc26950cc144131be41ae02d3b55c7baf886e

But your second suggestion and usage of filter in json output is much more cumbersome. The problem is that when user ask for json the output should be parse-able JSON output. Adding a string error/warning to it will break this. The filter itself does not break the query either, therefore throwing an error message can be confusing at least. I'm in favor of returning full detailed output and ignore filter in json output. I need to think about it. Let me know your thoughts on it if you want.

I'll think about your second item for a bit and then either schedule new version without it (but with a provided fix for first bulet) or add more code to handle error or warning.

Best, Valentin.

On 0, Matthias Wolf notifications@github.com wrote:

Valentin,

Thanks for the feedback! I've tried the option out (I didn't notice it before, sorry), and it works for non-json output. I still think the following changes could be added:

  • Add detail=True if a filter is involved. This would make sense IMHO, since filtering like | grep file.nevents implies that more than basic information is needed, and would seem more natural?
  • Print out some error message if invalid combinations of options are used, in this case if a filter is used together with the -json switch.

Thanks, Matthias

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/dmwm/dasgoclient/issues/13#issuecomment-330523888

vkuznet commented 6 years ago

Matthias, requested changes to add detail=True in queries with filters is committed to main repository. See build: https://github.com/cms-sw/cmsdist/pull/3453 Once it's done changes will appear in all CMSSW releases. Best, Valentin.

On 0, Matthias Wolf notifications@github.com wrote:

Valentin,

Thanks for the feedback! I've tried the option out (I didn't notice it before, sorry), and it works for non-json output. I still think the following changes could be added:

  • Add detail=True if a filter is involved. This would make sense IMHO, since filtering like | grep file.nevents implies that more than basic information is needed, and would seem more natural?
  • Print out some error message if invalid combinations of options are used, in this case if a filter is used together with the -json switch.

Thanks, Matthias

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/dmwm/dasgoclient/issues/13#issuecomment-330523888

matz-e commented 6 years ago

Great, thank you! For the second point, I forgot to stress that using a filter and requesting JSON output format does not give valid JSON:

(master {1} ?:2 ✗) » ./dasgoclient -query "file dataset=/JetHT/Run2017C-v1/RAW | grep file.name" -json|head
[
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/60E64405-E073-E711-BEC4-02163E01A4B1.root   ,

The filename should actually be surrounded by quotes. I would still suggest having dasgoclient fail when a filter is used with -json.

Thanks, Matthias

vkuznet commented 6 years ago

Matthias, which version of dasgoclient are you using in this example. I just did on lxplus:

dasgoclient -query="file dataset=/ZMM/Summer11-DESIGN42_V11_428_SLHC1-v1/GEN-SIM" -json [ {"das":{"expire":1506519335,"instance":"prod/global","primary_key":"file.name","record":1,"services":["dbs3:files_via_dataset"]},"file":[{"name":"/store/mc/Summer11/ZMM/GEN-SIM/DESIGN42_V11_428_SLHC1-v1/0003/02ACAA1A-9F32-E111-BB31-0002C90B743A.root"}],"qhash":"f28499f53016567764d7ba76a3d76e4a"} , {"das":{"expire":1506519335,"instance":"prod/global","primary_key":"file.name","record":1,"services":["dbs3:files_via_dataset"]},"file":[{"name":"/store/mc/Summer11/ZMM/GEN-SIM/DESIGN42_V11_428_SLHC1-v1/0003/24DAE39C-AC32-E111-AED8-0002C90B743A.root"}],"qhash":"f28499f53016567764d7ba76a3d76e4a"} , ...

and it produces proper json, i.e. it yields full DAS records preserving data-types.

Best, Valentin.

On 0, Matthias Wolf notifications@github.com wrote:

Great, thank you! For the second point, I forgot to stress that using a filter and requesting JSON output format does not give valid JSON:

(master {1} ?:2 ✗) » ./dasgoclient -query "file dataset=/JetHT/Run2017C-v1/RAW | grep file.name" -json|head
[
/store/data/Run2017C/JetHT/RAW/v1/000/300/064/00000/60E64405-E073-E711-BEC4-02163E01A4B1.root   ,

The filename should actually be surrounded by quotes. I would still suggest having dasgoclient fail when a filter is used with -json.

Thanks, Matthias

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/dmwm/dasgoclient/issues/13#issuecomment-332488113

matz-e commented 6 years ago

It should be the newest from master, I compiled it locally. Note that your query does not have a filter. I don't consider this very important, so I can also close this issue if you deem this not worth fixing.

vkuznet commented 6 years ago

Hmm, strange, since I do use my master and I don't see this issue (mailformed json output). Please note that you need to have das2go package too which I periodically update. Could you please try to check/update/download latest versions of das2go and dasgoclient and compile later again and try it out.

I do care about issues and broken json is certainly worth of fixing. But as far as I can tell it is not the case in my setup. So, it's worth to double check at your end.

Best, Valentin.

On 0, Matthias Wolf notifications@github.com wrote:

It should be the newest from master, I compiled it locally. Note that your query does not have a filter. I don't consider this very important, so I can also close this issue if you deem this not worth fixing.

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/dmwm/dasgoclient/issues/13#issuecomment-332580153

vkuznet commented 6 years ago

I fixed these issues and new dasgoclient is available in CMSSW releases.