vkuznet closed this issue 8 years ago.
Seangchan, by visually inspecting this file I see that it has some "nasty" features. For instance, in some parts the "runs" attribute is an empty dict, while in others it is a dict with string keys, e.g. "runs": {"1":[1,2,3]}. If I look at the processing FWJR I see "runs":{"mlist":[1,2,3]}. As you can see, the structures all differ, and moreover the keys are assigned differently: in one case it is "1", in another "mlist". My point is that this is not a fixed data structure for which I can generate a schema. Can we fix that in WMAgent for all FWJRs? We need a pre-defined set of keys, not dynamic variations of them. The same goes for "output". In this document it has a key named RAWSIMoutput, while in the document I used it had MINIAODoutput. This is the same issue: for a schema we need fixed keys. I would change the latter to "name":"MINIAODoutput" and "name":"RAWSIMoutput", i.e. use the fixed key "name" and assign different values to it.
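A minimal Python sketch of the proposal above, assuming a simple one-level conversion: a mapping keyed by dynamic names is turned into a list of records in which the former key is stored as a value under a fixed field. The helper `to_fixed_keys` is hypothetical, the payload field name "value" is an assumption (the comment only fixes "name"), and the payloads are illustrative.

```python
from typing import Any, Dict, List


def to_fixed_keys(mapping: Dict[str, Any], key_field: str = "name",
                  value_field: str = "value") -> List[Dict[str, Any]]:
    """Convert {dynamic_key: payload} into [{key_field: dynamic_key, value_field: payload}].

    The dynamic key (e.g. an output module name or a run number) becomes a
    value under a fixed field name, so every record has the same keys.
    Records keep the mapping's insertion order.
    """
    return [{key_field: key, value_field: payload}
            for key, payload in mapping.items()]


# Hypothetical FWJR fragments whose keys vary from document to document.
output = {"MINIAODoutput": {"events": 813}, "RAWSIMoutput": {"events": 813}}
runs = {"1": [1, 2, 3]}

print(to_fixed_keys(output))
# [{'name': 'MINIAODoutput', 'value': {'events': 813}}, {'name': 'RAWSIMoutput', 'value': {'events': 813}}]
print(to_fixed_keys(runs, key_field="runNumber"))
# [{'runNumber': '1', 'value': [1, 2, 3]}]
```

Because the variable part now lives in values, an empty "runs" mapping simply becomes an empty list instead of a structurally different object.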
Off the top of my head I don't know what "mlist" is. But I would guess that LogCollect jobs are not going to have runs; the concept makes no sense for them.
Here are the exact lines in the FWJR I used to work with: https://github.com/dmwm/WMArchive/blob/master/test/data/fwjr_processing.json#L73 https://github.com/dmwm/WMArchive/blob/master/test/data/fwjr_processing.json#L120
The former shows the use of "mlist", while the latter shows the output key MINIAODoutput. From a schema point of view both are bad key names, i.e. they are not persistent and will vary from one document to another. I consider this a bad choice made somewhere in WMCore, which we must fix before we start collecting documents.
Sorry, why is MINIAODoutput bad? That's the name of the output module that wrote the data. You can also see RECOoutput in the same report.
Because it is not a persistent key name. It is the name of an output, and it would be better to present it in the FWJR document as "outputName": "MINIAODoutput", "outputName": "RECOoutput", ... rather than using output names as dictionary keys. They are values, i.e. the names of our workflow outputs.
The problem is that the schema treats them as schema keys because they are dictionary keys. Since we cannot know what we will call our outputs in the future or how many outputs we will have, the schema would keep changing, i.e. such keys are not persistent across documents, either now or in the future.
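To illustrate the point about dictionary keys, the hypothetical helper below approximates the fields a schema would have to declare by collecting every key path in a document (list items are collapsed to "[]"). It is only a sketch and is not meant to reproduce whatever tool WMArchive uses for schema generation. With output names as keys, two reports yield different key sets; with the name stored under a fixed "outputName" key, as suggested above, they match.

```python
from typing import Any, Set, Tuple


def key_paths(doc: Any, prefix: Tuple[str, ...] = ()) -> Set[Tuple[str, ...]]:
    """Collect every dictionary key path in a JSON-like document.

    List elements are walked with the marker "[]" instead of an index, so the
    result approximates the set of fields a schema would have to declare.
    """
    paths: Set[Tuple[str, ...]] = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = prefix + (key,)
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(doc, list):
        for item in doc:
            paths |= key_paths(item, prefix + ("[]",))
    return paths


# Output names used as dictionary keys: the key paths differ per document.
doc_a = {"output": {"MINIAODoutput": {"events": 813}}}
doc_b = {"output": {"RECOoutput": {"events": 813}}}
print(key_paths(doc_a) == key_paths(doc_b))  # False

# Output names stored as values under a fixed key: the key paths match.
doc_c = {"output": [{"outputName": "MINIAODoutput", "events": 813}]}
doc_d = {"output": [{"outputName": "RECOoutput", "events": 813}]}
print(key_paths(doc_c) == key_paths(doc_d))  # True
```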
Sorry Valentin, I didn't notify you of this issue update since the JSON is not complete yet. I will let you know when I have the complete JSON. Yes, I noticed that could be a problem. We can discuss in the chat what we can do about it.
Hi Valentin, the new structure should look like this. If you find any other arbitrary key values, let me know.
- Removed arbitrary keys from dictionaries:
  output = {name: value} -> output = [{'outputModule': name, 'value': value}]
  runs = {runNumber: runNumberList} -> runs = [{'runNumber': runNumber, 'value': runNumberList}]
- Input is left as it is, since inputSource is always 'source'.
The patch is updated to support the new structure on the WMAgent side (still needs to be tested): https://github.com/dmwm/WMCore/pull/6440/files
'output': [{'outputModule': 'RAWSIMoutput',
            'value': [{'branch_hash': '15a5492dd49bc7f0ce80621e66145e09',
                       'catalog': '',
                       'events': 813,
                       'guid': 'F0A3C803-3024-E511-A154-0025902009B4',
                       'runs': [{'runNumber': '1',
                                 'value': [1, 2, 3, 4]}]}]},
Seangchan, that seems fine to me, although I am not sure I understand the relation between runNumber and runNumberList. Could you please clarify that for me? V.
I am not sure either. I initially thought it should be a lumi list, but the description says otherwise.
getattr(cfgSectionRuns, runNumber)
Still, I am pretty sure those should be lumi lists. I will double-check. Thanks,
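Assuming, as suggested above but not yet confirmed in this thread, that the list attached to each runNumber holds lumi sections, a consumer of the new layout could rebuild a per-run lumi view as in this sketch. The function `lumis_by_run` is hypothetical and not part of WMCore; the sample input mirrors the RAWSIMoutput excerpt quoted earlier.

```python
from typing import Any, Dict, List


def lumis_by_run(output: List[Dict[str, Any]]) -> Dict[str, Dict[str, List[int]]]:
    """Map outputModule -> runNumber -> merged lumi list for the new FWJR layout.

    Assumes each file entry carries 'runs' as
    [{'runNumber': ..., 'value': [...]}] and that 'value' holds lumi sections
    (an unconfirmed interpretation).
    """
    result: Dict[str, Dict[str, List[int]]] = {}
    for module in output:
        runs_for_module = result.setdefault(module["outputModule"], {})
        for file_entry in module["value"]:
            # Files without runs (e.g. LogCollect output) contribute nothing.
            for run in file_entry.get("runs", []):
                lumis = runs_for_module.setdefault(run["runNumber"], [])
                lumis.extend(run["value"])
    # Sort and de-duplicate lumis collected from several files.
    for runs_for_module in result.values():
        for run_number, lumis in list(runs_for_module.items()):
            runs_for_module[run_number] = sorted(set(lumis))
    return result


output = [{"outputModule": "RAWSIMoutput",
           "value": [{"events": 813,
                      "runs": [{"runNumber": "1", "value": [1, 2, 3, 4]}]}]}]
print(lumis_by_run(output))  # {'RAWSIMoutput': {'1': [1, 2, 3, 4]}}
```

Because every level uses fixed keys ('outputModule', 'value', 'runNumber'), the traversal needs no knowledge of module names or run numbers in advance.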