Closed tomeichlersmith closed 1 year ago
@bryngemark In your python script that pushes the meta data to rucio, you put the meta data into a python dictionary. The parameters for all the processors are already in python dictionaries, so it will be easy to combine them. I am going to attach the processor name to the front of the parameter names so that there isn't a chance of conflicting parameters. E.g. if processor `Proc` has parameter `key` with value `val`, then the meta data would be `Proc::key = val`.
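A minimal sketch of the prefixing idea, assuming the processor parameters are available as a dictionary of dictionaries keyed by processor name. The function name `prefix_parameters` and the input layout are hypothetical, not taken from ldmx-sw.

```python
def prefix_parameters(processors):
    """Flatten {processor_name: {key: value}} into {"Proc::key": value}.

    Hypothetical helper: the input layout is an assumption made for
    illustration only.
    """
    flat = {}
    for proc_name, params in processors.items():
        for key, value in params.items():
            # Prefixing with the processor name keeps identical keys apart.
            flat[f"{proc_name}::{key}"] = value
    return flat


# Two processors that both define "readoutThreshold".
processors = {
    "ecalDigis": {"gain": 2000.0, "readoutThreshold": 4.0},
    "hcalDigis": {"readoutThreshold": 1},
}
metadata = prefix_parameters(processors)
print(metadata)
```

Both `readoutThreshold` values survive because each one ends up under its own prefixed key.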
Thanks, Tom!
I think we could do this step-wise, to allow for early testing to tell us more about how we want the ldmx-sw/rucio interface to operate. The first step, in my mind, is publishing all the parameters somewhere. The existing script can then be modified to pick these up and write to rucio as before.
There are use cases like correctly setting up metadata for separated sim and reco, and rereco, that I would like to discuss with the rucio expert in the LDCS group. I have a vision for doing it but would need to understand the technical implementation. I think keeping this a bit modular for now will make for an easier integration down the line. In any case, I think that eventually all the LDCS scripts should be maintained by us and live on our github.
Cool! I will have a solution pushed to this branch shortly that does everything except the rucio part (i.e. it will dump everything to a json file instead of trying to push to rucio right away).
Yes this was Andrii's point too, that if we can stick to python this makes everything easier. I think prepending everything with the processor name is a good approach.
I also think that for maximum versatility we should implement "unsupervised" writing to rucio. It would be a pain to update all the possible parameter names whenever something changes. But again this is decoupled from ldmx-sw.
How do you want to handle parameters that are lists? For example, the `preInitCommands` for the Simulator is a list of strings passed as commands to Geant4. These could significantly alter the behavior of the sim, so they should be tracked, but I don't know the limits of the meta-data system.
Good point. For lists from the mac file, like random seeds, we opted for making them two separate keys. I'm not sure this comes from a fundamental limitation regarding the types of entries allowed in the database. To be followed up! Let's operate under the initial assumption that it is, to see if we can deal with it?
The list that is passed to `preInitCommands` corresponds to the old mac commands though, right? Perhaps they could simply be parsed the same way as before. So in that case, you would dump them to the json just like any other parameter, and the translating script would need to recognize that this is a bunch of mac commands to parse.
For other lists, I imagine that they are short. Like the list of trigger pad hit collections you'd like to use for tracking, as an example from my own domain. Can we just enumerate them? Much like the random seed solution.
I will just enumerate them for now; we can switch to a more elegant solution if one arises.
I was just about to say, just dump everything as lists, and the translating script can do the enumeration/list writing, whatever works on the rucio end. Keeping it modular :)
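A rough sketch of what the enumeration could look like on the translating-script side, assuming short, flat lists. The `_0`, `_1`, ... suffix convention and the name `enumerate_lists` are assumptions for illustration; the thread does not fix a naming scheme.

```python
def enumerate_lists(params):
    """Replace each list-valued parameter by one key per element.

    Hypothetical helper: the "<key>_<index>" naming is an assumption.
    Non-list values are copied unchanged.
    """
    out = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            for index, item in enumerate(value):
                out[f"{key}_{index}"] = item
        else:
            out[key] = value
    return out


params = {
    "mySim::runNumber": 9000,
    "ecalRecon::layerWeights": [1.675, 2.724, 4.398],
}
enumerated = enumerate_lists(params)
print(enumerated)
```

This only handles one level of lists; list elements that are themselves dictionaries or lists would be stored as-is under the enumerated key.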
So the central issue is the parameters like `actions` or `generators` where each member of the list has its own set of parameters. That is what forces the enumerate vs full lists issue to be a part of the dump command in ldmx-sw.
Here's what I thought. You ask in a config file to use a specific generator. It has a set of parameters. Some of them are G4 macro style-commands, which are a list, from the python config perspective. I thought these could be dumped as lists, then.
Are you saying there are additional levels of lists within the lists? Or are you saying that there are many different lists possible, because many different predefined generators exist?
Maybe a concrete example would make this easier for me to understand.
Here's the most common example:
The processor `Simulator` has a list parameter called `generators` that lists the primary generators to use. One of the generators you could use is the `ParticleGun` which has its own dictionary of parameters, one of them being `direction` which is a list defining the vector to point the gun.
The solution I'm doing right now is to have dictionaries inside of dictionaries. An example printout:
{'compressionSetting': 9,
'hcalDigis::strips_side_tb_per_layer': 12,
# below is the example that I talked about
'mySim::generators': [{'direction': [0.07845, 0, 0.996925], 'name': 'single_4gev_e_upstream_tagger', 'particle': 'e-', 'energy': 4.0, 'position': [-27.926, 0, -700], 'class': 'ldmx::ParticleGun'}],
'hcalDigis::strip_attenuation_length': 5.0,
'ecalDigis::iSOI': 0,
'trigScintDigis::mean_noise': 0.02,
'ecalRecon::digiCollName': 'EcalDigis',
'passName': 'sim',
'mySim::detector': '/home/tom/ldmx/ldmx-sw/install/data/detectors/ldmx-det-v12/detector.gdml',
'ecalDigis::noiseIntercept': 700.0,
'trigScintDigis::number_of_strips': 50,
'outputFile0': 'myFirstSim_10_events.root',
'skimDefaultIsKeep': True,
'trigScintDigis::input_collection': 'TriggerPadTaggerSimHits',
'hcalDigis::meanNoise': 0.02,
'hcalDigis::num_side_lr_hcal_layers': 26,
'ecalDigis::padCapacitance': 0.1,
'trigScintDigis::pe_per_mip': 10.0,
'maxEvents': 10,
'ecalRecon::layerWeights': [1.675, 2.724, 4.398, 6.039, 7.696, 9.077, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 13.497, 17.364, 17.364, 17.364, 17.364, 17.364, 17.364, 17.364, 17.364, 17.364, 8.99],
'hcalDigis::strips_back_per_layer': 60,
'ecalDigis::noiseSlope': 25.0,
'ecalRecon::secondOrderEnergyCorrection': 0.9975062344139651,
'ecalDigis::makeConfigHists': False,
'hcalDigis::num_back_hcal_layers': 96,
'run': -1, 'mySim::runNumber': 9000, 'hcalDigis::num_side_tb_hcal_layers': 28, 'trigScintDigis::number_of_arrays': 1, 'hcalDigis::readoutThreshold': 1, 'trigScintDigis::mev_per_mip': 0.4, 'ecalRecon::digiPassName': '', 'hcalDigis::pe_per_mip': 68.0, 'trigScintDigis::output_collection': 'trigScintDigisTag', 'hcalDigis::mev_per_mip': 4.66, 'ecalDigis::gain': 2000.0, 'ecalDigis::pedestal': 1100.0, 'ecalDigis::nADCs': 10, 'hcalDigis::super_strip_size': 1, 'mySim::verbosity': 1, 'ecalDigis::readoutThreshold': 4.0, 'hcalDigis::strip_position_resolution': 150.0, 'hcalDigis::strips_side_lr_per_layer': 12}
Why not just come up with a list of metadata variables that you want to include and just query RUCIO for them? Each processor and user action will have their own set of parameters and not all of them are used depending on how you run. So it's difficult to expect to code up something generic enough that pushes everything to RUCIO because there will be times where RUCIO won't have the metadata variable available. Unless the metadata is being stored as a string?
Also, what's going to happen if we decide to add another parameter later on? Will this require adding another variable to RUCIO? Again, if you are storing everything as a string, maybe this doesn't matter.
Actually, it might be easiest to dump the structure of the table itself that is being used to store the metadata and share it here (or on slack).
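To make the "store everything as a string" option concrete, here is a small sketch that serializes every value with the standard `json` module, so the metadata store would only ever see string values. Whether the rucio side accepts this is not established in this thread; the helper names are hypothetical.

```python
import json


def stringify_values(params):
    """Serialize every value to a JSON string; keys are left untouched."""
    return {key: json.dumps(value) for key, value in params.items()}


def restore_values(stored):
    """Inverse of stringify_values: parse each JSON string back."""
    return {key: json.loads(text) for key, text in stored.items()}


params = {
    "maxEvents": 10,
    "skimDefaultIsKeep": True,
    # Nested structures (a list of generator dictionaries) work the same way.
    "mySim::generators": [
        {"class": "ldmx::ParticleGun", "energy": 4.0,
         "direction": [0.07845, 0, 0.996925]},
    ],
}
stored = stringify_values(params)
print(stored["mySim::generators"])
```

With this approach a new parameter only adds a new key, but range queries on numeric values would need the strings parsed back first.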
I have pushed a working solution to branch iss801. A few comments:
Here is an example of the output of `print(json.dumps(p.parameterDump(), indent=4))`:
{
"maxEvents": 10,
"skimDefaultIsKeep": true,
"outputFiles": [
"myFirstSim_10_events.root"
],
"run": -1,
"compressionSetting": 9,
"sequence": [
{
"name": "mySim",
"runNumber": 9000,
"verbosity": 0,
"generators": [
{
"direction": [
0.07845,
0,
0.996925
],
"name": "single_4gev_e_upstream_tagger",
"particle": "e-",
"energy": 4.0,
"position": [
-27.926,
0,
-700
],
"class": "ldmx::ParticleGun"
}
],
"detector": "/home/tom/ldmx/ldmx-sw/install/data/detectors/ldmx-det-v12/detector.gdml",
"class": "ldmx::Simulator"
},
{
"makeConfigHists": false,
"name": "ecalDigis",
"iSOI": 0,
"noiseSlope": 25.0,
"padCapacitance": 0.1,
"nADCs": 10,
"gain": 2000.0,
"readoutThreshold": 4.0,
"pedestal": 1100.0,
"class": "ldmx::EcalDigiProducer",
"noiseIntercept": 700.0
},
{
"name": "ecalRecon",
"digiCollName": "EcalDigis",
"digiPassName": "",
"secondOrderEnergyCorrection": 0.9975062344139651,
"layerWeights": [
1.675,
2.724,
4.398,
6.039,
7.696,
9.077,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
9.63,
13.497,
17.364,
17.364,
17.364,
17.364,
17.364,
17.364,
17.364,
17.364,
17.364,
8.99
],
"class": "ldmx::EcalRecProducer"
},
{
"strips_side_tb_per_layer": 12,
"num_back_hcal_layers": 96,
"randomSeed": 1,
"name": "hcalDigis",
"meanNoise": 0.02,
"pe_per_mip": 68.0,
"strip_position_resolution": 150.0,
"super_strip_size": 1,
"num_side_tb_hcal_layers": 28,
"strips_side_lr_per_layer": 12,
"strips_back_per_layer": 60,
"readoutThreshold": 1,
"num_side_lr_hcal_layers": 26,
"mev_per_mip": 4.66,
"sim_hit_pass_name": "",
"class": "ldmx::HcalDigiProducer",
"strip_attenuation_length": 5.0
},
{
"number_of_strips": 50,
"randomSeed": 1,
"name": "trigScintDigis",
"input_collection": "TriggerPadUpSimHits",
"pe_per_mip": 10.0,
"output_collection": "trigScintDigisUp",
"number_of_arrays": 1,
"mean_noise": 0.02,
"mev_per_mip": 0.4,
"class": "ldmx::TrigScintDigiProducer"
},
{
"number_of_strips": 50,
"randomSeed": 1,
"name": "trigScintDigis",
"input_collection": "TriggerPadDownSimHits",
"pe_per_mip": 10.0,
"output_collection": "trigScintDigisDn",
"number_of_arrays": 1,
"mean_noise": 0.02,
"mev_per_mip": 0.4,
"class": "ldmx::TrigScintDigiProducer"
},
{
"number_of_strips": 50,
"randomSeed": 1,
"name": "trigScintDigis",
"input_collection": "TriggerPadTaggerSimHits",
"pe_per_mip": 10.0,
"output_collection": "trigScintDigisTag",
"number_of_arrays": 1,
"mean_noise": 0.02,
"mev_per_mip": 0.4,
"class": "ldmx::TrigScintDigiProducer"
}
],
"skimRules": [],
"inputFiles": [],
"passName": "sim",
"keep": []
}
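A sketch of how a translating script outside of ldmx-sw might turn a dump shaped like the one above into flat `Proc::key` entries. The function name `flatten_dump` is hypothetical. Note that in the dump above three processors share the name `trigScintDigis`, so a real script would need some rule for that case; this sketch simply raises an error on a collision instead of guessing one.

```python
import json


def flatten_dump(dump):
    """Flatten a nested parameter dump into {"procName::key": value}.

    Top-level entries are copied as they are; each entry of "sequence"
    is expanded using its "name" as the prefix. Hypothetical helper.
    """
    flat = {key: value for key, value in dump.items() if key != "sequence"}
    for proc in dump.get("sequence", []):
        name = proc["name"]
        for key, value in proc.items():
            if key == "name":
                continue
            full_key = f"{name}::{key}"
            if full_key in flat:
                # Two processors with the same name would overwrite each other.
                raise ValueError(f"duplicate metadata key: {full_key}")
            flat[full_key] = value
    return flat


example_dump = {
    "maxEvents": 10,
    "passName": "sim",
    "sequence": [
        {"name": "mySim", "runNumber": 9000, "class": "ldmx::Simulator"},
        {"name": "ecalDigis", "gain": 2000.0, "class": "ldmx::EcalDigiProducer"},
    ],
}
flat = flatten_dump(example_dump)
print(json.dumps(flat, indent=4))
```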
@tomeichlersmith Please open up a PR draft so it's easier to see the changes.
Closing this since we have a workable solution of dumping the parameters into a full dictionary which can then be used outside of `fire` to upload to rucio (or do whatever).
We would like the option for the parameters used in a processing run to be automatically shared with RUCIO so that ldmx-sw is more easily integrated with LDCS. This feature would require a few things:
`getParameter` functions, one that has no default and throws an exception, one that has a default and is silent. This is how it worked before, and would encourage users to realize that if they provide a default in the C++ the parameter may or may not be tracked in the meta-data.
Besides the last bullet point, all of these changes are done in the python modules of ldmx-sw.
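The two lookup behaviours can be illustrated with a small Python analogue. This is not the ldmx-sw C++ interface; the class and method names below are hypothetical and only mirror the semantics described: one lookup that fails loudly when the parameter is missing, and one that silently falls back to a default.

```python
class Parameters:
    """Toy parameter store mimicking the two lookup styles (hypothetical)."""

    def __init__(self, values):
        self._values = dict(values)

    def get_required(self, name):
        """No default: raise if the parameter was not provided."""
        if name not in self._values:
            raise KeyError(f"parameter '{name}' was not provided")
        return self._values[name]

    def get_or_default(self, name, default):
        """With default: silently fall back when the parameter is missing."""
        return self._values.get(name, default)


ps = Parameters({"gain": 2000.0})
gain = ps.get_required("gain")
pedestal = ps.get_or_default("pedestal", 1100.0)
print(gain, pedestal)
```

In the second call the value 1100.0 never appears in the stored parameters, which is the situation where a default supplied in the code may go untracked in the metadata.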
@omar-moreno @bryngemark