LDMX-Software / ldmx-sw

The Light Dark Matter eXperiment simulation and reconstruction framework.
https://ldmx-software.github.io
GNU General Public License v3.0

Push Set Parameters to RUCIO Meta-Data #801

Closed tomeichlersmith closed 1 year ago

tomeichlersmith commented 4 years ago

We would like the option for the parameters used in a processing run to be automatically shared with RUCIO so that ldmx-sw is more easily integrated with LDCS. This feature would require a few things:

Besides the last bullet point, all of these changes are done in the python modules of ldmx-sw.

@omar-moreno @bryngemark

tomeichlersmith commented 4 years ago

@bryngemark In your python script that pushes the metadata to rucio, you put the metadata into a python dictionary. The parameters for all the processors are already in python dictionaries, so it will be easy to combine them. I am going to attach the processor name to the front of each parameter name so that there is no chance of conflicting parameters. For example, if processor Proc has parameter key with value val, then the metadata entry would be Proc::key = val.
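A minimal sketch of the prefixing scheme, assuming each processor's parameters are available as a plain Python dict keyed by processor name. The function name `prefix_parameters` and this input layout are hypothetical illustrations, not the ldmx-sw API:

```python
def prefix_parameters(processors):
    """Merge per-processor parameter dicts into one flat metadata dict.

    `processors` maps a processor name to that processor's parameter dict.
    Every output key is "<processor>::<parameter>", so two processors that
    both define e.g. "readoutThreshold" no longer collide.
    """
    flat = {}
    for proc_name, params in processors.items():
        for key, value in params.items():
            flat[f"{proc_name}::{key}"] = value
    return flat


# Usage with the Proc/key/val example plus two real-looking parameter names.
metadata = prefix_parameters({
    "Proc": {"key": "val"},
    "ecalDigis": {"readoutThreshold": 4.0},
    "hcalDigis": {"readoutThreshold": 1},
})
print(metadata)
# {'Proc::key': 'val', 'ecalDigis::readoutThreshold': 4.0, 'hcalDigis::readoutThreshold': 1}
```

The `::` separator is the one proposed above; any separator that cannot appear in a parameter name would work equally well.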

bryngemark commented 4 years ago

Thanks, Tom!

I think we could do this step-wise, to allow for early testing to tell us more about how we want the ldmx-sw/rucio interface to operate. The first step, in my mind, is publishing all the parameters somewhere. The existing script can then be modified to pick these up and write to rucio as before.

There are use cases, like correctly setting up metadata for separated sim and reco, and for rereco, that I would like to discuss with the rucio expert in the LDCS group. I have a vision for how to do it but would need to understand the technical implementation. I think keeping this a bit modular for now will make for an easier integration down the line. In any case, I think all the LDCS scripts should eventually be maintained by us and live on our GitHub.

tomeichlersmith commented 4 years ago

Cool! I will shortly push a solution to this branch that does everything except the rucio upload (i.e., it will dump everything to a JSON file instead of trying to push to rucio right away).
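Decoupling the dump from the upload could look like the following sketch. The file name and both function names are hypothetical; the only real assumption is that every parameter value is JSON-serializable (numbers, strings, booleans, lists, dicts):

```python
import json
import os
import tempfile


def dump_parameters(parameters, path):
    """Write the collected parameters to a JSON file for a later upload step."""
    with open(path, "w") as f:
        json.dump(parameters, f, indent=4)


def load_parameters(path):
    """Read the parameters back, e.g. inside a separate rucio upload script."""
    with open(path) as f:
        return json.load(f)


# Usage: round-trip a few parameters through a temporary file.
params = {"passName": "sim", "mySim::runNumber": 9000, "ecalDigis::nADCs": 10}
path = os.path.join(tempfile.mkdtemp(), "parameters.json")
dump_parameters(params, path)
restored = load_parameters(path)
```

One caveat of going through JSON: tuples come back as lists and dict keys always come back as strings, so the upload script should not rely on those Python types surviving.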

bryngemark commented 4 years ago

Yes, this was Andrii's point too: if we can stick to Python, everything becomes easier. I think prepending everything with the processor name is a good approach.

I also think that for maximum versatility we should implement "unsupervised" writing to rucio. It would be a pain to update all the possible parameter names whenever something changes. But again this is decoupled from ldmx-sw.

tomeichlersmith commented 4 years ago

How do you want to handle parameters that are lists? For example, the preInitCommands for the Simulator is a list of strings passed as commands to Geant4. These could significantly alter the behavior of the sim, so they should be tracked, but I don't know the limits of the meta-data system.

bryngemark commented 4 years ago

Good point. For lists from the mac file, like random seeds, we opted to make them two separate keys. I'm not sure this comes from a fundamental limitation on the types of entries allowed in the database; to be followed up! Shall we operate under the initial assumption that it is, and see if we can deal with it?

The list that is passed to preInitCommands corresponds to the old mac commands though, right? Perhaps they could simply be parsed the same way as before. So in that case, you would dump them to the json just like any other parameter, and the translating script would need to recognize that this is a bunch of mac commands to parse.
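One way the translating script might recognize and parse these commands, sketched under the assumption that each entry of preInitCommands is a single macro line of the form `/command arg1 arg2`. The function name and the sample commands are illustrative only:

```python
def parse_macro_commands(commands):
    """Split Geant4 macro-style strings into (command, [arguments]) pairs.

    Assumes each entry looks like "/path/to/command arg1 arg2 ...". Arguments
    stay strings because the translator cannot know each command's types.
    A list (not a dict) is returned so repeated commands keep their order.
    """
    parsed = []
    for line in commands:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and macro comments
        command, *args = line.split()
        parsed.append((command, args))
    return parsed


# Illustrative input only; not taken from an actual ldmx-sw config.
pre_init = ["/random/setSeeds 1 2", "# a comment", "/run/verbose 1"]
print(parse_macro_commands(pre_init))
# [('/random/setSeeds', ['1', '2']), ('/run/verbose', ['1'])]
```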

For other lists, I imagine that they are short, like the list of trigger pad hit collections you'd like to use for tracking (an example from my own domain). Can we just enumerate them, much like the random seed solution?
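Enumerating a list into separate keys, in the spirit of the random seed solution, could be sketched as below. The `::<index>` suffix and the parameter name `mySim::seeds` are hypothetical choices made for illustration:

```python
def enumerate_lists(flat):
    """Replace every list-valued entry with one key per element.

    "Proc::seeds": [1, 2] becomes "Proc::seeds::0": 1 and "Proc::seeds::1": 2.
    Scalar entries pass through unchanged. Only one level is expanded here.
    """
    out = {}
    for key, value in flat.items():
        if isinstance(value, list):
            for i, item in enumerate(value):
                out[f"{key}::{i}"] = item
        else:
            out[key] = value
    return out


# Hypothetical parameter names, chosen only to show the key layout.
print(enumerate_lists({"mySim::seeds": [1, 2], "passName": "sim"}))
# {'mySim::seeds::0': 1, 'mySim::seeds::1': 2, 'passName': 'sim'}
```

Note that an empty list produces no keys at all, so a consumer cannot tell it apart from a parameter that was never set; whether that matters depends on how the rucio side queries the metadata.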

tomeichlersmith commented 4 years ago

I will just enumerate them for now, we can do a more elegant solution if it arises.

bryngemark commented 4 years ago

I was just about to say: just dump everything as lists, and the translating script can do the enumeration or list writing, whichever works on the rucio end. Keeping it modular :)

tomeichlersmith commented 4 years ago

So the central issue is parameters like actions or generators, where each member of the list has its own set of parameters. That is what forces the enumerate-vs-full-lists decision to be a part of the dump command in ldmx-sw.
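If the fully enumerated form were chosen, a recursive flatten would handle lists of parameter dicts (such as a generators list containing a ParticleGun) at any depth. This is a sketch under the assumption that list positions become part of the key; the function name and the `::<index>` convention are hypothetical:

```python
def flatten(value, prefix=""):
    """Recursively flatten nested dicts and lists into "a::b::0"-style keys.

    Dicts contribute their keys and lists contribute their indices, so each
    generator nested inside a processor's list still gets a unique key.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, item in items:
        child = f"{prefix}::{key}" if prefix else str(key)
        flat.update(flatten(item, child))
    return flat


# ParticleGun values taken from this thread, trimmed to two parameters.
sim = {"mySim": {"generators": [
    {"class": "ldmx::ParticleGun", "direction": [0.07845, 0, 0.996925]},
]}}
print(flatten(sim))
```

This yields keys such as `mySim::generators::0::class` and `mySim::generators::0::direction::2`. The trade-off is that keys depend on list order, so reordering generators in a config changes the metadata names.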

bryngemark commented 4 years ago

Here's what I thought. You ask in a config file to use a specific generator. It has a set of parameters. Some of them are G4 macro-style commands, which are a list from the python config perspective. I thought these could then be dumped as lists.

Are you saying there are additional levels of lists within the lists? Or are you saying that there are many different lists possible, because many different predefined generators exist?

Maybe a concrete example would make this easier for me to understand.

tomeichlersmith commented 4 years ago

Here's the most common example:

The processor Simulator has a list parameter called generators that lists the primary generators to use. One of the generators you could use is the ParticleGun which has its own dictionary of parameters, one of them being direction which is a list defining the vector to point the gun.

The solution I'm doing right now is to have dictionaries inside of dictionaries. An example printout:

{'compressionSetting': 9,
 'hcalDigis::strips_side_tb_per_layer': 12,
 # below is the example that I talked about
 'mySim::generators': [{'direction': [0.07845, 0, 0.996925], 'name': 'single_4gev_e_upstream_tagger', 'particle': 'e-', 'energy': 4.0, 'position': [-27.926, 0, -700], 'class': 'ldmx::ParticleGun'}],
 'hcalDigis::strip_attenuation_length': 5.0,
 'ecalDigis::iSOI': 0,
 'trigScintDigis::mean_noise': 0.02,
 'ecalRecon::digiCollName': 'EcalDigis',
 'passName': 'sim',
 'mySim::detector': '/home/tom/ldmx/ldmx-sw/install/data/detectors/ldmx-det-v12/detector.gdml',
 'ecalDigis::noiseIntercept': 700.0,
 'trigScintDigis::number_of_strips': 50,
 'outputFile0': 'myFirstSim_10_events.root',
 'skimDefaultIsKeep': True,
 'trigScintDigis::input_collection': 'TriggerPadTaggerSimHits',
 'hcalDigis::meanNoise': 0.02,
 'hcalDigis::num_side_lr_hcal_layers': 26,
 'ecalDigis::padCapacitance': 0.1,
 'trigScintDigis::pe_per_mip': 10.0,
 'maxEvents': 10,
 'ecalRecon::layerWeights': [1.675, 2.724, 4.398, 6.039, 7.696, 9.077, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 9.63, 13.497, 17.364, 17.364, 17.364, 17.364, 17.364, 17.364, 17.364, 17.364, 17.364, 8.99],
 'hcalDigis::strips_back_per_layer': 60,
 'ecalDigis::noiseSlope': 25.0,
 'ecalRecon::secondOrderEnergyCorrection': 0.9975062344139651,
 'ecalDigis::makeConfigHists': False,
 'hcalDigis::num_back_hcal_layers': 96,
 'run': -1,
 'mySim::runNumber': 9000,
 'hcalDigis::num_side_tb_hcal_layers': 28,
 'trigScintDigis::number_of_arrays': 1,
 'hcalDigis::readoutThreshold': 1,
 'trigScintDigis::mev_per_mip': 0.4,
 'ecalRecon::digiPassName': '',
 'hcalDigis::pe_per_mip': 68.0,
 'trigScintDigis::output_collection': 'trigScintDigisTag',
 'hcalDigis::mev_per_mip': 4.66,
 'ecalDigis::gain': 2000.0,
 'ecalDigis::pedestal': 1100.0,
 'ecalDigis::nADCs': 10,
 'hcalDigis::super_strip_size': 1,
 'mySim::verbosity': 1,
 'ecalDigis::readoutThreshold': 4.0,
 'hcalDigis::strip_position_resolution': 150.0,
 'hcalDigis::strips_side_lr_per_layer': 12}

omar-moreno commented 4 years ago

Why not just come up with a list of metadata variables that you want to include and just query RUCIO for them? Each processor and user action will have its own set of parameters, and not all of them are used depending on how you run. So it's difficult to code up something generic enough to push everything to RUCIO, because there will be times when RUCIO won't have the metadata variable available. Unless the metadata is being stored as a string?

Also, what's going to happen if we decide to add another parameter later on? Will this require adding another variable to RUCIO? Again, if you are storing everything as a string, maybe this doesn't matter.
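Both of Omar's alternatives can be combined in a short sketch: a fixed list of wanted keys, with every value JSON-encoded as a string so that adding a parameter later needs no schema change. This assumes the metadata store accepts arbitrary string values, which is exactly the open question above; the function name and whitelist are hypothetical:

```python
import json


def select_as_strings(flat, wanted):
    """Keep only an agreed list of metadata keys, each encoded as a JSON string.

    Keys absent from this run are skipped instead of raising, since not every
    processor runs in every job. json.dumps turns any value, including lists,
    into a string, so the store only needs one string-typed field per key.
    """
    return {key: json.dumps(flat[key]) for key in wanted if key in flat}


# Hypothetical whitelist; "missing::key" shows that absent keys are skipped.
run_params = {"maxEvents": 10, "passName": "sim", "mySim::runNumber": 9000}
wanted = ["maxEvents", "passName", "missing::key"]
metadata = select_as_strings(run_params, wanted)
print(metadata)
# {'maxEvents': '10', 'passName': '"sim"'}
```

Because the strings are JSON, a reader can recover the original types with `json.loads`, at the cost of losing numeric range queries on the RUCIO side.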

Actually, it might be easiest to dump the structure of the table itself that is being used to store the metadata and share it here (or on slack).

tomeichlersmith commented 4 years ago

I have pushed a working solution to branch iss801. A few comments:


Here is an example of the output of print(json.dumps(p.parameterDump(), indent=4)):

{
    "maxEvents": 10, 
    "skimDefaultIsKeep": true, 
    "outputFiles": [
        "myFirstSim_10_events.root"
    ], 
    "run": -1, 
    "compressionSetting": 9, 
    "sequence": [
        {
            "name": "mySim", 
            "runNumber": 9000, 
            "verbosity": 0, 
            "generators": [
                {
                    "direction": [
                        0.07845, 
                        0, 
                        0.996925
                    ], 
                    "name": "single_4gev_e_upstream_tagger", 
                    "particle": "e-", 
                    "energy": 4.0, 
                    "position": [
                        -27.926, 
                        0, 
                        -700
                    ], 
                    "class": "ldmx::ParticleGun"
                }
            ], 
            "detector": "/home/tom/ldmx/ldmx-sw/install/data/detectors/ldmx-det-v12/detector.gdml", 
            "class": "ldmx::Simulator"
        }, 
        {
            "makeConfigHists": false, 
            "name": "ecalDigis", 
            "iSOI": 0, 
            "noiseSlope": 25.0, 
            "padCapacitance": 0.1, 
            "nADCs": 10, 
            "gain": 2000.0, 
            "readoutThreshold": 4.0, 
            "pedestal": 1100.0, 
            "class": "ldmx::EcalDigiProducer", 
            "noiseIntercept": 700.0
        }, 
        {
            "name": "ecalRecon", 
            "digiCollName": "EcalDigis", 
            "digiPassName": "", 
            "secondOrderEnergyCorrection": 0.9975062344139651, 
            "layerWeights": [
                1.675, 
                2.724, 
                4.398, 
                6.039, 
                7.696, 
                9.077, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                9.63, 
                13.497, 
                17.364, 
                17.364, 
                17.364, 
                17.364, 
                17.364, 
                17.364, 
                17.364, 
                17.364, 
                17.364, 
                8.99
            ], 
            "class": "ldmx::EcalRecProducer"
        }, 
        {
            "strips_side_tb_per_layer": 12, 
            "num_back_hcal_layers": 96, 
            "randomSeed": 1, 
            "name": "hcalDigis", 
            "meanNoise": 0.02, 
            "pe_per_mip": 68.0, 
            "strip_position_resolution": 150.0, 
            "super_strip_size": 1, 
            "num_side_tb_hcal_layers": 28, 
            "strips_side_lr_per_layer": 12, 
            "strips_back_per_layer": 60, 
            "readoutThreshold": 1, 
            "num_side_lr_hcal_layers": 26, 
            "mev_per_mip": 4.66, 
            "sim_hit_pass_name": "", 
            "class": "ldmx::HcalDigiProducer", 
            "strip_attenuation_length": 5.0
        }, 
        {
            "number_of_strips": 50, 
            "randomSeed": 1, 
            "name": "trigScintDigis", 
            "input_collection": "TriggerPadUpSimHits", 
            "pe_per_mip": 10.0, 
            "output_collection": "trigScintDigisUp", 
            "number_of_arrays": 1, 
            "mean_noise": 0.02, 
            "mev_per_mip": 0.4, 
            "class": "ldmx::TrigScintDigiProducer"
        }, 
        {
            "number_of_strips": 50, 
            "randomSeed": 1, 
            "name": "trigScintDigis", 
            "input_collection": "TriggerPadDownSimHits", 
            "pe_per_mip": 10.0, 
            "output_collection": "trigScintDigisDn", 
            "number_of_arrays": 1, 
            "mean_noise": 0.02, 
            "mev_per_mip": 0.4, 
            "class": "ldmx::TrigScintDigiProducer"
        }, 
        {
            "number_of_strips": 50, 
            "randomSeed": 1, 
            "name": "trigScintDigis", 
            "input_collection": "TriggerPadTaggerSimHits", 
            "pe_per_mip": 10.0, 
            "output_collection": "trigScintDigisTag", 
            "number_of_arrays": 1, 
            "mean_noise": 0.02, 
            "mev_per_mip": 0.4, 
            "class": "ldmx::TrigScintDigiProducer"
        }
    ], 
    "skimRules": [], 
    "inputFiles": [], 
    "passName": "sim", 
    "keep": []
}
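In the dump above, three processors share the name trigScintDigis, so the Proc::key convention discussed earlier would let their parameters overwrite one another. A sketch of one way a downstream script could flatten this JSON while keeping them distinct; the `#N` suffix and the function name are assumptions, not what the iss801 branch does:

```python
import json


def flatten_sequence(dump):
    """Flatten a parameterDump()-style dict into "<name>::<key>" entries.

    Top-level entries other than "sequence" are copied unchanged. Each
    processor in "sequence" is labeled by its "name"; a repeated name gets a
    "#1", "#2", ... suffix so its parameters do not overwrite the first copy.
    """
    flat = {key: value for key, value in dump.items() if key != "sequence"}
    seen = {}
    for proc in dump.get("sequence", []):
        name = proc.get("name", "unnamed")
        count = seen.get(name, 0)
        seen[name] = count + 1
        label = name if count == 0 else f"{name}#{count}"
        for key, value in proc.items():
            if key != "name":
                flat[f"{label}::{key}"] = value
    return flat


# A trimmed version of the dump above, read back from its JSON text.
dump = json.loads("""
{
    "passName": "sim",
    "sequence": [
        {"name": "trigScintDigis", "output_collection": "trigScintDigisUp"},
        {"name": "trigScintDigis", "output_collection": "trigScintDigisDn"}
    ]
}
""")
result = flatten_sequence(dump)
print(result)
```

Labeling by position in the sequence would also avoid collisions, but the name-plus-suffix form keeps the common single-instance case readable.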
omar-moreno commented 4 years ago

@tomeichlersmith Please open up a PR draft so it's easier to see the changes.

tomeichlersmith commented 1 year ago

Closing this since we have a workable solution of dumping the parameters into a full dictionary which can then be used outside of fire to upload to rucio (or do whatever).