KIT-CMS / CROWN

C++ based ROOT Workflow for N-tuples (CROWN)
https://crown.readthedocs.io
MIT License
4 stars 21 forks source link

ReplaceProducer doesn't add outputs if outputs of old and new producer are different #279

Open moritzmolch opened 1 month ago

moritzmolch commented 1 month ago

I want to replace a ProducerGroup with ReplaceProducer for a scope (in this example the mm scope). However, when I do that the outputs of the new producer are not showing up in the list of all outputs (the list of branch names in the output file of the respective scope.

The old producer is

BasicBJetQuantities = ProducerGroup(
    name="BasicBJetQuantities",
    call=None,
    input=None,
    output=None,
    scopes=["mt", "et", "tt", "em", "mm", "ee"],
    subproducers=[
        NumberOfBJets,
        NumberOfBJets_boosted,
    ],
)  # outputs nbtag, nbtag_boosted

with outputs nbtag and nbtag_boosted and the new producer is

BasicBJetQuantitiesMuMu = ProducerGroup(
    name="BasicBJetQuantitiesMuMu",
    call=None,
    input=None,
    output=None,
    scopes=["mm"],
    subproducers=[
        NumberOfBJets,
    ],
)  # outputs nbtag

with the output nbtag. The modification in the main config file is declared with

configuration.add_modification_rule(
    ["mm"],
    ReplaceProducer(
        producers=[jets.BasicBJetQuantities, jets.BasicBJetQuantitiesMuMu],
        samples=available_sample_types,
    ),
)

The problem is now that I cannot find nbtag as branch in the output file of the mm scope, although it is an output of the new producer. I also declared it with the configuration.add_outputs explicitly as output variable in the mm scope.

Also, the debug output of the cmake compilation of the configuration shows that the output nbtag is removed. The variable nbtag_boosted is not appearing here as I didn't declare it as an output of the mm scope in the main configuration, so I guess this part is okay. However, there is no line saying that nbtag is added back:

...
[DEBUG   ] [rules.py           :303] scopes: ['mm']
[DEBUG   ] [rules.py           :304] Producers to be replaced: [ProducerGroup: BasicBJetQuantities, ProducerGroup: BasicBJetQuantitiesMuMu]
[DEBUG   ] [rules.py           :309] ReplaceProducer: Replace ProducerGroup: BasicBJetQuantities from producer in scope mm with ProducerGroup: BasicBJetQuantitiesMuMu
[DEBUG   ] [rules.py           :387] ReplaceProducer: Removing nbtag from outputs in scope mm
[CRITICAL] [rules.py           :130] Applying rule ProducerRule - replace ProducerGroup: BasicJetQuantities with ProducerGroup: BasicJetQuantitiesMuMu for ['ggh_htautau', 'ggh_hbb', 'vbf_htautau', 'vbf_hbb', 'rem_htautau', 'rem_hbb', 'embedding', 'embedding_mc', 'singletop', 'ttbar', 'diboson', 'dyjets', 'wjets', 'data', 'electroweak_boson', 'nmssm_Ybb', 'nmssm_Ytautau'] in scopes ['mm'] for sample dyjets
...

I think the problem lies in this else branch of the update_outputs method of the ReplaceProducer class: https://github.com/KIT-CMS/CROWN/blob/92b2187b12bb84b96326458b41203ca178c95eb4/code_generation/rules.py#L383-L400 Line 394 should be changed from

if added_output in outputs_to_be_updated[scope]:

to

if added_output not in outputs_to_be_updated[scope]:

right? If I apply this change, I get the following output, which is what I would expect:

[DEBUG   ] [rules.py           :303] scopes: ['mm']
[DEBUG   ] [rules.py           :304] Producers to be replaced: [ProducerGroup: BasicBJetQuantities, ProducerGroup: BasicBJetQuantitiesMuMu]
[DEBUG   ] [rules.py           :309] ReplaceProducer: Replace ProducerGroup: BasicBJetQuantities from producer in scope mm with ProducerGroup: BasicBJetQuantitiesMuMu
[DEBUG   ] [rules.py           :387] ReplaceProducer: Removing nbtag from outputs in scope mm
[DEBUG   ] [rules.py           :395] ReplaceProducer: Adding nbtag from outputs in scope mm
[CRITICAL] [rules.py           :130] Applying rule ProducerRule - replace ProducerGroup: BasicFatJetQuantities with ProducerGroup: BasicFatJetQuantitiesMuMu for ['ggh_htautau', 'ggh_hbb', 'vbf_htautau', 'vbf_hbb', 'rem_htautau', 'rem_hbb', 'embedding', 'embedding_mc', 'singletop', 'ttbar', 'diboson', 'dyjets', 'wjets', 'data', 'electroweak_boson', 'nmssm_Ybb', 'nmssm_Ytautau'] in scopes ['mm'] for sample dyjets

Finally, nbtag is then also a branch of the output file, so this change should fix the problem.

I think that this bug only affects producers, which have different sets of outputs. In that case, the if-else query goes to the if added_outputs == removed_outputs branch, which does not modify the output set.

moritzmolch commented 1 month ago

I added the proposed fix as a pull request: https://github.com/KIT-CMS/CROWN/pull/280 Sorry for the spam, should've packed it in one thread. :sweat_smile:

harrypuuter commented 1 month ago

Hi,

thanks for the report. This is the way the ReplaceProducer is supposed to work (or at least sort of). There are still the RemoveProducer and AppendProducer functions that should be used if the outputs are different after replacement. ReplaceProducer should only be used, if the new producer has the same outputs as the old one, to avoid confusion and unintended Replacements. I agree that the current implementation is not ideal, and you case should raise an error with a hint to use a Remove/Append combination instead.