cms-nanoAOD / correctionlib

A generic correction library
https://cms-nanoaod.github.io/correctionlib/
BSD 3-Clause "New" or "Revised" License

Problem creating new json files with correctionlib schema v2 #218

Closed TizianoBevilacqua closed 9 months ago

TizianoBevilacqua commented 11 months ago

Hi all, I have a problem creating new JSON files with correctionlib schema v2. I used to use the following code (which worked in the past) to create a multibinned SF JSON, but now I get an error about a missing default field:

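    import correctionlib.schemav2 as cs

    def multibinning(inputs_: list, edges_: list, content_, flow_: str):
        # Build one MultiBinning node over the given inputs and bin edges
        return cs.MultiBinning(
            nodetype="multibinning",
            inputs=inputs_,
            edges=edges_,
            content=content_,
            flow=flow_,
        )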

    inputs_ = ["SCeta", "r9"]
    edges_ = [[0.0, 1., 1.5, 999.0], [0.0, 0.94, 999.0]]
    content_ = {
        "nominal": [1., 2., 3., 4., 5., 6.],
        "up": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
        "down": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    }
    flow_ = "clamp"

    MAT = cs.Correction(
        name="MaterialCentralBarrel",
        version=1,
        inputs=[
            cs.Variable(
                name="systematic", type="string", description="Systematic variation"
            ),
            cs.Variable(name="SCeta", type="real", description="Photon Super Cluster eta"),
            cs.Variable(
                name="r9",
                type="real",
                description="Photon full 5x5 R9",
            ),
        ],
        output=cs.Variable(
            name="Ecorr",
            type="real",
            description="Multiplicative correction to photon energy and pt",
        ),
        data=cs.Category(
            nodetype="category",
            input="systematic",
            content=[
                {
                    "key": "nominal",
                    "value": multibinning(inputs_, edges_, content_["nominal"], flow_),
                },
                {
                    "key": "up",
                    "value": multibinning(inputs_, edges_, content_["up"], flow_),
                },
                {
                    "key": "down",
                    "value": multibinning(inputs_, edges_, content_["down"], flow_),
                },
            ],
        ),
    )

I tried adding the missing fields pydantic was complaining about and arrived at this version of the same correction:

    MAT = cs.Correction(
        name="MaterialCentralBarrel",
        description="MaterialCentralBarrel correction",
        generic_formulas=None,
        version=1,
        inputs=[
            cs.Variable(
                name="systematic", type="string", description="Systematic variation"
            ),
            cs.Variable(name="SCeta", type="real", description="Photon Super Cluster eta"),
            cs.Variable(
                name="r9",
                type="real",
                description="Photon full 5x5 R9, ratio E3x3/ERAW, where E3x3 is the energy sum of the 3 by 3 crystals surrounding the supercluster seed crystal and ERAW is the raw energy sum of the supercluster",
            ),
        ],
        output=cs.Variable(
            name="Ecorr",
            type="real",
            description="Multiplicative correction to photon energy and pt",
        ),
        data=cs.Category(
            nodetype="category",
            input="systematic",
            content=[
                cs.CategoryItem(
                    key="up",
                    value=multibinning(inputs_, edges_, content_["up"], flow_)
                ),
                cs.CategoryItem(
                    key="down",
                    value=multibinning(inputs_, edges_, content_["down"], flow_)
                ),
            ],
            default=multibinning(inputs_, edges_, content_["nominal"], flow_),
        ),
    )

This fixes the creation of the correction (it also prints out just fine), but if I try to convert it with to_evaluator() I get an error again:

    pydantic_core._pydantic_core.ValidationError: 2 validation errors for CorrectionSet
    description
      Field required [type=missing, input_value={'schema_version': 2, 'co... 6.0], flow='clamp')))]}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.4/v/missing
    compound_corrections
      Field required [type=missing, input_value={'schema_version': 2, 'co... 6.0], flow='clamp')))]}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.4/v/missing

Ultimately I've changed this line in schemav2.py:

    cset = CorrectionSet(schema_version=VERSION, corrections=[self])

to this:

    cset = CorrectionSet(schema_version=VERSION, corrections=[self], description=self.description, compound_corrections=None)

and it no longer complains. It is not a good fix, but at least it pinpoints the problem for me.

nsmith- commented 11 months ago

Can you check what pydantic version you have installed? e.g.

    import pydantic
    print(pydantic.__version__)

TizianoBevilacqua commented 11 months ago

v 2.4.2

nsmith- commented 11 months ago

OK. This is likely the reason. It is a bit puzzling, though, since the installation requirements restrict pydantic to < 2 because the code has not been migrated to be compatible yet: https://github.com/cms-nanoAOD/correctionlib/blob/42a327e093d43b2cbc8c2a7c0fed69a430c482fb/setup.cfg#L32

In what environment are you installing correctionlib? Does the pip install warn about incompatibilities?
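One way to answer this from inside the environment is to compare what the installed package declares with what is actually installed. The following is only a sketch using the standard library's `importlib.metadata`; the helper names `declared_requirements` and `installed_version` are made up for illustration and are not part of correctionlib.

```python
from importlib.metadata import PackageNotFoundError, requires, version
from typing import List, Optional


def declared_requirements(dist_name: str, needle: str = "") -> List[str]:
    """Return the requirement strings a distribution declares.

    `needle` optionally filters the list (case-insensitive substring match).
    An empty list is returned if the distribution is not installed.
    """
    try:
        # requires() returns None when the distribution declares nothing
        reqs = requires(dist_name) or []
    except PackageNotFoundError:
        return []
    return [r for r in reqs if needle.lower() in r.lower()]


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical usage: what does correctionlib ask for, and what is there?
    print("correctionlib declares:", declared_requirements("correctionlib", "pydantic"))
    print("pydantic installed:", installed_version("pydantic"))
```

If the declared requirement and the installed version disagree, `pip check` should report the same conflict from the command line.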

TizianoBevilacqua commented 11 months ago

I am using a coffea/higgsDNA environment and I wasn't specifying any requirement for pydantic. If I reinstall it with pydantic <2,>=1.7.3 it seems to work just fine. Thank you :)
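For anyone who wants to guard against this at the top of a script, here is a minimal sketch that checks a version string against the pin quoted above (`pydantic <2,>=1.7.3`). The function names are hypothetical, and the parsing is deliberately simplified: it only looks at the leading numeric release part and ignores pre-release and post-release suffixes. The third-party `packaging` library handles the full specifier syntax if that matters.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release part, e.g. '2.4.2' -> (2, 4, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies_pin(version: str) -> bool:
    """True if `version` lies in the range >=1.7.3,<2 (simplified check)."""
    release = release_tuple(version)
    # Tuple comparison is element-wise, so (2, 4, 2) < (2,) is False
    return (1, 7, 3) <= release < (2,)


if __name__ == "__main__":
    for candidate in ("2.4.2", "1.10.13", "1.7.2"):
        print(candidate, "->", satisfies_pin(candidate))
```

In practice the string passed in would come from `pydantic.__version__`, as in the snippet earlier in the thread.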

nsmith- commented 11 months ago

Ok, so the problem's origin is understood, but I don't see why the installation did not respect the version pinning in correctionlib. Regardless, we need to make correctionlib pydantic2-compatible.

nsmith- commented 10 months ago

Actually, I'm a bit puzzled, because even with pydantic v2.4.2 and the latest correctionlib I can't reproduce this exact error.