flav-io / flavio

A Python package for flavour physics phenomenology in the Standard Model and beyond
http://flav-io.github.io/
MIT License

Refactoring of `Parameter` and `ParameterConstraints` classes #124

Open · peterstangl opened this issue 4 years ago

peterstangl commented 4 years ago

This issue is based on the discussion in PR #123:

@MJKirk:

Yes, something exactly like AwareDict would work - a ParameterCitationsAwareDict as it were (a snappier name might be in order).
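A minimal sketch of what such a citation-aware dict could look like (this is not flavio's actual AwareDict implementation; the `register_citation` callback and the inspire key below are made-up placeholders):

```python
# Sketch only: a dict that records a citation whenever a parameter value
# is read from it. Nothing here is flavio's real API.
class ParameterCitationsAwareDict(dict):
    def __init__(self, *args, citations=None, register_citation=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.citations = citations or {}  # parameter name -> inspire key
        self.register_citation = register_citation or (lambda key: None)

    def __getitem__(self, name):
        # registering the citation is a side effect of the parameter being used
        if name in self.citations:
            self.register_citation(self.citations[name])
        return super().__getitem__(name)


# usage: collect the inspire keys of all parameters that were actually accessed
used_citations = set()
pars = ParameterCitationsAwareDict(
    {'m_b': 4.18},
    citations={'m_b': 'SomeCollaboration:2020abc'},  # placeholder inspire key
    register_citation=used_citations.add,
)
pars['m_b']
print(used_citations)  # {'SomeCollaboration:2020abc'}
```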

Regarding the inspire info for the parameters: currently the parameter data is split over three files, parameters_uncorrelated, parameters_correlated, and parameter_metadata. We could just put the inspire key into the metadata file, but this seems fragile, since the references could easily go stale when the numbers are changed. It seems like it would be better (if more complicated) to merge all of these into a single parameter file, much like the single measurements file; then it would be obvious that the inspire key has to be updated together with the numerics. (Actually, it looks like very early on it used to be this way and the file was then split in b30aa73; do you remember why you did this, @DavidMStraub?)

@DavidMStraub:

Wow, I don't remember that it has ever been a single file.

Splitting them I guess is motivated by metadata populating the attributes of Parameter instances - which are meant not to change - while values populate the ParameterConstraints instances, which can be swapped.

From this angle, it makes sense for citations not to be part of the metadata, but of the ParameterConstraints.

Unfortunately, it will not be entirely trivial to refactor ParameterConstraints to have citations associated with constraints. The implementation with the _constraints list and _parameters dict is suboptimal. If I were to implement it today, I would probably add a Constraint class that associates a single ProbabilityDistribution with one or more parameters. This could then have a citation attribute. The Constraints class would then basically be a container holding a list of Constraint instances, plus various helper methods, e.g. to make sure the collection of constraints is consistent.
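A rough sketch of that Constraint / container split could look like the following (the class names echo the ones mentioned above, but all signatures are illustrative, and the Gaussian class is only a stand-in for flavio's actual ProbabilityDistribution classes):

```python
from dataclasses import dataclass


@dataclass
class GaussianDistribution:
    # illustrative stand-in for one of flavio's ProbabilityDistribution classes
    central_value: float
    standard_deviation: float


@dataclass
class Constraint:
    """Associates a single probability distribution with one or more parameters."""
    distribution: GaussianDistribution
    parameters: tuple        # names of the constrained parameters
    citation: str = None     # e.g. an inspire key


class Constraints:
    """Container holding a list of Constraint instances plus helper methods."""

    def __init__(self):
        self._constraints = []

    def add_constraint(self, constraint):
        # helper keeping the collection consistent:
        # each parameter may only be constrained once
        already = {p for c in self._constraints for p in c.parameters}
        if already & set(constraint.parameters):
            raise ValueError("parameter constrained twice")
        self._constraints.append(constraint)

    def get_constraint(self, parameter):
        # gives direct access to the distribution and citation of a parameter
        for c in self._constraints:
            if parameter in c.parameters:
                return c
        raise KeyError(parameter)


pc = Constraints()
pc.add_constraint(Constraint(GaussianDistribution(4.18, 0.03), ('m_b',),
                             citation='SomeCollaboration:2020abc'))
print(pc.get_constraint('m_b').citation)
```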

I actually find the implementation with the two classes Parameter and ParameterConstraints a bit confusing for the user, and I think it might be more intuitive if the Parameter class were analogous to Measurement, i.e. if it were itself a subclass of Constraints and the ParameterConstraints class were dropped entirely. Even though the value of a parameter can change while its metadata stays the same, which is not true for a measurement, this different treatment of Measurement and Parameter could be implemented at the level of these two classes. Then one could again merge the three parameter YAML files into one file analogous to measurements.yaml, which I think would also be more intuitive.

I think it is actually a very good idea, then, to use a Constraint class instead of the _parameters dict and the _constraints list. This might also make it easier to access the ProbabilityDistribution, e.g. for plotting experimental data.

Apart from this, I was also thinking about whether the functions in flavio.measurements and flavio.parameters could be made class methods of the Measurement and Parameter classes. This might also be more intuitive, since at least I always got confused by the fact that there is both flavio.Measurement and flavio.measurements, as well as flavio.Parameter and flavio.parameters.
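As an illustration of what that reorganisation could mean (all names below are hypothetical, not the current flavio API): a loader that today lives as a module-level function could instead become a classmethod, so that everything related to measurements is reachable through the Measurement class alone:

```python
import yaml  # PyYAML


class Measurement:
    """Sketch only: a named-instance class that also knows how to load itself."""

    instances = {}

    def __init__(self, name):
        self.name = name
        Measurement.instances[name] = self

    @classmethod
    def read_file(cls, path):
        # what is currently a module-level function would live here:
        # create one instance per top-level YAML entry
        with open(path) as f:
            data = yaml.safe_load(f)
        return [cls(name) for name in data]


# instead of calling a function in a separate module, one would then write e.g.
# Measurement.read_file('measurements.yaml')
```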

@DavidMStraub what do you think?

DavidMStraub commented 4 years ago

> I actually find the implementation with the two classes Parameter and ParameterConstraints a bit confusing for the user, and I think it might be more intuitive if the Parameter class were analogous to Measurement, i.e. if it were itself a subclass of Constraints and the ParameterConstraints class were dropped entirely. Even though the value of a parameter can change while its metadata stays the same, which is not true for a measurement, this different treatment of Measurement and Parameter could be implemented at the level of these two classes.

I agree that it's confusing; but Measurement is essentially a collection of ProbabilityDistributions associated with Observables, while a ParameterConstraints (I always hated that name) is a collection of ProbabilityDistributions associated with Parameters. So just like the Observable is something that exists even without a measurement, the Parameter is something that exists without a constraint. Merging Parameter and ParameterConstraints to me seems more like merging Measurement and Observable.

But the analogy with Measurements is perhaps still useful. The main difference between ParameterConstraints and Measurement is that, at the moment, there is in practice only a single instance of the former, e.g. default_parameters. One could move to a scheme where they are treated more like Measurement, with a single file containing many different "parameter collections" that would then each have a distinct citation associated with them.

> Apart from this, I was also thinking about whether the functions in flavio.measurements and flavio.parameters could be made class methods of the Measurement and Parameter classes. This might also be more intuitive, since at least I always got confused by the fact that there is both flavio.Measurement and flavio.measurements, as well as flavio.Parameter and flavio.parameters.

Yes, fully agree, parameters.py and measurements.py are a horrible mess.

peterstangl commented 4 years ago

> I agree that it's confusing; but Measurement is essentially a collection of ProbabilityDistributions associated with Observables, while a ParameterConstraints (I always hated that name) is a collection of ProbabilityDistributions associated with Parameters. So just like the Observable is something that exists even without a measurement, the Parameter is something that exists without a constraint. Merging Parameter and ParameterConstraints to me seems more like merging Measurement and Observable.

> But the analogy with Measurements is perhaps still useful. The main difference between ParameterConstraints and Measurement is that, at the moment, there is in practice only a single instance of the former, e.g. default_parameters. One could move to a scheme where they are treated more like Measurement, with a single file containing many different "parameter collections" that would then each have a distinct citation associated with them.

Yes, I think this is a very good idea! One could e.g. have one "parameter collection" for each correlated set of parameters, and one collection each for the uncorrelated parameters from the PDG, from FLAG, etc. And each of these collections could be implemented exactly like a measurement and would have an inspire ID (and one could rename the key "experiment" to "collaboration" to make it apply to both measurements and parameter collections).

Then one could also rename ParameterConstraints to ParameterCollection. And the parameters in a ParameterCollection (like the measured observables in a Measurement) could still correspond to instances of a new Constraint class (which associates a single ProbabilityDistribution with one or more parameters or measured observables).
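To make this concrete, here is a purely illustrative sketch (not an agreed-upon file format or API): one entry of a hypothetical merged parameter file, shown as the Python dict it would load into, and a ParameterCollection built from Constraint instances. All keys, class names and the inspire key are placeholders, and the numbers are only rough example values.

```python
from dataclasses import dataclass

# One entry of a hypothetical merged parameter file, structured like a
# measurement entry, with 'collaboration' in place of 'experiment'.
pdg_quark_masses = {
    'collaboration': 'PDG',
    'inspire': 'Placeholder:2020abc',   # placeholder inspire key
    'values': {
        'm_b': {'central': 4.18, 'uncertainty': 0.03},
        'm_c': {'central': 1.27, 'uncertainty': 0.02},
    },
}


@dataclass
class Constraint:
    # associates a (here trivially Gaussian) constraint with parameter names
    parameters: tuple
    central: float
    uncertainty: float
    citation: str = None


class ParameterCollection:
    """Sketch of the proposed replacement for ParameterConstraints."""

    def __init__(self, name, collaboration=None, inspire=None):
        self.name = name
        self.collaboration = collaboration
        self.inspire = inspire
        self.constraints = []

    @classmethod
    def from_dict(cls, name, entry):
        coll = cls(name, entry.get('collaboration'), entry.get('inspire'))
        for par, val in entry['values'].items():
            coll.constraints.append(
                Constraint((par,), val['central'], val['uncertainty'],
                           citation=coll.inspire))
        return coll


coll = ParameterCollection.from_dict('PDG quark masses', pdg_quark_masses)
print([(c.parameters, c.citation) for c in coll.constraints])
```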