HEPData / hepdata_lib

Library for getting your data into HEPData
https://hepdata-lib.readthedocs.io
MIT License

Precision and number of significant digits #113

claudecharlot opened this issue 4 years ago (status: Open)

claudecharlot commented 4 years ago

Hello,

Which tools does hepdata_lib provide to limit the precision (number of decimals / number of significant digits) according to CMS rules, in particular when data are retrieved from histograms? Ideally we would need the following, I think:

Cheers, Claude

AndreasAlbert commented 4 years ago

Hi @claudecharlot,

currently, each Variable object has a digits setting (see here) that you can set after creating the variable:

from hepdata_lib import Variable

var = Variable(...)
var.digits = 2  # round to two significant digits on export

If you don't set it, it defaults to five significant digits.

Based on the value of this setting, the values of the variable and its associated uncertainties are automatically rounded to the corresponding number of significant digits when you export everything to yaml. The rounding functionality is implemented here. So if you want to control significant digits, I think you will find this helpful.
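To illustrate the idea, significant-digit rounding can be sketched in a few lines. This is a hand-rolled illustration of the concept, not the library's actual implementation (which lives in `hepdata_lib/helpers.py`):

```python
import math

def round_significant(value, digits):
    # Round value to the given number of significant digits.
    # Illustrative sketch only; hepdata_lib has its own implementation.
    if value == 0 or not math.isfinite(value):
        return value
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - magnitude)

print(round_significant(0.034567, 2))   # 0.035
print(round_significant(123456.0, 3))   # 123000.0
```

The key step is locating the order of magnitude via `log10`, so that the same call works for both very small and very large numbers.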

If you really want to control decimal places independent of the size of the number, there is currently no implementation. Do you need this? I have a hard time thinking of a case where one would want to do that rather than using significant digits relative to the size of the number itself.

claudecharlot commented 4 years ago

Hi Andreas,

Yes, defining the number of significant digits along with rounding is just what we need. Limiting the number of decimals was an intermediate suggestion, as I did not know you had already implemented significant-digit rounding :-) The only exception is perhaps data counts, where we would like to enforce integers rather than floats (i.e. no decimals at all), but likely this can be done just by defining the variables as integers.

Thanks, Claude

AndreasAlbert commented 4 years ago

Great! For data, I agree: converting to integer before / while setting the variable values will work fine.
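For example, if the counts come out of a histogram as floats, a plain cast before filling the variable is enough. This is a generic sketch; the variable names and values are made up:

```python
# Bin contents read from a histogram typically come back as floats,
# so cast them to int before storing them as event counts.
raw_counts = [12.0, 7.0, 0.0, 3.0]           # hypothetical histogram contents
counts = [int(round(c)) for c in raw_counts]  # exact integers, no decimals
print(counts)  # [12, 7, 0, 3]
```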

cippy commented 4 years ago

Hi,

would it be possible to round a variable to have the same number of decimals as its associated uncertainty, without a priori knowledge of either number?

For example, let's assume I have x = 1.222 +/- 0.035. If I choose to round the uncertainty to 2 significant digits (the standard CMS prescription, widely used in HEP) with self.digits=2, the function relative_round() in hepdata_lib/hepdata_lib/helpers.py will correctly produce 0.035, but for the central value it would produce 1.2.

The general problem is that often data will be read from large graphs or histograms, where numbers are stored with any number of digits and no rounding. Therefore, it would be nearly impossible to manually deal with all of them to round central numbers accordingly.

Another related point is that often the relative precision is not the same for all points in a histogram, and one would actually want to round all of them to the same number of decimals (e.g. 15.3 +/- 1.5 and 12.4 +/- 0.3).
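A minimal sketch of the pairing described above, rounding the uncertainty to a fixed number of significant digits and then the central value to the matching decimal place (illustrative code, not part of hepdata_lib):

```python
import math

def round_pair(value, uncertainty, sig_digits=2):
    # Round the uncertainty to sig_digits significant digits, then round
    # the central value to the same number of decimal places.
    if uncertainty == 0:
        return value, uncertainty
    decimals = sig_digits - 1 - math.floor(math.log10(abs(uncertainty)))
    return round(value, decimals), round(uncertainty, decimals)

print(round_pair(1.222, 0.035))   # (1.222, 0.035)
print(round_pair(15.3456, 1.54))  # (15.3, 1.5)
```

Applied point by point over a histogram, this yields a per-bin precision driven by each bin's own uncertainty rather than a single global setting.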

Many thanks,

Marco

clelange commented 4 years ago

Hi Marco (@cippy),

In principle, I guess this could work if, once the values are read in, one looped over the value-uncertainty pairs and then rounded the values individually according to the size of their corresponding uncertainties. Is this something you'd be interested in? If so, would you like to try to implement this? The best way to start would be to think about some tests that one would run to make sure the code does what it's supposed to do.

cippy commented 4 years ago

Hi Clemens,

I might try to come up with some code to do it. I think for the central values one would just need to take their log10 and compare it to the log10 of the uncertainty to figure out how many significant digits the central value should be rounded to. I might not be able to work on that very soon, but I will try to propose something within the next week or so.
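The log10 comparison described above could look roughly like this (an illustrative sketch of the idea, with made-up function names):

```python
import math

def central_sig_digits(value, uncertainty, unc_sig_digits=2):
    # Compare orders of magnitude: the central value keeps enough significant
    # digits to end at the same decimal place as the rounded uncertainty.
    return (unc_sig_digits
            + math.floor(math.log10(abs(value)))
            - math.floor(math.log10(abs(uncertainty))))

print(central_sig_digits(1.222, 0.035))  # 4 -> keep 1.222 to 4 significant digits
print(central_sig_digits(15.3, 1.5))     # 3 -> keep 15.3 to 3 significant digits
```

The result could then be fed into the existing significant-digit rounding on a per-point basis.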

Marco