HEPData / hepdata_lib

Library for getting your data into HEPData
https://hepdata-lib.readthedocs.io
MIT License

Allow dummy uncertainties for variable member with dummy (string) value #247

Open wmtford opened 5 months ago

wmtford commented 5 months ago

In constructing HEPData tables, sometimes we have distributions in which no actual information is available for a subset of the dependent variables in some bins. One solution I’ve used for these is to enter a string, like ‘---’ for the value and zero for the uncertainty. This works, but with lots of warnings about zero error values. Would it be possible to protect string values of the uncertainties, as is done for the central values? Then one could enter a similar dummy indicator string for those.

Of course, the consumer of the table has to trap any special dummy contents in the “values” and “uncertainties” fields. I don’t know if there are any guidelines for conventions to make this uniform.
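As an illustration of what "trapping" dummy contents might look like on the consumer side, here is a minimal sketch. It assumes the dependent-variable entries have already been loaded from the YAML into plain Python dicts with a `value` key and an optional `errors` list. The function name `read_points` and the choice to map placeholders to `None` are hypothetical, not part of hepdata_lib or any HEPData convention.

```python
def read_points(entries):
    """Return (value, errors) pairs, mapping non-numeric placeholders to None.

    `entries` is assumed to be a list of dicts shaped like the `values`
    list of a HEPData dependent variable (hypothetical helper).
    """
    points = []
    for entry in entries:
        raw = entry["value"]
        try:
            number = float(raw)
        except (TypeError, ValueError):
            # A placeholder such as '---' cannot be parsed as a number
            number = None
        # The errors key may be absent for empty bins
        errors = entry.get("errors", [])
        points.append((number, errors))
    return points


points = read_points([
    {"value": 1.2, "errors": [{"symerror": 0.1, "label": "stat"}]},
    {"value": "---"},
])
```

A consumer would then skip, or plot differently, any point whose value is `None`.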

clelange commented 5 months ago

Hi Bill, thanks for creating this issue. Do you have an example of how you solved this that you could share so that it's easier to test and reproduce (and then to change eventually)?

GraemeWatt commented 5 months ago

Thanks for the feedback, but HEPData doesn't support string values for the uncertainties. Recommendations for how to encode missing bins are given in:

https://hepdata-submission.readthedocs.io/en/latest/data_yaml.html#uncertainties (second paragraph)

The implementation in hepdata_lib was done in PR #161, i.e. if all uncertainties are zero for a particular bin, then the errors key is omitted from the YAML output. The warning message was my suggestion to discourage empty bins if there is only one dependent variable. But for your use case where there are multiple dependent variables and a (different) subset of bins are empty for each dependent variable, it is legitimate to set uncertainties to zero for those bins and you should just ignore the warnings from hepdata_lib.
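The behaviour described above can be sketched in plain Python. This is not the hepdata_lib source, only an illustration of the rule "omit the errors key for a bin when all of its uncertainties are zero". The function name `build_dependent_values` and the column-wise input layout are assumptions made for the example.

```python
def build_dependent_values(values, uncertainties):
    """Build per-bin entries from column-wise inputs (illustrative only).

    values:        list of central values (numbers or placeholder strings)
    uncertainties: dict mapping a label to a list of symmetric errors,
                   one per bin, the same length as `values`
    """
    rows = []
    for i, value in enumerate(values):
        errors = [
            {"symerror": column[i], "label": label}
            for label, column in uncertainties.items()
        ]
        row = {"value": value}
        # Omit the errors key when every uncertainty in this bin is zero
        if any(err["symerror"] != 0 for err in errors):
            row["errors"] = errors
        rows.append(row)
    return rows


rows = build_dependent_values(
    values=[1.2, "---", 3.4],
    uncertainties={"stat": [0.1, 0.0, 0.3], "sys": [0.05, 0.0, 0.2]},
)
```

Here the second bin carries the placeholder value with zero uncertainties in every column, so its entry contains only the `value` key.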

I don't think the behaviour of the hepdata_lib code needs to be changed, but there could be better documentation and perhaps an option could be added to suppress the warnings. I'll try to address these two points and I'll leave this issue open until they are addressed.

wmtford commented 5 months ago

Thanks both. An example of the use case is Fig. 5, the second table, in [1]. A stand-alone notebook to reproduce that table, along with the needed input files, can be found in cernbox at [2]. From my perspective this works fine, other than that the warning messages seem to imply a problem.

The encoding of missing bins that Graeme points to doesn't work in the hepdata_lib implementation because uncertainties there are added, or not, for an entire column; we can't treat some of the rows differently.

[1] https://www.hepdata.net/record/146018
[2] https://cernbox.cern.ch/s/8Nk392EQuLX8jJC

GraemeWatt commented 5 months ago

Bill, thank you for providing the detailed example. Unfortunately, it is a limitation of hepdata_lib that uncertainties cannot be treated differently for different rows of a dependent variable. Your existing treatment is the best that can be done with the current code. I've opened PR #251, which adds a paragraph of explanation to the end of the Uncertainties section of the documentation. I also added an option zero_uncertainties_warning (default value True) to the Variable class. In your example notebook, you could suppress the warnings using:

    # Dependent variable
    gy = Variable(axTitles[1][0], is_independent=False, is_binned=False, units=axTitles[1][1], zero_uncertainties_warning=False)

I also added a test that the errors key will be omitted if the uncertainties are zero, which should have been added when PR #161 was completed. Thanks again for the feedback.