Improving the reporting of uncertainties in the calculation made by boaviztapi

PierreRust commented 9 months ago

Problem

I believe we should improve the way we handle uncertainty in the calculation made in boaviztapi. This issue is mainly meant to start the discussion on how that could be improved.

When using the API to request the impact of a server, vm etc., figures are given with a high number of digits, a significant_figures parameter, and a min and max value.

For example :

    "gwp": {
      "embedded": {
        "value": 636.11,
        "significant_figures": 5,
        "min": 252.48,
        "max": 2010.6,
        "warnings": [
          "End of life is not included in the calculation"
        ]
      },

Based on the discussion I had with @da-ekchajzer on mattermost

the min and max value are calculated based on the uncertainties with have on the configuration of the equipment. For example, for a server if the request did not specify the exact amount of RAM we calculate the impact for a high and a low amount of RAM.
the uncertainties coming from the reference data is not represented. e.g. we could consider that the value of the impact of 1GB of RAM or 1mm2 of die has an uncertainty of 10%, but we do not account explicitly for that in the result.
the value of the significant_figures is not really calculated but mostly comes from a configuration file.

I the example above, I think the value 636.11 does not really have 5 significant digits and, given the min and max the api returns for the impact, we should apply some rounding. We can probably say that the gwp impact for this server is around 600 kgCO2e, but 636,11 is clearly too precise ;)

Solution

Regarding the rounding I suggest using something like the function bellow (rough code, needs polishing !) : it rounds the value based on the delta between the min and max value returned and a precisionparameter (with is a %) ; if there is a large difference between min and max, the rounding is more aggressive.

For example :

Approx for 242.48 < 636.11 < 2016.6 precision 10% = 600
Approx for 242.48 < 636.11 < 2016.6 precision 1% = 640

def round_value(val, min_val, max_val, precision):
    # value for precision% of the min max delta
    approx = (max_val - min_val) / (100/precision)
    significant = math.floor(math.log10(approx))
    rounded = round(val / 10 ** significant) * 10**significant
    return rounded

Alternatives

This approach helps solving the rounding issue, something else is needed for the uncertainties coming on the references data used for the calculation.

da-ekchajzer commented 9 months ago

Thanks for this proportion. Using min/max and a log10 function is a great way of handling the very different figures we have. Here are my comments :

The rounding function is not always working. I think should keep the existing one which handle all cases : https://github.com/Boavizta/boaviztapi/blob/795bbb2334d4e39cdd4954e3bea878fd20ed9e4c/boaviztapi/utils/roundit.py#L33
Precision could be set by default from the config file and override if needed (I don't see the case for now).

def round_value(val, min_val, max_val, precision=config["default_precision"]):
    """
   Rounds the value based on the delta between the min and max value returned and a precision parameter
   """
    # value for precision% of the min max delta
    approx = (max_val - min_val) / (100/precision)
    significant = math.floor(math.log10(approx))
    return float(to_precision(val, significant))

I can make a PR with this implementation if you think it's ok.

da-ekchajzer commented 9 months ago

I believe there is an issue with the approx variable. The more difference there is between min and max, the bigger approx is and the number of significant figures. I believe we want the opposite. See some examples :

value	approx	min_val	max_val	log10(approx)
20.29217	7.0896360000000005	10.91891	81.81527	0
1819.821672	530.9138167884	110.14710120000001	5419.285269084	2
0.02040328338	2.0873039999997484e-06	0.020400523740000003	0.02042139678	-6
0.00030760589391948	0.0001208433425274	6.3406418256e-05	0.00127183984353	-4
306.0165	95.3682	179.92950000000002	1133.6115	1
61648.853641199996	224191.01528027997	62.2570572	2241972.40986	5

PierreRust commented 9 months ago

I think the issue comes from using the significant as a parameter forto_precisionmethod . See for exemple with a value of6.2301with min=3.617and max=10.023` (and 10% precision)

round_value (original) for 3.617 < 6.2301 < 10.023 precision 10% = 6.2
round_value (with to_precision) for 3.617 < 6.2301 < 10.023 precision 10% = 0.0

The second option is obviously wrong, it is not even in the range between min and max !

The confusion probably come from the naming of my significantvariable which is not really the number of significant figures, at least not the way it is calculated by significant_number(x) https://github.com/Boavizta/boaviztapi/blob/795bbb2334d4e39cdd4954e3bea878fd20ed9e4c/boaviztapi/utils/roundit.py#L9C18-L9C18

I'll make a PR with a set of test cases, it will be easier to discuss the implementation

PierreRust commented 9 months ago

I've added a PR with a rouding function that handles corner cases : #220

da-ekchajzer commented 9 months ago

I am currently implementing the function in the code.

I think that there is a problem when min=max=value. When so, we don't round at all, even though it gives a precision which is way too high compare to the uncertainty of the impact factors.

Example

 "gwp": {
      "embedded": {
        "value": 23.77907,
        "min": 23.77907,
        "max": 23.77907,
        "warnings": [
          "End of life is not included in the calculation"
        ]
      },
      "use": {
        "value": 243.69843455999998,
        "min": 243.69843455999998,
        "max": 243.69843455999998
      },
      "unit": "kgCO2eq",
      "description": "Total climate change"
    },

Solutions

To account for the uncertainty of the impact factors, we could :

Hard code a maximal sig_fig numbers
Apply a ratio of x% to the maximal and minimal, which correspond to the uncertainty of the impact factor.

What are your thought about it ?

PierreRust commented 9 months ago

yes, as the rounding is based on the difference between min and max, when min == max it does nothing (and it's by design ;) )

I believe the "right way" to handle this would be to specify an uncertainty on the base impact factor and propagate it when calculating the overall impact (the uncertainties python library could help here https://pythonhosted.org/uncertainties/ ) . This should probably be another issue and PR, for the next version.

For now, I think we should fall back, in this specific case, to the current method where we simply cut after a fixed number of sig_fig.

Boavizta / boaviztapi