Boavizta / boaviztapi

🛠 Giving access to BOAVIZTA reference data and methodologies trough a RESTful API
GNU Affero General Public License v3.0
66 stars 23 forks source link

Improving the reporting of uncertainties in the calculation made by boaviztapi #214

Open PierreRust opened 9 months ago

PierreRust commented 9 months ago

Problem

I believe we should improve the way we handle uncertainty in the calculation made in boaviztapi. This issue is mainly meant to start the discussion on how that could be improved.

When using the API to request the impact of a server, vm etc., figures are given with a high number of digits, a significant_figures parameter, and a min and max value.

For example :

    "gwp": {
      "embedded": {
        "value": 636.11,
        "significant_figures": 5,
        "min": 252.48,
        "max": 2010.6,
        "warnings": [
          "End of life is not included in the calculation"
        ]
      },

Based on the discussion I had with @da-ekchajzer on mattermost

I the example above, I think the value 636.11 does not really have 5 significant digits and, given the min and max the api returns for the impact, we should apply some rounding. We can probably say that the gwp impact for this server is around 600 kgCO2e, but 636,11 is clearly too precise ;)

Solution

Regarding the rounding I suggest using something like the function bellow (rough code, needs polishing !) : it rounds the value based on the delta between the min and max value returned and a precisionparameter (with is a %) ; if there is a large difference between min and max, the rounding is more aggressive.

For example :

def round_value(val, min_val, max_val, precision):
    # value for precision% of the min max delta
    approx = (max_val - min_val) / (100/precision)
    significant = math.floor(math.log10(approx))
    rounded = round(val / 10 ** significant) * 10**significant
    return rounded

Alternatives

This approach helps solving the rounding issue, something else is needed for the uncertainties coming on the references data used for the calculation.

da-ekchajzer commented 9 months ago

Thanks for this proportion. Using min/max and a log10 function is a great way of handling the very different figures we have. Here are my comments :

def round_value(val, min_val, max_val, precision=config["default_precision"]):
    """
   Rounds the value based on the delta between the min and max value returned and a precision parameter
   """
    # value for precision% of the min max delta
    approx = (max_val - min_val) / (100/precision)
    significant = math.floor(math.log10(approx))
    return float(to_precision(val, significant))

I can make a PR with this implementation if you think it's ok.

da-ekchajzer commented 9 months ago

I believe there is an issue with the approx variable. The more difference there is between min and max, the bigger approx is and the number of significant figures. I believe we want the opposite. See some examples :

value approx min_val max_val log10(approx)
20.29217 7.0896360000000005 10.91891 81.81527 0
1819.821672 530.9138167884 110.14710120000001 5419.285269084 2
0.02040328338 2.0873039999997484e-06 0.020400523740000003 0.02042139678 -6
0.00030760589391948 0.0001208433425274 6.3406418256e-05 0.00127183984353 -4
306.0165 95.3682 179.92950000000002 1133.6115 1
61648.853641199996 224191.01528027997 62.2570572 2241972.40986 5
PierreRust commented 9 months ago

I think the issue comes from using the significant as a parameter forto_precisionmethod . See for exemple with a value of6.2301with min=3.617and max=10.023` (and 10% precision)

round_value (original) for 3.617 < 6.2301 < 10.023 precision 10% = 6.2
round_value (with to_precision) for 3.617 < 6.2301 < 10.023 precision 10% = 0.0

The second option is obviously wrong, it is not even in the range between min and max !

The confusion probably come from the naming of my significantvariable which is not really the number of significant figures, at least not the way it is calculated by significant_number(x) https://github.com/Boavizta/boaviztapi/blob/795bbb2334d4e39cdd4954e3bea878fd20ed9e4c/boaviztapi/utils/roundit.py#L9C18-L9C18

I'll make a PR with a set of test cases, it will be easier to discuss the implementation

PierreRust commented 9 months ago

I've added a PR with a rouding function that handles corner cases : #220

da-ekchajzer commented 9 months ago

I am currently implementing the function in the code.

I think that there is a problem when min=max=value. When so, we don't round at all, even though it gives a precision which is way too high compare to the uncertainty of the impact factors.

Example

 "gwp": {
      "embedded": {
        "value": 23.77907,
        "min": 23.77907,
        "max": 23.77907,
        "warnings": [
          "End of life is not included in the calculation"
        ]
      },
      "use": {
        "value": 243.69843455999998,
        "min": 243.69843455999998,
        "max": 243.69843455999998
      },
      "unit": "kgCO2eq",
      "description": "Total climate change"
    },

Solutions

To account for the uncertainty of the impact factors, we could :

  1. Hard code a maximal sig_fig numbers
  2. Apply a ratio of x% to the maximal and minimal, which correspond to the uncertainty of the impact factor.

What are your thought about it ?

PierreRust commented 9 months ago

yes, as the rounding is based on the difference between min and max, when min == max it does nothing (and it's by design ;) )

I believe the "right way" to handle this would be to specify an uncertainty on the base impact factor and propagate it when calculating the overall impact (the uncertainties python library could help here https://pythonhosted.org/uncertainties/ ) . This should probably be another issue and PR, for the next version.

For now, I think we should fall back, in this specific case, to the current method where we simply cut after a fixed number of sig_fig.