Boavizta / boaviztapi

🛠 Giving access to BOAVIZTA reference data and methodologies through a RESTful API
GNU Affero General Public License v3.0

Add error margin to the impacts evaluation #147

Closed da-ekchajzer closed 9 months ago

da-ekchajzer commented 1 year ago

Problem

Our impact modeling is subject to very large margins of error on two types of values:

To account for the difficulty and imprecision of the assessments and to provide transparency to users, we should report margins of error on the returned impacts.

Evaluate error margin

How should we evaluate the error margins arising from both the technical characteristics and the impact factors?

We should think more about how to implement this feature. Any suggestions, @AirLoren, @samuelrince, @benjaminlebigot?

@ThibaultPirson should send us some interesting data.

Implementation

We should add an error_margin_% attribute, expressed as a percentage:

"pe": {
  "manufacture": 24000,
  "manufacture_error_margin_%": 100,
  "use": 29800,
  "use_error_margin_%": 100,
  "unit": "MJ"
}
{
"value": 10,
"source": "lorem ipsum",
"unit": "mm2",
"error_margin": 100
}
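
To make the proposed attribute concrete, here is a minimal sketch of how a client could turn a value and its error margin into bounds. It assumes the margin is symmetric and relative to the value, which the proposal above does not state explicitly; `bounds_from_margin` is a hypothetical helper, not part of the API.

```python
def bounds_from_margin(value: float, error_margin_pct: float) -> tuple[float, float]:
    """Return (low, high), ASSUMING a symmetric relative margin around value."""
    delta = abs(value) * error_margin_pct / 100.0
    return value - delta, value + delta


# Payload shaped like the proposal above (values copied from the example).
pe = {
    "manufacture": 24000,
    "manufacture_error_margin_%": 100,
    "use": 29800,
    "use_error_margin_%": 100,
    "unit": "MJ",
}

low, high = bounds_from_margin(pe["manufacture"], pe["manufacture_error_margin_%"])
print(f"manufacture: {low} to {high} {pe['unit']}")
```

With a 100% margin, the lower bound falls to zero, which shows how little such a wide symmetric margin tells the user about the low end.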

da-ekchajzer commented 1 year ago

After exchanging with @ggael and @EtienneLeesPerasso, here is the group's progress on error margins.

Notes (in French) : https://wiki.boavizta.org/share/edc299e1-9d5c-4045-942d-29c8a1243883

Types of error margin

On input data

In our case, the input data are the technical characteristics and the usage assumptions. The uncertainty depends on the strategy used:

On datasets (secondary data)

Rarely taken into account

In the case of secondary data (impact factors), the margins of error are generally given. However, the impact factors we use are not reported with uncertainties.

On impact criteria

Rarely taken into account

The impact criteria also carry margins of error from standardization. In the case of GWP, for example, the IPCC produces confidence intervals for the normalization of greenhouse gases.

Implementations

Here are several possible implementations. Note that these possibilities can be combined.

Extreme cases (Min & Max)

The first possibility is to define minimum and maximum values for each input value and to propagate them through the computation. We can then provide the user with min and max values for every technical characteristic and impact.
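
A minimal sketch of min/max propagation, assuming an impact is computed as sums and products of bounded quantities. The `Bounded` class and the numbers are illustrative only and are not taken from the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounded:
    """A value with its lower and upper bounds (hypothetical helper type)."""
    value: float
    low: float
    high: float

    def __add__(self, other: "Bounded") -> "Bounded":
        # Bounds of a sum are the sums of the bounds.
        return Bounded(self.value + other.value,
                       self.low + other.low,
                       self.high + other.high)

    def __mul__(self, other: "Bounded") -> "Bounded":
        # Bounds of a product are the extremes of the four corner products.
        corners = [self.low * other.low, self.low * other.high,
                   self.high * other.low, self.high * other.high]
        return Bounded(self.value * other.value, min(corners), max(corners))


# Illustrative numbers: an uncertain die area, an exact impact factor
# and an uncertain base impact.
die_area = Bounded(1.0, 0.5, 2.0)
factor = Bounded(2.0, 2.0, 2.0)
base = Bounded(10.0, 8.0, 12.0)

total = die_area * factor + base
print(total)
```

The bounds widen at every operation because all inputs are assumed to sit at their extremes at the same time.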

Pros :

Cons :

Probabilistic law

Another possibility is to assign each value a probability distribution fitted to representative data. Propagating the errors requires either establishing the correlations between values or assuming that there are none. We can then obtain a confidence interval at X% (typically 95%), where X can be given by the user.

This is what @ggael did for ecodiag, using log-normal distributions based on manufacturers' data.
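
As an illustration of the log-normal approach (a sketch under stated assumptions, not ecodiag's actual code), the following assumes that each factor is log-normal, described by its median and geometric standard deviation, and that the factors are independent and multiplied together. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def product_of_lognormals(params):
    """Combine independent log-normal factors given as (median, gsd) pairs.

    The product of independent log-normals is log-normal: the log-medians add
    and the log-variances add.
    """
    mu = sum(math.log(median) for median, _ in params)
    sigma2 = sum(math.log(gsd) ** 2 for _, gsd in params)
    return math.exp(mu), math.exp(math.sqrt(sigma2))


def lognormal_interval(median, gsd, confidence=0.95):
    """Two-sided interval at the requested confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    factor = gsd ** z
    return median / factor, median * factor


# Illustrative inputs: two factors with medians 2 and 3, each with gsd 1.5.
median, gsd = product_of_lognormals([(2.0, 1.5), (3.0, 1.5)])
low, high = lognormal_interval(median, gsd, confidence=0.95)
print(median, low, high)
```

The interval is multiplicative around the median, so it is asymmetric on a linear scale and never goes below zero.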

Pros :

Cons :

Bypass

We could infer normal distributions from the min and max bounds by assuming a standard deviation.
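
One way to read this is sketched below. It assumes that the min and max bounds delimit a central 95% interval of a normal distribution; that coverage is an arbitrary choice made for the example, and `normal_from_bounds` is a hypothetical helper.

```python
from statistics import NormalDist


def normal_from_bounds(low, high, coverage=0.95):
    """Build a normal distribution whose central `coverage` interval is [low, high]."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mean = (low + high) / 2
    sigma = (high - low) / (2 * z)
    return NormalDist(mean, sigma)


def sum_independent(dists):
    """Sum independent normal variables (NormalDist supports `+` for this)."""
    total = dists[0]
    for dist in dists[1:]:
        total = total + dist
    return total


# Illustrative bounds for two components.
a = normal_from_bounds(80.0, 120.0)
b = normal_from_bounds(40.0, 60.0)
total = sum_independent([a, b])
print(total.mean, total.stdev)
```

Summing the distributions gives a narrower interval than summing the bounds, because independent errors partly cancel out.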

Note that for Energizta (github.com/boavizta/energizta/), building a collaborative dataset will, by construction, make it possible to derive probability distributions for the electrical consumption models.

Monte Carlo

Seems too heavy for our purposes.
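
For reference, a minimal sketch of what a Monte Carlo evaluation would involve. The model, the input distributions and the function name are all assumptions made for the example. The cost is one model evaluation per sample, for every request.

```python
import random
import statistics


def monte_carlo_interval(model, samplers, n=10_000, confidence=0.95, seed=0):
    """Sample the inputs n times, run the model, return (low, median, high)."""
    rng = random.Random(seed)
    results = sorted(model(*(draw(rng) for draw in samplers)) for _ in range(n))
    tail = (1 - confidence) / 2
    low = results[int(tail * (n - 1))]
    high = results[int((1 - tail) * (n - 1))]
    return low, statistics.median(results), high


# Illustrative model: impact = area * factor, with assumed input distributions.
samplers = [
    lambda rng: rng.triangular(0.5, 2.0, 1.0),  # low, high, mode
    lambda rng: rng.uniform(1.8, 2.2),
]
low, median, high = monte_carlo_interval(lambda area, factor: area * factor, samplers)
print(low, median, high)
```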

Pedigree matrix

Pedigree matrices are used in LCA to determine a confidence interval based on a qualitative analysis of the data. They are often available for secondary data.

When they are not available (which is our case), some criteria are difficult to determine without access to the LCA details.

It could be interesting to use this method for impact factors from secondary sources, with the risk of having to assume certain criteria.

A Python implementation of the pedigree matrix: https://github.com/brightway-lca/pedigree_matrix
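
Independently of that package (whose API is not shown here), the aggregation step can be sketched as follows. It assumes the ecoinvent-style convention in which each pedigree score maps to an uncertainty factor and the factors are combined into a squared geometric standard deviation. The factor values below are placeholders, not official figures, and `pedigree_gsd2` is a hypothetical name.

```python
import math


def pedigree_gsd2(uncertainty_factors, basic_uncertainty=1.0):
    """Squared geometric standard deviation from per-indicator uncertainty factors.

    ASSUMED convention: gsd^2 = exp(sqrt(sum(ln(U_i)^2) + ln(U_b)^2)).
    """
    terms = [math.log(u) ** 2 for u in uncertainty_factors]
    terms.append(math.log(basic_uncertainty) ** 2)
    return math.exp(math.sqrt(sum(terms)))


# Placeholder factors for five indicators (1.0 means no added uncertainty).
factors = [1.2, 1.05, 1.1, 1.0, 1.0]
gsd2 = pedigree_gsd2(factors, basic_uncertainty=1.05)

impact_factor = 100.0  # illustrative median value
print(impact_factor / gsd2, impact_factor * gsd2)  # approximate 95% interval
```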

As you can see, we have many possibilities. I have my own opinion on the matter, but I'd like to hear your opinions.

da-ekchajzer commented 1 year ago

A first implementation has been made for the upcoming v0.3 release. This first approach uses a min/max implementation that takes into account the min and max values of the archetype (i.e. the category of the device or component) when an attribute is completed. The error margins on the impact factors are not taken into account for now. I will describe this new behaviour in the documentation.
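
A rough sketch of that behaviour, with hypothetical names that do not reflect the actual code: when the user supplies an attribute, its bounds collapse to the given value; when the attribute is completed from the archetype, the archetype's min and max are carried along.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Archetype:
    """Default value and bounds for one attribute of a device category (hypothetical)."""
    default: float
    low: float
    high: float


@dataclass(frozen=True)
class Attribute:
    value: float
    low: float
    high: float
    source: str  # "input" or "archetype" (labels assumed for this sketch)


def complete(user_value: Optional[float], archetype: Archetype) -> Attribute:
    """Use the user's value when present, otherwise fall back to the archetype."""
    if user_value is not None:
        return Attribute(user_value, user_value, user_value, "input")
    return Attribute(archetype.default, archetype.low, archetype.high, "archetype")


# Illustrative archetype for a RAM capacity attribute, in GB.
ram_archetype = Archetype(default=32.0, low=8.0, high=128.0)
print(complete(None, ram_archetype))
print(complete(64.0, ram_archetype))
```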