Add error margin to the impacts evaluation

da-ekchajzer commented 1 year ago

Problem

Our impact modeling is subject to very large margins of error on two types of values:

The technical characteristics
The impact factors

To account for the difficulty and imprecision of the assessments and to provide transparency to users, we should report margins of error on the returned impacts.

Evaluate error margin

How should we evaluate the error margins coming from both the technical characteristics and the impact factors?

A first approach could be to hard-code the error margins.
We should adjust the error margin according to how complete the user input is (the more default values are used, the larger the error margin).
We should look for impact factors from different sources to identify a range.
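To make the second idea concrete, here is a minimal sketch that scales a hard-coded margin with the share of default values. The function name, the linear interpolation and the 10% / 100% bounds are all assumptions for illustration, not something we have agreed on.

```python
def error_margin_pct(n_default: int, n_total: int,
                     min_margin: float = 10.0,
                     max_margin: float = 100.0) -> float:
    """Interpolate linearly between min_margin (no default values used)
    and max_margin (only default values used).

    The bounds and the linear rule are placeholders, not calibrated values.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_default <= n_total:
        raise ValueError("n_default must be between 0 and n_total")
    share_of_defaults = n_default / n_total
    return min_margin + share_of_defaults * (max_margin - min_margin)
```

With these placeholder bounds, a request where half of the attributes fall back to defaults would get a margin halfway between the two.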

We should think more about how to implement this feature. Any suggestions, @AirLoren, @samuelrince, @benjaminlebigot?

@ThibaultPirson should send us some interesting data.

Implementation

We should add an error_margin_% attribute, expressed as a percentage.

For each impact

"pe": {
  "manufacture": 24000,
  "manufacture_error_margin_%": 100,
  "use": 29800,
  "use_error_margin_%": 100,
  "unit": "MJ"
}

For each boattribute

{
  "value": 10,
  "source": "lorem ipsum",
  "unit": "mm2",
  "error_margin": 100
}

da-ekchajzer commented 1 year ago

After exchanging with @ggael and @EtienneLeesPerasso, here is the group's progress on error margins.

Notes (in French) : https://wiki.boavizta.org/share/edc299e1-9d5c-4045-942d-29c8a1243883

Types of error margin

On input data

In our case, the input data are the technical characteristics and the usage assumptions. The uncertainty depends on the strategy used:

In the case of default data, the uncertainty is high
In the case of completed data, the uncertainty depends on the completion method
In the case of data entered by the user, the uncertainty depends on the user

On datasets (secondary data)

Rarely taken into account

In the case of secondary data (impact factors), the margins of error are generally given. However, the impact factors we use are not reported with uncertainties.

On impact criteria

Rarely taken into account

The impact criteria also carry margins of error from standardization. In the case of GWP, for example, the IPCC produces confidence intervals for the normalization of greenhouse gases.

Implementations

Here are several possible implementations. Note that these possibilities can be combined.

Extreme cases (Min & Max)

The first possibility is to propose maximum and minimum values for each of the data and to propagate them. We can then provide the user with min & max values for all technical characteristics and impacts.

In the case of default data: use the max and min values in our data repository
In the case of completed data: to be determined according to the completion method
In the case of data entered by users: the user should be able to choose the uncertainty rate. We should choose one by default (0%?).
In the case of impact factors:
- (1) Put max and min values with a default uncertainty rate
- (2) In the case where several impact factors exist: use the min and max values in the literature
- (3) Do not put uncertainty on the impact factors
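A minimal sketch of how min and max values could be propagated through sums and products. The Bounded class, the with_rate helper and the numbers are hypothetical and only illustrate the idea; they are not the API's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounded:
    """A value with its lower and upper bounds (hypothetical helper)."""
    value: float
    low: float
    high: float

    def __post_init__(self) -> None:
        if not self.low <= self.value <= self.high:
            raise ValueError("expected low <= value <= high")

    def __add__(self, other: "Bounded") -> "Bounded":
        # Bounds of a sum: add the lows together and the highs together.
        return Bounded(self.value + other.value,
                       self.low + other.low,
                       self.high + other.high)

    def __mul__(self, other: "Bounded") -> "Bounded":
        # Bounds of a product: take the extremes over the four corner cases,
        # which also stays correct if some bounds are negative.
        corners = [a * b
                   for a in (self.low, self.high)
                   for b in (other.low, other.high)]
        return Bounded(self.value * other.value, min(corners), max(corners))


def with_rate(value: float, rate_pct: float) -> Bounded:
    """Option (1): build bounds from a default uncertainty rate in percent."""
    delta = abs(value) * rate_pct / 100.0
    return Bounded(value, value - delta, value + delta)


# Illustrative numbers only, not real data.
die_area = Bounded(value=1.5, low=1.0, high=2.0)
impact_per_area = with_rate(2.0, 50.0)
impact = die_area * impact_per_area
```

Here the product combines the smallest area with the smallest factor and the largest area with the largest factor, which is exactly the "improbable combinations" effect listed in the cons.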

Pros :

Easy to implement
Easy to explain
Easy to maintain

Cons :

Will certainly lead to very large margins of error, because we aggregate extreme cases with improbable combinations (e.g. a very large die surface combined with a very fine process node)

Probabilistic law

Another possibility is to propose for each value a probability law based on a representative distribution. Error propagation requires establishing correlations between values or assuming that there are no correlations. We can then obtain a confidence interval at X% (typically 95%). X can be given by the user.

This is what was done by @ggael for ecodiag, with log-normal distributions based on manufacturers' data.
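As an illustration of propagation without correlations, here is a sketch for the simplest case: an impact computed as a product of independent log-normal values. In that case the product is itself log-normal, so the interval can be computed analytically. This is my own sketch, not ecodiag's code; the function name and the parameterization (mean and standard deviation of the logarithm of each value) are assumptions. It does not cover sums of log-normal values, which are not log-normal.

```python
import math
from statistics import NormalDist


def lognormal_product_interval(params, confidence=0.95):
    """Confidence interval for a product of independent log-normal values.

    params: iterable of (mu, sigma), the mean and standard deviation of the
            natural log of each value.
    Returns (low, median, high) for the requested confidence level.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    params = list(params)
    # For independent factors, log-means add up and log-variances add up.
    mu = sum(m for m, _ in params)
    sigma = math.sqrt(sum(s ** 2 for _, s in params))
    # Two-sided quantile of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.exp(mu - z * sigma), math.exp(mu), math.exp(mu + z * sigma)


# Illustrative parameters only.
low, median, high = lognormal_product_interval([(0.0, 0.3), (0.0, 0.4)])
```

The confidence argument plays the role of the X given by the user.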

Pros :

Solves the problem of extreme cases
More accurate

Cons :

Requires knowledge of the distribution of each of the data used in the calculation ⇉ This seems difficult for most data and impossible for some.
Difficult to implement, explain and maintain

Bypass

We could infer normal probability laws based on the min and max bounds, assuming a standard deviation.
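A possible sketch of this bypass, assuming that the min and max bounds cover a given share of the distribution (95% here) and that the mean sits at the midpoint. Both assumptions, and the function name, are mine. It relies on NormalDist from Python's standard library, which supports adding two independent normal distributions.

```python
from statistics import NormalDist


def normal_from_bounds(low: float, high: float,
                       coverage: float = 0.95) -> NormalDist:
    """Infer a normal distribution from min and max bounds.

    Assumes [low, high] is the central `coverage` interval of the
    distribution and that the mean is the midpoint of the bounds.
    """
    if not low < high:
        raise ValueError("expected low < high")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    mean = (low + high) / 2.0
    sigma = (high - low) / (2.0 * z)
    return NormalDist(mean, sigma)


# Illustrative numbers only: two independent contributions to one impact.
manufacture = normal_from_bounds(10.0, 30.0)
use = normal_from_bounds(0.0, 20.0)
total = manufacture + use  # valid only if the two are independent
interval_95 = (total.inv_cdf(0.025), total.inv_cdf(0.975))
```

The resulting interval is narrower than the plain sum of the min and max values, which is the point of going through a distribution.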

Note that for Energizta (github.com/boavizta/energizta/), building a collaborative data set will, by construction, allow us to create probability distributions for the electrical consumption models.

Monte Carlo

Seems too heavy for our purposes

Pedigree matrix

Pedigree matrices are used in LCA to determine a confidence interval based on a qualitative analysis of the data. They are often available for secondary data.

When they are not available (which is our case), some criteria are difficult to determine without access to the LCA details.

It could be interesting to use this method for impact factors from secondary sources, with the risk of having to assume certain criteria.
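For reference, here is a sketch of the combination rule that, as far as I understand it, is commonly used with ecoinvent-style pedigree matrices: each criterion score maps to an uncertainty factor, and the factors are combined into a squared geometric standard deviation that bounds a 95% interval around the median. The function names are hypothetical, the factors in the example are made up, and this does not use the library linked below.

```python
import math


def pedigree_gsd2(uncertainty_factors):
    """Squared geometric standard deviation from pedigree factors.

    Assumed rule: GSD^2 = exp(sqrt(sum(ln(U_i)^2))), with every U_i >= 1.
    """
    factors = list(uncertainty_factors)
    if any(u < 1.0 for u in factors):
        raise ValueError("uncertainty factors must be >= 1")
    return math.exp(math.sqrt(sum(math.log(u) ** 2 for u in factors)))


def pedigree_interval(median, uncertainty_factors):
    """Approximate 95% interval [median / GSD^2, median * GSD^2]."""
    gsd2 = pedigree_gsd2(uncertainty_factors)
    return median / gsd2, median * gsd2


# Made-up factors for two criteria, NOT values from a published matrix.
low, high = pedigree_interval(100.0, [1.2, 1.1])
```

The factor for each score has to come from a published matrix, and the scores themselves are what we would have to assume without access to the LCA details.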

A Python implementation of the pedigree matrix: https://github.com/brightway-lca/pedigree_matrix

As you can see, we have many possibilities. I have my own opinion on the matter, but I'd like to hear your opinions.

da-ekchajzer commented 1 year ago

A first implementation has been made for the upcoming release (#v0.3). This first approach uses a min/max implementation that takes into account the min and max values of the archetype (i.e. the category of the device or component) when an attribute is completed. The error margins related to the impact factors are not taken into account for now. I will document this new behaviour in the documentation.
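To illustrate the behaviour described above, here is a simplified sketch and not the actual boaviztapi code. The Attribute class, the complete_attribute function, the status labels and the archetype values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Attribute:
    """Simplified stand-in for an attribute with min/max bounds."""
    value: float
    low: float
    high: float
    status: str  # "INPUT" when given by the user, "COMPLETED" otherwise


def complete_attribute(user_value: Optional[float],
                       archetype: Mapping[str, float]) -> Attribute:
    """Return the user's value if given, else complete from the archetype.

    A completed attribute inherits the archetype's min and max; a user
    value is taken as exact (0% uncertainty) in this sketch.
    """
    if user_value is not None:
        return Attribute(user_value, user_value, user_value, "INPUT")
    return Attribute(archetype["default"], archetype["min"],
                     archetype["max"], "COMPLETED")


# Illustrative archetype entry, not real data.
die_size_archetype = {"default": 2.0, "min": 1.0, "max": 4.0}
completed = complete_attribute(None, die_size_archetype)
given = complete_attribute(3.0, die_size_archetype)
```

The min and max of each attribute can then be propagated to the impacts, while the impact factors keep a single value for now.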
