brightway-lca / brightway2-calc

The calculation engine for the Brightway2 life cycle assessment framework.
BSD 3-Clause "New" or "Revised" License

`MultiLCA` normalizes and weights in all combinations #108

Closed pjamesjoyce closed 1 month ago

pjamesjoyce commented 1 month ago

This might be a misunderstanding at my end, but when I use normalisation and weighting in `MultiLCA`, all impact categories get normalised by all normalisations (and weighted by all weightings), not just the ones specified in the `method_config`. This proliferates the number of calculations, seemingly unnecessarily: for 10 items and (for example) the 16 EF3.1 methods, once normalised and weighted you end up with 10 × 16 × 16 × 16 = 40,960 scores, of which only 160 are relevant (the 10 × 16 normalised and weighted scores).

There's a full example in this gist: https://gist.github.com/pjamesjoyce/950911d4fde7f15fa6b851fe0b5cb8ad

But to summarise: I've created normalisations and weightings for the EF3.1 impact categories (using the JRC factors), and then added either 'normalisation' or 'weighting' as an extra tuple element for each impact category, e.g. the normalisation for

('EF v3.1', 'climate change', 'global warming potential (GWP100)')

is

('EF v3.1', 'climate change', 'global warming potential (GWP100)', 'normalisation')

When setting up the `MultiLCA`, the 'normalizations' entry in the `method_config` is a dictionary with 16 keys, one for each of the EF3.1 normalisation categories, each of which maps to a single-item list containing the corresponding EF3.1 impact category, i.e.:

normalizations = {EF31_normalisation[i]: [EF31[i]] for i in range(len(EF31))}

so it looks like this:

{
    ('EF v3.1', 'acidification', 'accumulated exceedance (AE)', 'normalisation'): [
        ('EF v3.1', 'acidification', 'accumulated exceedance (AE)')
    ],
    ('EF v3.1', 'climate change', 'global warming potential (GWP100)', 'normalisation'): [
        ('EF v3.1', 'climate change', 'global warming potential (GWP100)')
    ],
    # ... etc.
}

Similarly, the weightings are set up so that each weighting refers to one normalisation category:

weightings = {EF31_weighting[i]: [EF31_normalisation[i]] for i in range(len(EF31))}

i.e.

{
    ('EF v3.1', 'acidification', 'accumulated exceedance (AE)', 'weighting'): [
        ('EF v3.1', 'acidification', 'accumulated exceedance (AE)', 'normalisation')
    ],
    ('EF v3.1', 'climate change', 'global warming potential (GWP100)', 'weighting'): [
        ('EF v3.1', 'climate change', 'global warming potential (GWP100)', 'normalisation')
    ],
    # ... etc.
}
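For reference, both mappings can be built directly from parallel lists of method identifiers. This is a sketch (the two-category lists are illustrative stand-ins for the full 16 EF3.1 methods):

```python
# Illustrative impact categories; the real lists hold all 16 EF3.1 methods.
EF31 = [
    ("EF v3.1", "acidification", "accumulated exceedance (AE)"),
    ("EF v3.1", "climate change", "global warming potential (GWP100)"),
]
# Normalisation/weighting identifiers are the impact category plus a suffix.
EF31_normalisation = [ic + ("normalisation",) for ic in EF31]
EF31_weighting = [ic + ("weighting",) for ic in EF31]

# Each normalisation should apply to exactly one impact category ...
normalizations = {n: [ic] for n, ic in zip(EF31_normalisation, EF31)}
# ... and each weighting to exactly one normalisation.
weightings = {w: [n] for w, n in zip(EF31_weighting, EF31_normalisation)}
```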

Given this setup, I'd expected `MultiLCA.normalize()` to apply only the acidification normalisation factors to the acidification impact category, and so on, but it appears to apply all normalisations (and weightings) to all impact categories.

So for 10 random processes:


import bw2data as bd
import bw2calc as bc
import pandas as pd

# 10 random activities as functional units
demands = {}
for i in range(10):
    a = bd.Database('ecoinvent-3.9.1-cutoff').random()
    demands[a['name']] = {a.id: 1}

method_config = {
    'impact_categories': EF31,
    'normalizations': normalizations,
    'weightings': weightings
}

assert bc.method_config.MethodConfig(**method_config)

data_objs = bd.get_multilca_data_objs(functional_units=demands, method_config=method_config)

mlca = bc.MultiLCA(
    demands=demands,
    method_config=method_config,
    data_objs=data_objs
)
mlca.lci()
mlca.lcia()

len(mlca.scores)
# 160

mlca.normalize()

len(mlca.scores)
# 2560

mlca.weight()

len(mlca.scores)
# 40960
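The counts above follow from the full cross-product of demands, impact categories, normalisations, and weightings. A quick sanity check of the arithmetic:

```python
n_demands, n_methods = 10, 16  # 16 EF3.1 impact categories, normalisations, and weightings

lcia = n_demands * n_methods     # one score per (impact category, demand) pair
normalized = lcia * n_methods    # every normalisation applied to every LCIA score
weighted = normalized * n_methods  # every weighting applied to every normalised score

print(lcia, normalized, weighted)  # 160 2560 40960
```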

# Unpack the (weighting, normalisation, impact category, activity) score keys
to_df = []
for (w, n, i, a), s in mlca.scores.items():
    to_df.append(dict(
        activity=a,
        impact=i[1],
        normalisation=n[1],
        weighting=w[1],
        score=s,
    ))

df = pd.DataFrame(to_df)
df.head(20)
| | activity | impact | normalisation | weighting | score |
|---|---|---|---|---|---|
| 0 | market for land use change, annual crop | acidification | acidification | acidification | 0.0196977 |
| 1 | treatment of scrap lead acid battery, remelting | acidification | acidification | acidification | 1.89657e-05 |
| 2 | natural gas, burned in solid oxide fuel cell 125kWe, future | acidification | acidification | acidification | 4.24188e-07 |
| 3 | blast furnace production | acidification | acidification | acidification | 715.185 |
| 4 | maleic anhydride production by catalytic oxidation of benzene | acidification | acidification | acidification | 2.19544e-07 |
| 5 | natural gas, burned in gas turbine | acidification | acidification | acidification | 1.23649e-07 |
| 6 | regenerative thermal oxidation of nitrous oxide | acidification | acidification | acidification | 1.70469e-06 |
| 7 | sulfate pulp production, from softwood, bleached | acidification | acidification | acidification | 3.91914e-06 |
| 8 | strawberry production, in heated greenhouse | acidification | acidification | acidification | 1.52614e-05 |
| 9 | soybean seed production, organic, for sowing | acidification | acidification | acidification | 8.56976e-06 |
| 10 | market for land use change, annual crop | climate change | acidification | acidification | 0 |
| 11 | treatment of scrap lead acid battery, remelting | climate change | acidification | acidification | 0 |
| 12 | natural gas, burned in solid oxide fuel cell 125kWe, future | climate change | acidification | acidification | 0 |
| 13 | blast furnace production | climate change | acidification | acidification | 0 |
| 14 | maleic anhydride production by catalytic oxidation of benzene | climate change | acidification | acidification | 0 |
| 15 | natural gas, burned in gas turbine | climate change | acidification | acidification | 0 |
| 16 | regenerative thermal oxidation of nitrous oxide | climate change | acidification | acidification | 0 |
| 17 | sulfate pulp production, from softwood, bleached | climate change | acidification | acidification | 0 |
| 18 | strawberry production, in heated greenhouse | climate change | acidification | acidification | 0 |
| 19 | soybean seed production, organic, for sowing | climate change | acidification | acidification | 0 |

i.e. rows 0 to 9 give the normalised and weighted acidification scores for each item, but in rows 10 to 19 the climate change impact is being normalised by the acidification factors, hence the result is zero (and the `mlca.scores` dict ends up with 40,960 items).

Am I setting this up incorrectly? Or is applying normalisation and weighting selectively more complex than it's worth, so that the best bet is to filter out the irrelevant results in post-processing?
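For what it's worth, one possible post-processing filter under the current behaviour would keep only the scores whose weighting, normalisation, and impact category all refer to the same method, i.e. share the same first three tuple elements. A sketch, assuming the four-element score keys shown above:

```python
def relevant_scores(scores):
    """Keep only the (weighting, normalisation, impact, activity) entries
    where all three method tuples name the same impact category."""
    return {
        (w, n, ic, act): score
        for (w, n, ic, act), score in scores.items()
        if w[:3] == n[:3] == ic[:3]
    }

# e.g. filtered = relevant_scores(mlca.scores)  # drops the mismatched (zero) combinations
```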

pjamesjoyce commented 1 month ago

Quick update:

The following two ways of setting up normalisations both give the same results as the 'correct' setup shown above:

# 1. Reversed mapping (each normalisation pointing at the "wrong" impact category)
normalizations = {EF31_normalisation[i]: [EF31[-(i + 1)]] for i in range(len(EF31))}

# 2. All impact categories mapped to the first normalisation, the rest empty
normalizations = {}
normalizations[EF31_normalisation[0]] = EF31
for i in range(1, len(EF31)):
    normalizations[EF31_normalisation[i]] = []

I'm afraid I don't understand the matrices well enough to propose a fix...

cmutel commented 1 month ago

Normally normalization is the total amount of a substance emitted per year per person (or similar), so it's a bit strange for me to think of a normalization for each impact category. That being said, this does feel like there is a bug, and certainly an opportunity for better documentation. I am looking into it.

cmutel commented 1 month ago

Sorry @pjamesjoyce, I had poor tests which only used one normalization and one weighting. I got too excited about defining a custom `__matmul__` to think about what it meant to do combinatorial multiplication. Should be fixed now.
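A toy illustration of the difference (plain Python dicts, not the brightway internals; all names here are illustrative): combinatorial application pairs every normalisation with every score, while the intended behaviour follows the mapping given in the `method_config`:

```python
# Illustrative LCIA scores keyed by impact category, and normalisation
# factors keyed by normalisation method.
scores = {("ic1",): 2.0, ("ic2",): 3.0}
factors = {("ic1", "norm"): 10.0, ("ic2", "norm"): 100.0}
mapping = {("ic1", "norm"): [("ic1",)], ("ic2", "norm"): [("ic2",)]}

# Buggy: every normalisation crossed with every impact category.
# The mismatched pairs are meaningless (zero in practice).
combinatorial = {
    (n, ic): factors[n] * s for n in factors for ic, s in scores.items()
}  # 4 entries

# Fixed: each normalisation applied only to the categories it maps to.
mapped = {
    (n, ic): factors[n] * scores[ic]
    for n, ics in mapping.items()
    for ic in ics
}  # 2 entries
```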

pjamesjoyce commented 1 month ago

Perfect! Thank you @cmutel - works like a charm :)