brightway-lca / brightway2-calc

The calculation engine for the Brightway2 life cycle assessment framework.
BSD 3-Clause "New" or "Revised" License

`MultiLCA` normalizes and weights in all combinations #108

Closed pjamesjoyce closed 1 month ago

pjamesjoyce commented 1 month ago

This might be a misunderstanding at my end, but when I use normalisation and weighting in `MultiLCA`, all impact categories get normalised by all normalisations (and weighted by all weightings), not just the ones specified in the `method_config`. This proliferates the number of calculations, seemingly unnecessarily: for 10 items and (for example) the 16 EF3.1 methods, once normalised and weighted you end up with 10 × 16 × 16 × 16 = 40,960 scores, of which only 160 are relevant (the 10 × 16 normalised and weighted scores).

There's a full example in this gist: https://gist.github.com/pjamesjoyce/950911d4fde7f15fa6b851fe0b5cb8ad

But to summarise: I've created normalisations and weightings for the EF3.1 impact categories (using the JRC factors), and then added either 'normalisation' or 'weighting' as an extra tuple element for each impact category, e.g. the normalisation for

('EF v3.1', 'climate change', 'global warming potential (GWP100)')

is

('EF v3.1', 'climate change', 'global warming potential (GWP100)', 'normalisation')

When setting up the `MultiLCA`, the 'normalizations' entry in the `method_config` is a dictionary with 16 keys, one for each of the EF3.1 normalisation categories, each of which maps to a single-item list containing the corresponding EF3.1 impact category, i.e.:

normalizations = {EF31_normalisation[i]: [EF31[i]] for i in range(len(EF31))}

so it looks like this:

{
    ('EF v3.1', 'acidification', 'accumulated exceedance (AE)', 'normalisation'): [
        ('EF v3.1', 'acidification', 'accumulated exceedance (AE)')
    ],
    ('EF v3.1', 'climate change', 'global warming potential (GWP100)', 'normalisation'): [
        ('EF v3.1', 'climate change', 'global warming potential (GWP100)')
    ],
    # ... etc.
}

Similarly, the weightings are set up so that each weighting refers to one normalisation category:

weightings = {EF31_weighting[i]: [EF31_normalisation[i]] for i in range(len(EF31))}

i.e.

{
    ('EF v3.1', 'acidification', 'accumulated exceedance (AE)', 'weighting'): [
        ('EF v3.1', 'acidification', 'accumulated exceedance (AE)', 'normalisation')
    ],
    ('EF v3.1', 'climate change', 'global warming potential (GWP100)', 'weighting'): [
        ('EF v3.1', 'climate change', 'global warming potential (GWP100)', 'normalisation')
    ],
    # ... etc.
}
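For reference, both mappings can be built directly from parallel lists of method identifiers. This is a sketch (the two-category lists are illustrative stand-ins for the full 16 EF3.1 methods):

```python
# Illustrative impact categories; the real lists hold all 16 EF3.1 methods.
EF31 = [
    ("EF v3.1", "acidification", "accumulated exceedance (AE)"),
    ("EF v3.1", "climate change", "global warming potential (GWP100)"),
]
# Normalisation/weighting identifiers are the impact category plus a suffix.
EF31_normalisation = [ic + ("normalisation",) for ic in EF31]
EF31_weighting = [ic + ("weighting",) for ic in EF31]

# Each normalisation should apply to exactly one impact category ...
normalizations = {n: [ic] for n, ic in zip(EF31_normalisation, EF31)}
# ... and each weighting to exactly one normalisation.
weightings = {w: [n] for w, n in zip(EF31_weighting, EF31_normalisation)}
```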

Given this setup, I'd expected `MultiLCA.normalize()` to apply only the acidification normalisation factors to the acidification impact category, and so on, but it appears to apply all normalisations (and weightings) to all impact categories.

So for 10 random processes:


import bw2data as bd
import bw2calc as bc
import pandas as pd

# 10 random activities as functional units
demands = {}
for i in range(10):
    a = bd.Database('ecoinvent-3.9.1-cutoff').random()
    demands[a['name']] = {a.id: 1}

method_config = {
    'impact_categories': EF31,
    'normalizations': normalizations,
    'weightings': weightings
}

assert bc.method_config.MethodConfig(**method_config)

data_objs = bd.get_multilca_data_objs(functional_units=demands, method_config=method_config)

mlca = bc.MultiLCA(
    demands=demands,
    method_config=method_config,
    data_objs=data_objs
)
mlca.lci()
mlca.lcia()

len(mlca.scores)
# 160

mlca.normalize()

len(mlca.scores)
# 2560

mlca.weight()

len(mlca.scores)
# 40960
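The counts above follow from the full cross-product of demands, impact categories, normalisations, and weightings. A quick sanity check of the arithmetic:

```python
n_demands, n_methods = 10, 16  # 16 EF3.1 impact categories, normalisations, and weightings

lcia = n_demands * n_methods     # one score per (impact category, demand) pair
normalized = lcia * n_methods    # every normalisation applied to every LCIA score
weighted = normalized * n_methods  # every weighting applied to every normalised score

print(lcia, normalized, weighted)  # 160 2560 40960
```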

# Unpack the (weighting, normalisation, impact category, activity) score keys
to_df = []
for (w, n, i, a), s in mlca.scores.items():
    to_df.append(dict(
        activity=a,
        impact=i[1],
        normalisation=n[1],
        weighting=w[1],
        score=s,
    ))

df = pd.DataFrame(to_df)
df.head(20)
| | activity | impact | normalisation | weighting | score |
|---|---|---|---|---|---|
| 0 | market for land use change, annual crop | acidification | acidification | acidification | 0.0196977 |
| 1 | treatment of scrap lead acid battery, remelting | acidification | acidification | acidification | 1.89657e-05 |
| 2 | natural gas, burned in solid oxide fuel cell 125kWe, future | acidification | acidification | acidification | 4.24188e-07 |
| 3 | blast furnace production | acidification | acidification | acidification | 715.185 |
| 4 | maleic anhydride production by catalytic oxidation of benzene | acidification | acidification | acidification | 2.19544e-07 |
| 5 | natural gas, burned in gas turbine | acidification | acidification | acidification | 1.23649e-07 |
| 6 | regenerative thermal oxidation of nitrous oxide | acidification | acidification | acidification | 1.70469e-06 |
| 7 | sulfate pulp production, from softwood, bleached | acidification | acidification | acidification | 3.91914e-06 |
| 8 | strawberry production, in heated greenhouse | acidification | acidification | acidification | 1.52614e-05 |
| 9 | soybean seed production, organic, for sowing | acidification | acidification | acidification | 8.56976e-06 |
| 10 | market for land use change, annual crop | climate change | acidification | acidification | 0 |
| 11 | treatment of scrap lead acid battery, remelting | climate change | acidification | acidification | 0 |
| 12 | natural gas, burned in solid oxide fuel cell 125kWe, future | climate change | acidification | acidification | 0 |
| 13 | blast furnace production | climate change | acidification | acidification | 0 |
| 14 | maleic anhydride production by catalytic oxidation of benzene | climate change | acidification | acidification | 0 |
| 15 | natural gas, burned in gas turbine | climate change | acidification | acidification | 0 |
| 16 | regenerative thermal oxidation of nitrous oxide | climate change | acidification | acidification | 0 |
| 17 | sulfate pulp production, from softwood, bleached | climate change | acidification | acidification | 0 |
| 18 | strawberry production, in heated greenhouse | climate change | acidification | acidification | 0 |
| 19 | soybean seed production, organic, for sowing | climate change | acidification | acidification | 0 |

i.e. rows 0 to 9 give the normalised and weighted acidification scores for each item, but in rows 10 to 19 the climate change impact is being normalised by the acidification factors, hence the result is zero (and the `mlca.scores` dict ends up with 40,960 items).

Am I setting this up incorrectly? Or is applying normalisation and weighting selectively more complex than it's worth, so that the best bet is to filter out the irrelevant results in post-processing?
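For what it's worth, one possible post-processing filter under the current behaviour would keep only the scores whose weighting, normalisation, and impact category all refer to the same method, i.e. share the same first three tuple elements. A sketch, assuming the four-element score keys shown above:

```python
def relevant_scores(scores):
    """Keep only the (weighting, normalisation, impact, activity) entries
    where all three method tuples name the same impact category."""
    return {
        (w, n, ic, act): score
        for (w, n, ic, act), score in scores.items()
        if w[:3] == n[:3] == ic[:3]
    }

# e.g. filtered = relevant_scores(mlca.scores)  # drops the mismatched (zero) combinations
```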

pjamesjoyce commented 1 month ago

Quick update:

The following two ways of setting up normalisations both give the same results as the 'correct' setup shown above:

# 1. Reversed mapping (each normalisation pointing at the "wrong" impact category)
normalizations = {EF31_normalisation[i]: [EF31[-(i + 1)]] for i in range(len(EF31))}

# 2. All impact categories mapped to the first normalisation, the rest empty
normalizations = {}
normalizations[EF31_normalisation[0]] = EF31
for i in range(1, len(EF31)):
    normalizations[EF31_normalisation[i]] = []

I'm afraid I don't understand the matrices well enough to propose a fix...

cmutel commented 1 month ago

Normally normalization is the total amount of a substance emitted per year per person (or similar), so it's a bit strange for me to think of a normalization for each impact category. That being said, this does feel like there is a bug, and certainly an opportunity for better documentation. I am looking into it.

cmutel commented 1 month ago

Sorry @pjamesjoyce, I had poor tests which only used one normalization and one weighting. I got too excited about defining a custom `__matmul__` to think about what it meant to do combinatorial multiplication. Should be fixed now.
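A toy illustration of the difference (plain Python dicts, not the brightway internals; all names here are illustrative): combinatorial application pairs every normalisation with every score, while the intended behaviour follows the mapping given in the `method_config`:

```python
# Illustrative LCIA scores keyed by impact category, and normalisation
# factors keyed by normalisation method.
scores = {("ic1",): 2.0, ("ic2",): 3.0}
factors = {("ic1", "norm"): 10.0, ("ic2", "norm"): 100.0}
mapping = {("ic1", "norm"): [("ic1",)], ("ic2", "norm"): [("ic2",)]}

# Buggy: every normalisation crossed with every impact category.
# The mismatched pairs are meaningless (zero in practice).
combinatorial = {
    (n, ic): factors[n] * s for n in factors for ic, s in scores.items()
}  # 4 entries

# Fixed: each normalisation applied only to the categories it maps to.
mapped = {
    (n, ic): factors[n] * scores[ic]
    for n, ics in mapping.items()
    for ic in ics
}  # 2 entries
```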

pjamesjoyce commented 1 month ago

Perfect! Thank you @cmutel - works like a charm :)