GreenDelta / olca-schema


Streamline process serialization by omitting default-valued causal allocation factors #4

Open bkuczenski opened 5 years ago

bkuczenski commented 5 years ago

When a multioutput process is allocated in OLCA, the app automatically creates economic, physical, and causal allocation factors, but most of them are zero. When the process gets serialized, the zero allocation factors are serialized as well.

Supposing that an absent allocation factor is implicitly a zero allocation factor, these don't add any information. Why not just omit the zeros? This gist shows a database with about 30 processes, of which 3 are allocated; half a megabyte of JSON is used to store the allocation factors (20 kB zipped):

https://gist.github.com/bkuczenski/2090074daca2f6ea78619a593844c940

I use the following line whenever I open OLCA process data sets:

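# p_j is one process data set, parsed from its JSON file into a dict
if 'allocationFactors' in p_j:
    # keep only the non-zero factors; an absent factor is then read as zero
    p_j['allocationFactors'] = [v for v in p_j['allocationFactors'] if v['value'] != 0]

msrocka commented 5 years ago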

The examples are maybe a bit atypical (a lot of elementary flows, many output products, and everything fully allocated to only one product). As the data set documentation states something different, I think this is not even correct:

[image]

It is true that openLCA explicitly stores zero-valued allocation factors, but only for such multi-output processes. In my opinion, being explicit is better in this case, because 0 is not the default in the calculation (the neutral element is 1). It is also true that openLCA could infer the 0 factor for the nth product of a process with n products when the other n-1 factors sum to 1. But in the same way, it could also infer a factor of 0.3 when the n-1 factors sum to 0.7.
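To illustrate the inference argument, here is a minimal sketch. The helper `infer_last_factor` is hypothetical and not part of openLCA, and the numbers are chosen to be exact in floating point rather than the 0.3 and 0.7 from the comment:

```python
def infer_last_factor(known_factors):
    """Infer the missing n-th factor so that all n factors sum to 1.

    Hypothetical helper for illustration; not an openLCA function.
    """
    return 1.0 - sum(known_factors)


# The n-1 stored factors already sum to 1: the omitted factor is inferred as 0.
full = infer_last_factor([0.25, 0.75])

# The n-1 stored factors sum to 0.75: the same rule infers 0.25, not 0.
partial = infer_last_factor([0.25, 0.5])
```

Under such a rule an absent factor does not always mean 0, which is the reason given here for storing the zeros explicitly.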

bkuczenski commented 5 years ago

The examples are not as you describe: the nonzero allocation factors are a full complement of factors for each of physical and economic allocation, followed by hundreds of zero-valued causal allocation factors (though they do show up as 1.0 in the app, as you say; in my observation they are serialized as 0).

I guess I do not understand how causal allocation works: if 1 is the neutral element, then the causal factors must be computed differently from the other allocation modes, or else the allocated processes will not sum to one (do they get normalized?).
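The concern can be sketched with made-up numbers. The amounts, the product count, and the helper `allocate` below are all assumptions for illustration, not openLCA code:

```python
# Hypothetical process with two allocatable flows and n = 3 products.
process_vector = [10.0, 4.0]
n_products = 3


def allocate(vector, factor):
    """Scale every flow amount by one allocation factor."""
    return [factor * amount for amount in vector]


# All factors at 1.0 (as displayed in the app): each product gets the full process.
allocated = [allocate(process_vector, 1.0) for _ in range(n_products)]

# Sum the allocated vectors flow by flow.
total = [sum(column) for column in zip(*allocated)]
```

Here `total` is `n_products` times the original amounts, so factors of 1 for every product would only conserve the process if they were normalized or applied differently.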

See also my other picky question about how allocation factors are applied in the LCI computation.

Anyway, a modified point still stands: I feel like entering an allocation by property should not lead to the superfluous generation and serialization of hundreds of default values for causal allocation that the user never used.

[Screenshot from 2019-05-02 12-16-46]

msrocka commented 5 years ago

uff… this is strange: when the allocation factors are displayed as ones, they should not get exported as zeros. I checked the petroleum data set on the LCI commons, and all these zero-valued allocation factors are there. When I import the data set into openLCA, they show up as zero factors. Could you maybe share a dump of your database (right-click in the navigation > backup database)?

Regarding how the causal allocation works: Say we have a process with m allocatable inputs or outputs (elementary flows, product inputs, waste outputs) and n output products. For the economic and physical allocation, we have one allocation factor for each of the n products. This means that for a product j the same allocation factor f_j is applied to all m inputs and outputs (after formula evaluation, unit conversion, number generation, etc.) to create the mono-functional process vector for j, with

\sum_{j=1}^{n} f_j = 1

These factors can be related to a flow property (like in EcoSpold 2) but in openLCA the user can also enter property-unrelated factors if the user knows better (this is why we have the calculation helper but you can always set your own factors).

The causal allocation is different (~and typically never property related~): for each i of the m inputs and outputs, an individual allocation factor f_ij can be applied (e.g. if you know that a fraction f_ij of a specific emission i is related to the product j, and this fraction is different for another emission). So in this case you have an m × n matrix F with

\sum_{j=1}^{n} f_{ij} = 1
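The difference between the two schemes can be sketched as follows. This is a simplified illustration with made-up amounts and factors; the function names are hypothetical and do not reflect openLCA's actual implementation:

```python
# Hypothetical process: m = 2 allocatable flows, n = 2 products.
amounts = [8.0, 2.0]  # amounts of the m allocatable inputs/outputs

# Economic or physical allocation: one factor f_j per product, summing to 1.
f = [0.75, 0.25]

# Causal allocation: one factor f_ij per flow i and product j (m x n matrix F).
F = [
    [0.5, 0.5],  # row for flow 0, sums to 1
    [1.0, 0.0],  # row for flow 1, sums to 1
]


def allocate_by_product(amounts, f, j):
    """Mono-functional vector for product j: the same f_j scales every flow."""
    return [f[j] * a for a in amounts]


def allocate_causal(amounts, F, j):
    """Mono-functional vector for product j: flow i is scaled by its own f_ij."""
    return [F[i][j] * a for i, a in enumerate(amounts)]


economic = [allocate_by_product(amounts, f, j) for j in range(len(f))]
causal = [allocate_causal(amounts, F, j) for j in range(len(f))]
```

Because the factors sum to 1 over the products in both schemes, the mono-functional vectors add back up to the original amounts in either case.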

Thus, for all three methods we would store

2 * n + m * n

allocation factors. In the petroleum example we have m = 41 and n = 10, which gives 2 * 10 + 41 * 10 = 430 allocation factors (as in your script). But there must be at least 2 + m (= 43) non-zero factors when everything is allocated to a single product. When I import the example data set into openLCA, this is exactly the case (but not in your example).
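The counts above can be checked with a few lines of arithmetic. The function names are hypothetical and only restate the formulas from this comment:

```python
def stored_factor_count(m, n):
    """Factors stored for economic and physical (n each) plus causal (m * n)."""
    return 2 * n + m * n


def min_nonzero_count(m):
    """Non-zero factors when everything is allocated to a single product:
    one economic, one physical, and one causal factor per allocatable flow."""
    return 2 + m


m, n = 41, 10  # petroleum example: 41 allocatable flows, 10 products
total = stored_factor_count(m, n)
nonzero = min_nonzero_count(m)
```

With these values `total` is 430 and `nonzero` is 43, matching the two counts quoted above.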

The example queries in openLCA:

-- find the process ID -> 80442
select id from tbl_processes where ref_id = '0aaf1e13-5d80-37f9-b7bb-81a6b8965c71';

-- count all exchanges -> 51; with 10 products
select count(*) from tbl_exchanges where f_owner = 80442;

-- count all allocation factors -> 430
select count(*) from tbl_allocation_factors where f_process = 80442;

-- count the non-zero factors -> 43
select count(*) from tbl_allocation_factors where f_process = 80442 and value <> 0.0;