brightway-lca / brightway2-calc

The calculation engine for the Brightway2 life cycle assessment framework.
BSD 3-Clause "New" or "Revised" License

Possibility to obtain and work with the inverted technology matrix (A⁻¹) #35

Closed · bsteubing closed this 3 years ago

bsteubing commented 3 years ago

Currently, bw-calc only calculates the solution to the inventory problem, i.e. s in As = f. This is sufficient to get LCA results and do contribution analysis, but it is not sufficient to generate Sankey diagrams that show how impacts build up across the life cycle. The work-around that has been used, e.g. in the Activity Browser, is to use the brightway graph traversal. This works, but it is not ideal for generating Sankey diagrams, as one graph traversal has to be done for each reference flow and impact category. Having the matrix inverse (A⁻¹) would cost some up-front calculation time, but would then make Sankey diagram calculations orders of magnitude quicker (e.g. as in SimaPro). To my knowledge, one of the reasons why calculations in OpenLCA and SimaPro take longer is that the matrix inverses are calculated. OpenLCA has found an elegant solution, which could be copied in bw: you can choose how you want to calculate the LCA results, either a "quick" way (without the matrix inverse, not giving you Sankeys) or an "analysis" way (calculating the inverse and providing Sankey results).
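[Editorial sketch] A minimal illustration of the two modes described above, using scipy rather than brightway's own solver; A (the technosphere matrix, scipy sparse) and f (the final demand vector) are assumed to be given:

import numpy as np
from scipy.sparse.linalg import spsolve, inv

# "Quick" mode: one linear solve gives the supply vector s for a single
# functional unit -- enough for scores and contribution analysis.
s = spsolve(A.tocsr(), f)

# "Analysis" mode: compute the full inverse up front. Column j of A^-1 is
# the supply vector for one unit of reference flow j, which Sankey-style
# decompositions can reuse across many demands.
A_inv = inv(A.tocsc())
s_check = A_inv @ f  # should match s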

So what I am practically suggesting is to include an option (e.g. bw.LCA(..., calculate_matrix_inverse=False)) that triggers a matrix inversion. The default should be False to get quick LCA results, but setting it to True should calculate (or at least provide) the inverted A matrix.

By the way, this can have other benefits as well:

  • once you have the matrix inverse you can calculate LCA results for all processes in your database MUCH quicker

Sorry, I had limited time to write this up, but it would be great if bw could also return the inverse technology matrix (and possibly also do things with it).

cmutel commented 3 years ago

Sorry about not responding sooner - I used to get emails, but now need to check manually to see if there are new issues. Still, my fault.

This is a reasonable request. I did some initial testing, and computing the inverse manually by iterating through each possible activity (and using pardiso) took about 5 minutes on my machine. Converting to a dense matrix and using np.linalg.pinv took over 50 minutes. There are some other possible approaches in this SO question: pseudo inverse of sparse matrix in python.

My conclusion from these times is that the inverse is still not all that beneficial. Instead, we need to improve the performance of the graph traversal. One thing that immediately sticks out is that the method cumulative_score doesn't "remember" results it has already calculated. For very linear supply chains this doesn't matter, but for other cases it could be significant.
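[Editorial sketch] The memoization idea could look roughly like this; the function and the dict cache are illustrative stand-ins, not the actual graph traversal code, and lca is assumed to have already done lci() and lcia():

scores = {}

def cumulative_score(lca, activity_key):
    # Solve (and score) each activity only once, no matter how many
    # supply-chain paths reach it during traversal.
    if activity_key not in scores:
        lca.redo_lcia({activity_key: 1})
        scores[activity_key] = lca.score
    return scores[activity_key]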

We could also add the possibility to cache calculation results in the LCA class itself, so that calling redo_lci would use the cached result. This isn't trivial, as we need to make sure that the factorized technosphere matrix hasn't changed (see e.g. https://github.com/haasad/PyPardisoProject/issues/1). But it is possible, if we add some cache control.
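[Editorial sketch] One possible shape for that cache control, with an assumed helper (nothing like this exists in bw2calc): fingerprint the technosphere matrix and discard cached results when the fingerprint changes.

def matrix_fingerprint(matrix):
    # Cheap identity check for a scipy sparse matrix: shape plus the raw
    # bytes of its structure and values. If this changes, the factorized
    # solver state and any cached supply arrays must be invalidated.
    csr = matrix.tocsr()
    return hash((
        csr.shape,
        csr.indptr.tobytes(),
        csr.indices.tobytes(),
        csr.data.tobytes(),
    ))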

bsteubing commented 3 years ago

Wow, these are really long calculation times. I would still assume that there is a way to calculate the inverse more quickly... but I also haven't looked into that. To your points:

Personally, I would probably first check out other ways of computing the inverse before adding caches, and only go for a caching solution if we are sure that we can't do it in under 5 minutes (but the fact that OpenLCA and SimaPro are quicker kind of lets me think that there must be faster ways...).

haasad commented 3 years ago

> This is a reasonable request. I did some initial testing, and computing the inverse manually by iterating through each possible activity (and using pardiso) took about 5 minutes on my machine. Converting to a dense matrix and using np.linalg.pinv took over 50 minutes. There are some other possible approaches in this SO question: pseudo inverse of sparse matrix in python.

Instead of iterating you can also pass everything in one go to pypardiso:

import numpy as np
import pypardiso

# One call: solve A·X = I for X, i.e. compute the full (dense) inverse
A_inv = pypardiso.spsolve(lca.technosphere_matrix, np.eye(*lca.technosphere_matrix.shape))

Takes ~90 seconds on my machine (albeit with cutoff 3.5; I assume the latest ecoinvent matrix is a bit bigger). The resulting matrix A_inv takes up 2 GB of memory:

>>> A_inv.nbytes * 1e-9
2.053635872

Maybe there is some elegant way of passing the relevant activities to pypardiso in large batches during the graph traversal? Some middle ground between pre-calculating every single activity and doing it iteratively in the graph traversal. I'd imagine there are a lot of activities that never contribute significantly, so solving for the full inverse seems a bit of an overkill.
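[Editorial sketch] That middle ground could look roughly like this, assuming the traversal can hand over a batch of matrix column indices it actually needs (cols and the helper name are illustrative):

import numpy as np
import pypardiso

def batch_columns(lca, cols):
    # Solve A·X = R, where R holds one unit-demand column per requested
    # activity; returns just those columns of the inverse.
    n = lca.technosphere_matrix.shape[0]
    rhs = np.zeros((n, len(cols)))
    rhs[cols, np.arange(len(cols))] = 1.0
    return pypardiso.spsolve(lca.technosphere_matrix, rhs)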

haasad commented 3 years ago

> By the way, this can have other benefits as well:
>
>   • once you have the matrix inverse you can calculate LCA results for all processes in your database MUCH quicker

I'd argue that working with a 2 GB matrix in memory will be much SLOWER than using a sparse solver for the majority of use cases.
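[Editorial sketch] For reference, the "all processes at once" calculation from the quote above would look roughly like this, assuming lci() and lcia() have been done and A_inv is the dense inverse computed earlier:

import numpy as np

# C·B is (biosphere flows x activities); multiplying by A^-1 and summing the
# columns gives one LCIA score per unit of each activity in the database.
cb = lca.characterization_matrix * lca.biosphere_matrix
all_scores = np.asarray((cb @ A_inv).sum(axis=0)).ravel()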

Very crude comparison for the lci:

In [1]: demand = {bw.Database('cutoff35').random(): 1}

In [2]: %time lca.redo_lci(demand)
CPU times: user 44.2 ms, sys: 4 ms, total: 48.2 ms
Wall time: 12.9 ms

In [3]: print(lca.supply_array.sum())
1.7167464502649536

In [4]: %time supply = A_inv * lca.demand_array
CPU times: user 203 ms, sys: 180 ms, total: 383 ms
Wall time: 380 ms

In [5]: print(supply.sum())
1.7167464502649434

Maybe I'm missing something on the lcia side, but unless you need additional details like in the Sankey case, calculating the inverse is not worth it.

Edit:

In [6]: %time supply = A_inv[:, lca.demand_array.nonzero()] * lca.demand_array[lca.demand_array.nonzero()]
CPU times: user 2.55 ms, sys: 16.1 ms, total: 18.7 ms
Wall time: 17.3 ms

This comparison is fairer, since multiplying with the whole matrix is not required, but still not faster.

cmutel commented 3 years ago

Thanks for the suggestion @bsteubing. @haasad, I added a warning and a link to this issue so that users can make an informed decision on whether inversion is the right approach for them.

bsteubing commented 3 years ago

@haasad and @cmutel thanks so much for adding this functionality!

What I don't understand is why A⁻¹ should take up 2 GB?? Does a 20k by 20k square matrix... require that much memory?

Besides having this functionality for generic purposes (A⁻¹ kind of belongs to matrix-based LCA), I think the main use case could be the Sankey diagram, when people want to explore in more depth.

Of course I agree with you that for most use cases bypassing the inverse is the best solution.

cmutel commented 3 years ago

@bsteubing The inverse is a dense matrix (calculated with either NumPy's pinv or the function that Adrian mentioned). For 20k by 20k with a dtype of float64, you get 3.2 GB of memory.

Specific numbers for ecoinvent 3.8 cutoff: A is 19565 by 19565, and A inverse is too; the fill rate (if it were sparse) would be 34.5%, so storing it as sparse doesn't get you much. Specifically, the dense matrix is 3.06 GB, and stored as sparse CSR it would be 1.59 GB: 132,318,968 non-zero elements with dtype float64, 132,318,968 column indices of dtype int32, and 19,566 row pointers of dtype int32.
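[Editorial sketch] The arithmetic behind those numbers (pure back-of-the-envelope, no LCA data needed):

n, nnz = 19_565, 132_318_968

dense_gb = n * n * 8 / 1e9                        # float64 values only
csr_gb = (nnz * 8 + nnz * 4 + (n + 1) * 4) / 1e9  # values + column indices + row pointers

print(round(dense_gb, 3), round(csr_gb, 3))  # 3.062 1.588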

haasad commented 3 years ago

@cmutel I get

In [59]: A_inv_sp.nnz
Out[59]: 251817026

In [60]: np.count_nonzero(A_inv)
Out[60]: 251817026

In [61]: A_inv_sp.nnz / np.multiply(*A_inv_sp.shape)
Out[61]: 0.6578477385302577

So 65% density instead of 35%, which makes it even worse, i.e. no benefit from using a sparse matrix anymore. See the numbers below:

             A        A⁻¹ dense   A⁻¹ sparse
density      0.00063  1           0.6578
memory (GB)  0.00297  3.062314    3.02188

@bsteubing I'd be very careful before using that functionality in the AB. For the bigger 19.5k by 19.5k cutoff 3.8 matrix, it took my laptop around 5 minutes to calculate the inverse. I saw spikes in memory usage of up to ~11 GB while calculating the inverse, so this operation has the potential to crash the AB on computers with 8 GB of RAM.

And regarding your earlier statement:

> (but the fact that OpenLCA and SimaPro are quicker kind of lets me think that there must be faster ways...).

My understanding was always that neither SimaPro nor OpenLCA actually calculates the full inverse when running. They use precalculated values (i.e. system processes instead of unit processes, if I remember the ecoinvent terms correctly :smiley:). Looking at the numbers I listed above, it seems very unlikely to me that they calculate the inverse every time, seeing how memory-intensive that is, especially a few years ago and without an efficient solver (unless Delphi and Java had better sparse solvers long before Python).

bsteubing commented 3 years ago

Again, thanks to both of you for this very useful information. I haven't used SimaPro in a while, but I wonder how they manage to have such a smooth Sankey experience (and what they use under the hood). Still, I think it is great to have this functionality, now that we can technically get A⁻¹ and use it for whatever purpose in LCA research (perhaps not in the AB).

cmutel commented 3 years ago

I don't have any details on SimaPro, but they have discussed a new calculation engine on their blog. But you live close to them, maybe you can ask for more info!

haasad commented 3 years ago

> All results are calculated with a precision of 8 digits or better, except the cumulative indicators shown in the network diagram and the intermediate results of the uncertainty analysis (Monte Carlo) function

I'll have a look at the openLCA code later; I just saw that in 2018 they started using Julia to call the UMFPACK solver under the hood. The Sankey implementation is here: https://github.com/GreenDelta/olca-modules/blob/master/olca-core/src/main/java/org/openlca/core/results/Sankey.java

haasad commented 3 years ago

This is the heuristic that openLCA uses to decide how to get results: https://github.com/GreenDelta/olca-modules/blob/0a2ae67d0d736bed9001f73bdf7710bb766ab1ad/olca-core/src/main/java/org/openlca/core/results/providers/ResultProviders.java#L18-L27

if (data.hasLibraryLinks())
    return LazyLibraryProvider.of(db, data);

var isSmall = data.techMatrix != null
        && data.techMatrix.rows() < 3000;
if (isSmall)
    return EagerResultProvider.create(data);
return data.isSparse() && solver.hasSparseSupport()
    ? LazyResultProvider.create(data)
    : EagerResultProvider.create(data);

I.e. first try to get "precalculated" results with the LazyLibraryProvider; otherwise, for small matrices (<3000 rows), get results with the EagerResultProvider (which calculates the inverse); and use a sparse solver (LazyResultProvider) for everything else (which is what brightway does).

So openLCA definitely doesn't calculate an inverse for ecoinvent. Either they use precalculated LCIA results or their Sankey implementation is more efficient than brightway's graph traversal. But I've read enough Java code for today :see_no_evil:

cmutel commented 1 year ago

Situation as of late 2023, testing on server with 32 cores using ecoinvent 3.9: