Further improvement of the supplychain module

The module provides an excellent basis for indirect impact computation, but there are several aspects in the code structure that could largely benefit from a rework.

Most methods do too many things at once, more granularity would notably ease unit tests
The methods calc_shock_to_sectors and calc_impact can quickly confuse the user. Notably, as they both accept exposure and impact as arguments, thus the entry point for exposure and impact is unclear.
In particular, there is a concerning block in calc_impact where exposure and impact arguments are ignored if a shock was already computed. (note that #81 will add a warning)
Multiple attributes of a SupplyChain object are initially Noneand are set by methods thereafter. This can create confusion on when attributes are correctly computed, and may lead to incoherent objects.

I propose to improve the implementation with the following (and could work on that once #81 is merged, or later, this is mostly to track ideas). I see at least two ways to do this:

Keep a single SupplyChain class, and improve its initialisation and the distinction between calc_shock_to_sectors and calc_impact. The core idea would be that instantiating a SupplyChain object would by default require both MRIOT, Exposure and Impact, and create an object which holds direct impact translated into MRIOT format, from which indirect impact can then be computed.

Pros:

Not that much work
Keeps things mostly similar Cons:
Might still lack clarity
SupplyChain would still hold empty attributes for indirect impact results when the computation was not run

Separate things more by creating two new classes, say IO_Shock and IndirectImpact (Other suggestions welcome!), such that, IO_Shock does the translation of a CLIMADA impact into a MRIOT formatted one, and IndirectImpact does the computation.

Pros:

Probably clearer
Segments things better
Would probably be easier to extend in the future Cons:
Will take more time and work
Will break probably all current usage of the module

05/06/24: Approach chosen is 2.

Other sub-issues that should be addressed:

for the boario approach, the simulation length is currently last_event_date + 365, the 365 should not be hardcoded
the io_model variable in the calc_matrices() method should probably be a class attribute?

Features that could be added / or should be thought of while doing the rework:

Being able to generate IO_Shocks from other inputs like summary statistics instead of Impact objects.
Integration of the Inoperability model
Integration of the MRIA model
Feedback loop between indirect shocks and direct shocks for multiple event case (better account for the fact that what is destroyed cannot be destroyed again)

Features that could be added / or should be thought of while doing the rework:

Integration of the Inoperability model Integration of the MRIA model Feedback loop between indirect shocks and direct shocks for multiple event case (better account for the fact that what is destroyed cannot be destroyed again)

Thanks a lot @spjuhel - this is quite alinged with what I had in mind!

I like these ideas, and having a granular, pythonic refactor would be lovely.

I don't know how many users depend on this yet: as a fairly recent creation it's hopefully not many. I'm definitely happy to work with breaking changes and refactor my code
I'm in favour of renaming things and splitting functionality. (But not DirectImpact/IndirectImpact: it's confusing when a class with Impact in the name should inherit from the Impact class). Maybe DirectShock and IndirectShock. (See also my next comment/suggestion/request which would love this split).

I also have an additional functionality request! Maybe this should be a separate Issue. But I'll start here for the purpose of discussion since it seems at least a little relevant.

One particular bit of granularity that I'd like is flexibility with the calc_shock_to_sectors method. Currently It takes an Impact and an Exposures object as input, aggregates them by region_id, and translates that as a direct shock to the MRIOT tables based on the requested sectors. The flexibility I need is to work with larger numbers of Impacts and Exposures. Namely:

shocking different sectors with different impacts. Currently the method calculates a single % shock for each country and applies it to all selected MRIOT sectors in that country, whereas I need to shock, e.g. agriculture and manufacturing sectors differently
the ability to provide an Impact-Exposures pair for each region_id, rather than needing them all to be combined into single huge objects. For the global analyses I'm running these combined objects are wayyy too large
ideally, not requiring the Impact object to have an imp_mat. For large event sets and many sector-country combinations the matrices can be huge and I'd prefer not to store them. The calculation only needs the total impact by event and country, it doesn't care about individual exposures. My suggestions below (crudely) work around this.

My proposed solution here is to separate calc_shock_to_sectors functionality into two bits of functionality: namely calculating shocks and combining shocks, letting you aggregate across multiple Impacts/Exposures/sectors. I don't know the best way to implement them because the order can be flexible: aggregating can be done before or after the main part of calc_shock_to_sectors. That is,

either: (1) aggregate_impacts_to_shock, which takes an Impact-Exposure pair and returns a data frame with the % loss for each event in each region_id. Then (2) calc_shock_to_sectors which takes a list of these data frames and calculates the input shocks for the MRIOT calculations from them, creating the final SupplyChain.secs_shock object. Ideally, the methods also have a way to encode which sectors are shocked by each dataframe so we can shock different sectors differently
or: (1) keep calc_shock_to_sectors as it is, and then (2) provide a combine_shocks method that takes multiple SupplyChain.secs_shock objects and sums them, allowing you to build the final shock from one country-sector at a time. Note: if the suggested DirectShock and IndirectShock classes are created, then this can be implemented as a DirectShock.combine(data: List[DirectShock]) class method.

Both of these solutions allow me to work with large numbers of Impact objects and Exposures more elegantly. To deal with the impact matrix question, I would also add a check in calc_shock_to_sectors that spots when an Impact is only for a single country/region_id and then uses the Impact.at_event property instead of the marginal sums of the Impact.imp_mat, since these are equivalent in this case.

For now I have a local workaround with the second method. I'd be happy to turn it into a pull request!

CLIMADA-project / climada_petals

Further improvement of the supplychain module #124

05/06/24: Approach chosen is 2.