gem / oq-engine

OpenQuake Engine: a software for Seismic Hazard and Risk Analysis
https://github.com/gem/oq-engine/#openquake-engine
GNU Affero General Public License v3.0

Can we move the calculation of aggregated event loss tables in postprocessing? #5343

Closed by micheles 4 years ago

micheles commented 4 years ago

Probably not for event based, but certainly yes for scenario_risk, and it is worth testing the feasibility of the approach.

PS: after 11 days of work, it turns out that the approach is viable even for ebrisk, provided we discard the smallest losses, which affect only the low return-period portion of the loss curves.

micheles commented 4 years ago

The trick to make this work is to store only the highest losses per asset, i.e. the tail of the distribution instead of the full distribution.
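The idea above can be sketched in a few lines of numpy. This is only an illustration of the "keep the tail" trick, not the engine's actual implementation; the function name and the value of `k` are hypothetical.

```python
import numpy as np

def keep_tail(losses, k):
    """Return the k largest losses (the tail of the distribution),
    sorted in descending order. Illustrative sketch, not engine code."""
    if len(losses) <= k:
        return np.sort(losses)[::-1]
    # np.partition is O(n): the k largest values end up in the last k slots
    tail = np.partition(losses, len(losses) - k)[-k:]
    return np.sort(tail)[::-1]

# per-asset losses across events; only the 3 highest are stored
losses = np.array([0.1, 5.0, 0.3, 12.0, 0.2, 7.5, 0.05])
print(keep_tail(losses, 3))  # → [12.   7.5  5. ]
```

Since loss curves at the return periods of interest depend only on the largest losses, discarding the rest bounds the storage per asset at `k` values instead of one value per event.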

raoanirudh commented 4 years ago

Can we consider adding pandas as a dependency? Pandas provides functionality that could be very useful for dealing with large damage tables and loss tables; from the package overview:

Pandas also provides data structures for efficiently storing sparse data.
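As a quick illustration of that sparse support (sketch only; the dataset shape is made up, but the API is standard pandas): a damage or loss table where most entries are zero compresses well as a `SparseArray` column.

```python
import numpy as np
import pandas as pd

# mostly-zero loss column, as in a large damage table
dense = np.zeros(1_000_000)
dense[::1000] = 1.0  # only 0.1% of the entries are non-zero

sparse = pd.arrays.SparseArray(dense, fill_value=0.0)
df = pd.DataFrame({"loss": sparse})

print(sparse.density)                       # fraction of stored values: 0.001
print(df["loss"].memory_usage(deep=True))   # far smaller than the dense column
```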

micheles commented 4 years ago

I actually support this. Independently of the event_based_damage project, it makes sense for us to have a "batteries included" OpenQuake distribution.

g-weatherill commented 4 years ago

+1 to this. Several additional toolkits building on OQ use Pandas, so there are few situations where one may use OQ in a distribution without Pandas. Any possibility there could be speed-ups in, for example, the site collection object if it worked as a Pandas dataframe?

micheles commented 4 years ago

I sincerely doubt that using Pandas will speed up the site collection. Providing Pandas will make it easier to explore the datastore (see https://github.com/gem/oq-engine/pull/5357) and to perform post-processing analysis, but for the time being I would refrain from using it in the core engine: I do not trust it as much as I trust numpy, especially performance-wise.
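The post-processing use case looks roughly like this. The engine stores tables as numpy composite arrays; the field names and values below are illustrative, not the actual datastore layout, but converting such an array to a DataFrame makes typical aggregations one-liners.

```python
import numpy as np
import pandas as pd

# hypothetical event loss table as a numpy composite array
elt = np.array(
    [(0, 1, 100.0), (0, 2, 250.0), (1, 1, 40.0), (1, 3, 300.0)],
    dtype=[("event_id", "i8"), ("asset_id", "i8"), ("loss", "f8")],
)
df = pd.DataFrame.from_records(elt)

# total loss per event, a typical post-processing aggregation
agg = df.groupby("event_id")["loss"].sum()
print(agg)
```

The heavy lifting (generating the table) stays in numpy inside the engine; pandas enters only at the exploration/aggregation stage.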