Aristotle Project - Githubissues

micheles commented 7 months ago

Given a USGS ShakeMap ID, get the corresponding rupture, exposure, vulnerability functions and GMPEs and perform a risk calculation (see https://docs.google.com/document/d/1mS2S7yOohiJjiEqL85E65k2XbOj7xEAuUo4NaAFHuco).

Geolocation by country can be done via the files in https://www.geoboundaries.org/globalDownloads.html

The difficulty here is to collect the world exposure, world vulnerability functions/taxonomy mappings and world GMPEs from dozens of repositories and fix all the inconsistencies. Here is a list of inconsistencies:

[x] the hazard countries (https://github.com/GEMScienceTools/oq-mbtk/blob/master/openquake/ghm/mosaic.py#L130) are inconsistent with the risk countries, i.e. there are different or wrong 3-letter country codes
[x] the risk repositories have inconsistencies, for instance /home/risk/global_risk_model/North_America contains an empty directory Exposure/Exposure/Disaggregated/, unlike the other regions
[x] the regions in the risk mosaic are different from the regions in the hazard mosaic
[x] /home/risk/global_risk_model/Europe/Hazard/ is empty, so I cannot get gmmLT.xml
[x] in many cases (e.g. Southeast_Asia containing IDN, PHL, SEA) there are multiple choices for gmmLT.xml
[x] the site_model.csv files of the mosaic, when collected together, have duplicated points
[x] the exposure XML files contain <field oq="residents" input="OCCUPANTS_PER_ASSET" /> in the fieldmap, but in the CSV files the column is named OCCUPANTS_PER_ASSET_AVERAGE

NB: the risk regions are

Africa
Caribbean_Central_America
Central_Asia
East_Asia
Europe
Middle_East
North_America
North_Asia
Oceania
South_America
South_Asia
Southeast_Asia

USGS ruptures (like https://earthquake.usgs.gov/product/shakemap/us70006sj8/atlas/1594403794805/download/rupture.json) have the format

{
  "type": "FeatureCollection",
  "metadata": {
    "reference": "Origin",
    "id": "us70006sj8",
    "network": "USGS National Earthquake Information Center, PDE",
    "netid": "us",
    "productcode": "us70006sj8",
    "time": "2019-12-30T17:18:57.000000Z",
    "lat": 35.5909,
    "lon": 74.6280,
    "depth": 13.8,
    "mag": 5.6,
    "locstring": "34km NW of Idgah, Pakistan",
    "mech": "ALL",
    "rake": 0
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "rupture type": "rupture extent"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [ 74.6280, 35.5909, 13.8 ]
      }
    }
  ]
}

[x] The first step is to add a function to download such files and to convert into a rupture_dict
[x] The second step is to add the rupture_dict parameter to the job.ini
[x] The third step is to generate planar ruptures from rupture_dict by using the code in IPT
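
The first step could look roughly like the sketch below. It is not the engine's implementation: the function name `usgs_to_rupture_dict` and the output keys (`lon`, `lat`, `dep`, `mag`, `rake`) are assumptions chosen for illustration, and only the input layout comes from the rupture.json sample above. The download is left out so the example runs offline; the parsed JSON could come from something like `json.load(urllib.request.urlopen(url))`.

```python
import json


def usgs_to_rupture_dict(rupture_json):
    """Extract the hypocentre parameters from a parsed USGS rupture.json.

    The output key names are assumed for illustration, not taken from
    the engine.
    """
    md = rupture_json["metadata"]
    return {
        "lon": float(md["lon"]),
        "lat": float(md["lat"]),
        "dep": float(md["depth"]),
        "mag": float(md["mag"]),
        "rake": float(md["rake"]),
    }


# Trimmed copy of the sample rupture.json shown above
sample = json.loads("""
{"type": "FeatureCollection",
 "metadata": {"id": "us70006sj8", "lat": 35.5909, "lon": 74.6280,
              "depth": 13.8, "mag": 5.6, "mech": "ALL", "rake": 0},
 "features": []}
""")
rupture_dict = usgs_to_rupture_dict(sample)
print(rupture_dict)
```

Only the point-source metadata is read here; files whose features carry real fault geometries would need separate handling.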

Sometimes the USGS also gives .json files with geometries that can be converted to OpenQuake ruptures as in this notebook: https://github.com/gem/earthquake-scenarios/blob/main/src/2_1_rupture_usgs_json_to_oq_xml.ipynb

We also need

[x] a file with famous ruptures (taken from the repo earthquake-scenarios) to be used for running tests
[x] a file with the taxonomy mapping to use for each country
[x] store the vulnerability functions for the whole world
[x] store the taxonomy mappings for the whole world

The taxonomy mapping per country can be extracted from here: https://gitlab.openquake.org/risk/global_risk_model/Scripts/-/blob/master/grm_calculations/job_files.csv

NB: the taxonomy mapping is a HUGE problem. Currently the engine cannot manage the case of two assets of the same taxonomy being mapped to different vulnerability functions because they belong to different countries. The taxonomy mapping is global, while we would need to make it country-dependent. Also, splitting the exposure by country and performing multiple calculations is a solution only in theory, since it makes everything more complex and much slower. We will probably have to completely rewrite the risk calculators (for instance the RiskComputer assumes that assets with the same taxonomy are associated with the same risk functions), which is hard :-(
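
To make the problem concrete, here is a toy sketch (not engine code) contrasting a global taxonomy mapping with a country-dependent one. The taxonomy string, the country codes and the vulnerability function IDs are all invented for illustration.

```python
# Global mapping: one vulnerability function per taxonomy
global_tmap = {"CR/LFM": "VF_CR_generic"}

# Country-dependent mapping: the key is (country, taxonomy)
country_tmap = {
    ("ITA", "CR/LFM"): "VF_CR_ITA",
    ("TUR", "CR/LFM"): "VF_CR_TUR",
}

# Two assets sharing a taxonomy but located in different countries
assets = [
    {"id": "a1", "country": "ITA", "taxonomy": "CR/LFM"},
    {"id": "a2", "country": "TUR", "taxonomy": "CR/LFM"},
]


def risk_function(asset):
    """Prefer the country-specific entry, fall back to the global one."""
    key = (asset["country"], asset["taxonomy"])
    return country_tmap.get(key, global_tmap[asset["taxonomy"]])


chosen = {a["id"]: risk_function(a) for a in assets}
print(chosen)
```

Any code that groups assets by taxonomy alone would put a1 and a2 in the same group, which is exactly the assumption that breaks once the mapping depends on the country.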

nicolepaul commented 7 months ago

Michele, please see the attached CSV to help you map between the GRM repos and the hazard mosaic repos. I also included some comments on 'exceptions' to the general case.

If you clone the relevant risk region repo (e.g., global_riskmodel/Africa) with --recurse-submodules, or update the submodules afterwards, you should have all the dependencies (hazard, exposure, vulnerability) on the appropriate versions. The job.ini file will list all the specific paths you need for the gmmLT, vulnerability curves, etc. The Exposure.xml file indicates whether to use the aggregated or the disaggregated exposure.

The current status of the risk repos on cole/davis is unknown: we have not run the GRM since June, individual modellers may have made changes to those files, and some repos may be only partially cloned (without submodules) after some server/cluster modifications.

GRM_Mosaic_Map.csv

micheles commented 6 months ago

Currently the idea is to build a few HDF5 files at each new release of the mosaic:

site_model.hdf5 for the hazard mosaic (using utils/build_global_sites)
exposure.hdf5 for the risk mosaic (using utils/build_global_exposure)

Then the Aristotle calculator will be able to extract from such files the relevant information quickly.

raoanirudh commented 1 month ago

A further crucial feature will be the inclusion of recording station data for ground motion conditioning, if such station data is already available at the time of launching an Aristotle calculation.

Given a USGS ShakeMap id, the station data curated by the USGS for the event can be found in the associated stationlist.json file, for instance https://earthquake.usgs.gov/product/shakemap/us7000m9g4/us/1715297585708/download/stationlist.json for the 2024 M7.4 Hualien earthquake earlier this year in Taiwan. INGV uses an identical format for their station data file, for instance http://shakemap.rm.ingv.it/shake4/data/8863681/current/products/stationlist.json for the 2016 M6.5 Norcia earthquake. Documentation of the stationlist.json file format is available at https://usgs.github.io/shakemap/manual4_0/ug_products.html#stationlist-geojson.

The json file would need to be parsed, checked for duplicate entries and outliers, and converted to the csv format accepted by the OpenQuake engine (or directly to the internal dataframe format used by the engine after reading the csv station data input file).

The station data file can contain two kinds of stations – 'seismic' stations and 'macroseismic' stations. Seismic stations report the ground motions recorded by instruments, whereas macroseismic stations might report intensity values inferred from observed damage patterns for historical earthquakes or inferred from felt reports for recent earthquakes. For this implementation, only the seismic stations should be considered. All available IMTs relevant for the risk calculations should be read from the station data file – typically available IMTs might include PGA, SA(0.3), SA(1.0), and SA(3.0).
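
A rough sketch of the parsing step is given below, using only the standard library. The field names (`station_type`, `channels`, `amplitudes`, `units`) follow my reading of the stationlist documentation linked above and should be checked against real files. The output column names, the choice of taking the largest value over all channels, and the `%g` to g conversion are assumptions. Outlier checks are not included, and the sample data is made up.

```python
import csv
import io

IMTS = ("pga", "sa(0.3)", "sa(1.0)", "sa(3.0)")


def seismic_rows(stationlist, imts=IMTS):
    """Return one row per unique seismic station, with a peak value per IMT."""
    rows, seen = [], set()
    for feat in stationlist["features"]:
        props = feat["properties"]
        if props.get("station_type") != "seismic":
            continue  # skip macroseismic stations
        if feat["id"] in seen:
            continue  # skip duplicate entries
        seen.add(feat["id"])
        lon, lat = feat["geometry"]["coordinates"][:2]
        row = {"STATION_ID": feat["id"], "LONGITUDE": lon, "LATITUDE": lat}
        for chan in props.get("channels", []):
            for amp in chan.get("amplitudes", []):
                name, value = amp.get("name"), amp.get("value")
                if name not in imts or not isinstance(value, (int, float)):
                    continue  # unwanted IMT or missing value
                if amp.get("units") == "%g":
                    value = value / 100.0  # assumed: percent of g -> g
                col = name.upper() + "_VALUE"
                # assumed: keep the largest value over the channels
                row[col] = max(row.get(col, 0.0), value)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text (column names are illustrative)."""
    head = ["STATION_ID", "LONGITUDE", "LATITUDE"]
    extra = sorted({k for r in rows for k in r} - set(head))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=head + extra, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def _station(sid, kind, channels):
    """Build a made-up feature for the example."""
    return {"id": sid,
            "geometry": {"type": "Point", "coordinates": [121.6, 23.8]},
            "properties": {"station_type": kind, "channels": channels}}


chans = [
    {"name": "HNE", "amplitudes": [{"name": "pga", "value": 12.0, "units": "%g"}]},
    {"name": "HNN", "amplitudes": [{"name": "pga", "value": 15.0, "units": "%g"}]},
]
sample = {"type": "FeatureCollection", "features": [
    _station("XX.STA1", "seismic", chans),
    _station("XX.STA1", "seismic", chans),   # duplicate entry
    _station("DYFI.1", "macroseismic", []),  # ignored
]}
rows = seismic_rows(sample)
print(to_csv(rows))
```

A real implementation would probably also exclude vertical channels and decide how to combine the horizontal components; here they are simply reduced with `max`.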

A site model would also need to be generated for the station sites. If the Vs30 values at the locations of the stations are already available through the stationlist.json file, those can be used directly; otherwise the Vs30 values for the stations would need to be extracted from the global vs30 hdf5 file. If any site parameters other than Vs30 are required by the ground motion models that will be used in the calculation, those additional site parameters will also need to be included in the station site model file.
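
The Vs30 fallback described above could be sketched as follows. The function `station_site_model`, the station record layout and the placeholder `constant_vs30` are hypothetical; a real lookup would read the global vs30 hdf5 file, which is not attempted here.

```python
def station_site_model(stations, lookup_vs30):
    """Build site-model rows for the stations.

    Uses the vs30 reported with the station when present, otherwise
    calls lookup_vs30(lon, lat).
    """
    sites = []
    for st in stations:
        vs30 = st.get("vs30")
        if vs30 is None:
            vs30 = lookup_vs30(st["lon"], st["lat"])
        sites.append({"lon": st["lon"], "lat": st["lat"], "vs30": float(vs30)})
    return sites


def constant_vs30(lon, lat):
    """Placeholder standing in for a lookup in the global vs30 hdf5 file."""
    return 760.0


# Made-up stations: the second one has no vs30 in the station list
stations = [
    {"id": "S1", "lon": 121.6, "lat": 23.8, "vs30": 400.0},
    {"id": "S2", "lon": 121.7, "lat": 23.9, "vs30": None},
]
sites = station_site_model(stations, constant_vs30)
print(sites)
```

Other site parameters required by the ground motion models would need extra columns filled in the same way.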

Once we have these two new inputs (the station data file or dataframe, and the station site model), Aristotle should run the requested scenario with the conditioned_gmfs calculator.

gem / oq-engine

Aristotle Project #9227