matamadio commented 1 year ago

Dataset overview

Dataset produced for India by the consultant RMSI during the previous MHRA project (its geoportal is now decommissioned). The data had to be paid for again (USD 15k, by S. Hallegatte) to be released. Time to save these long-term in the RDL collection. The data cover several hazards for India.

Dataset details and structure

The shared output is made of hazard layers, each one representing a specific hazard magnitude in relation to its probability (return period).

Hazards included

River flood: 100 m probabilistic model by RMSI, complemented by 1 km global model. Data as grid shapefiles for each state. Return periods: 2, 5, 10, 25, 50, 100 years. Size: 1.8 Gb.
Storm surge: two datasets from probabilistic modelling:
- 100 m country-level raster representing water depth. Size: 260 Mb.
- 1 km state-level grid shapefiles including both water depth and velocity variables. Size: 2.55 Gb. Return periods: 2, 5, 10, 25, 50, 100, 250, 500, 1000 years.
Cyclone flood: 100 m probabilistic model by RMSI, complemented by 1 km global model. Data as grid shapefiles for each state. Return periods: 2, 5, 10, 25, 50, 100, 250, 500, 1000 years. Size: 5.7 Gb. The high-resolution model (100 m) is applied over urban areas.
Cyclone wind: 10 km probabilistic model by RMSI. Country-wide raster data (a merge of state-level simulations). Return periods: 2, 5, 10, 25, 50, 100, 250, 500, 1000 years. Size: 3 Mb. Shapefile data are also provided (25 Gb), still to be downloaded and reviewed.
Tsunami: two country-level raster datasets:
- 100 m probabilistic (RP 500 years)
- deterministic (east India) Size: 50 Mb.
Wildfires: 25 m raster grid susceptibility model for two states: Jammu & Kashmir. Size: 10 Mb.
Drought: 16 Gb of data uploaded on April 17; to be downloaded

matamadio commented 1 year ago

In terms of data efficiency, we could easily reduce the size of both fluvial and cyclone flood layers by turning the polygon grids (only the high-resolution sections) into deflated raster grids of the same resolution. The polygon grids only carry one value, as such there will be no loss of information.
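A minimal sketch of how that conversion could be scripted, assuming the GDAL command-line tools are installed. It only builds the `gdal_rasterize` call for one polygon grid. The field name `depth`, the file names and the nodata value are hypothetical, and a pixel size of 100 assumes the layer is in a projected CRS with metre units.

```python
def rasterize_command(src_shp, dst_tif, value_field, pixel_size):
    """Build a gdal_rasterize call that burns one attribute of a polygon
    grid into a DEFLATE-compressed GeoTIFF of the same resolution."""
    return [
        "gdal_rasterize",
        "-a", value_field,                        # attribute holding the hazard value
        "-tr", str(pixel_size), str(pixel_size),  # output resolution in CRS units
        "-a_nodata", "-9999",                     # hypothetical nodata value
        "-ot", "Float32",
        "-co", "COMPRESS=DEFLATE",
        "-co", "ZLEVEL=9",
        str(src_shp),
        str(dst_tif),
    ]


# Hypothetical file and field names, for illustration only.
cmd = rasterize_command("flood_rp100.shp", "flood_rp100.tif", "depth", 100)
print(" ".join(cmd))
# To actually run it (requires GDAL on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The grid origin and extent should be checked against the original polygons before replacing any shapefile, so that cells line up one to one.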

matamadio commented 1 year ago

Exposure data

The largest share of data: about 110 Gb, split into ADM1 (state) folders by ISO_a2 code. There are 35 state folders.

Folder structure:

- ISO_a2
  - Line
  - Point
  - Polygon
  - Raster

Each folder contains different exposure components:

Line

The largest folder in terms of size. Includes shapefiles (line) representing the transport network (roads, bridges, railways and subways), transmission lines and embankments.

Point

Includes shapefiles (point) representing public places and infrastructure locations by category, such as health facilities, cyclone shelters, dams, schools, fire stations, and more.

Polygon

Includes shapefiles (polygon) representing the area of key structures and infrastructures such as power plants, sea ports, refineries, but also slums and mangroves extent.

These vector datasets have been derived from OpenStreetMap and national sources. For some of them, additional attributes have been added to the shapefile attribute table; the key ones are replacement value and building features (number of storeys, structure type, construction year), which can be used for risk analytics. Note that for many layers these attributes are empty.

All these folders should be individually zipped for size efficiency (about 10% of original size).
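One way to script the zipping, as a sketch only: it uses Python's standard `zipfile` module with DEFLATE at the highest level as a stand-in, not the 7zip tool mentioned later in the thread, so the resulting sizes may differ. The folder path in the usage comment is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive=None):
    """Zip one folder (e.g. Line, Point or Polygon) next to itself,
    keeping the folder name as the top-level entry in the archive."""
    folder = Path(folder)
    archive = Path(archive) if archive else folder.parent / (folder.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent, e.g. "Line/roads.shp".
                zf.write(path, path.relative_to(folder.parent))
    return archive


# Usage (hypothetical path):
#   for name in ("Line", "Point", "Polygon"):
#       zip_folder(Path("WB") / name)
```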

Raster

Includes one large GeoTIFF file showing agricultural area (tblagricultureexposure_ISO_a2.tif); some states also have a second file (tbllulcexposure_ISO_a2.tif) showing land cover types (LULC). The two datasets are not aligned. The raster data are not compressed; this should be done before storage for efficiency.

Example: tblagricultureexposure_wb.tif

Original size: 400 Mb
Compressed size: 6.3 Mb

METADATA

METADATA folder: comprises two types of metadata, namely Hazard and Exposure. Excel and PDF files are available for all the hazards and exposures captured in the IMHRA study. Additionally, a readme (MS Excel) is available with:

State Codes
Exposure layers covered with state wise layer name for each exposure
Detailed hazard layers present with table name and return periods covered.

matamadio commented 1 year ago

Workflow to package and store exposure data

Download data and metadata from the RMSI sharepoint.
Compress raster data in Raster folders
- Use the GDAL Warp tool in QGIS in batch mode, keep all options as default, just set "Additional command-line parameters" to: -co COMPRESS=DEFLATE -co PREDICTOR=2 -co ZLEVEL=9, and name each output layer after its input layer.
Zip shapefile folders (Line, Point, Polygon)
- The downloaded zip uses light compression: unzip and re-zip the folders using a stronger algorithm. I used 7zip for the task.
Metadata: either provide one zip with all the metadata as resources; or copy the metadata in each state folder (let's decide one or the other)
Upload the data in the DDH folder on sharepoint
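For anyone who prefers the command line over the QGIS batch dialog, the raster compression step could look roughly like the sketch below. It assumes the `gdalwarp` command-line tool is installed and only builds the calls; the `_deflate` output suffix and the folder path are assumptions, not something agreed in this thread.

```python
from pathlib import Path

# Creation options quoted in the workflow above.
CREATION_OPTIONS = ["COMPRESS=DEFLATE", "PREDICTOR=2", "ZLEVEL=9"]


def warp_commands(raster_dir, suffix="_deflate"):
    """Build one gdalwarp call per GeoTIFF in a Raster folder,
    naming each output after its input layer."""
    commands = []
    for src in sorted(Path(raster_dir).glob("*.tif")):
        if src.stem.endswith(suffix):
            continue  # skip outputs from a previous run
        dst = src.with_name(src.stem + suffix + ".tif")
        cmd = ["gdalwarp"]
        for option in CREATION_OPTIONS:
            cmd += ["-co", option]
        cmd += [str(src), str(dst)]
        commands.append(cmd)
    return commands


# Usage (hypothetical path; requires GDAL on the PATH):
#   import subprocess
#   for cmd in warp_commands("WB/Raster"):
#       subprocess.run(cmd, check=True)
```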

This is the first step to secure the data; then (at a later stage, less urgent) we need to produce appropriate metadata files for each dataset to be included in the WB data catalog.

Create an Excel spreadsheet describing the data type, with each dataset name, the URL to the data and the URL to the prepared JSON metadata.
Describe the data structure to be achieved on DDH.
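A possible starting point for that spreadsheet, sketched as a CSV file that Excel can open. The column names, the dataset name and the example URLs are all placeholders; the real layout should follow whatever the DDH team expects.

```python
import csv
import io

# Hypothetical column names for the dataset inventory.
FIELDS = ["dataset_name", "data_type", "data_url", "metadata_url"]


def write_manifest(rows, handle):
    """Write one row per dataset: name, data type, URL to the data
    and URL to the prepared JSON metadata."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder entry, for illustration only.
rows = [{
    "dataset_name": "WB_Line",
    "data_type": "exposure",
    "data_url": "https://example.org/data/WB_Line.zip",
    "metadata_url": "https://example.org/metadata/WB_Line.json",
}]
buffer = io.StringIO()
write_manifest(rows, buffer)
print(buffer.getvalue())
```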

DDH team will then copy the data and metadata to DDH Sharepoint; the datasets will appear in your My Datasets for review and any further editing.

Split the effort!

35 state folders: AN, AP, AR, AS, BR, CG, CN, DD&DNH, DL, GA, GH, HP, HR, JH, JK&LK, KA, KL, LD, MH, ML, MN, MP, MZ, NL, OD, PB, PY, RK, SK, TN, TR, TS, UK, UP, WB
Processed by x consultants (Mat, Bramka, Asmita, ..?)

Mat

Hazard datasets: all
Exposure datasets: AN, AP, AS, CG, DD&DNH, GA, JK&LK, KL, LD, MZ, PY, RJ, SK, WB

bramkaarga commented 1 year ago

Upload by Bramka:

Hazard datasets: Drought
Exposure datasets: AR, BR, CN, GH, HP, HR, JH, KA, MH, ML, MN, MP, MZ.
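To keep track of who covers what, a small cross-check of the lists as posted in this thread may help. This is only a sketch: the codes are copied verbatim from the comments above, and the function name is made up for the example.

```python
from collections import Counter

# State folder codes and assignments, copied as written in this thread.
ALL_STATES = ("AN, AP, AR, AS, BR, CG, CN, DD&DNH, DL, GA, GH, HP, HR, JH, "
              "JK&LK, KA, KL, LD, MH, ML, MN, MP, MZ, NL, OD, PB, PY, RK, SK, "
              "TN, TR, TS, UK, UP, WB").split(", ")
ASSIGNED = {
    "Mat": "AN, AP, AS, CG, DD&DNH, GA, JK&LK, KL, LD, MZ, PY, RJ, SK, WB".split(", "),
    "Bramka": "AR, BR, CN, GH, HP, HR, JH, KA, MH, ML, MN, MP, MZ".split(", "),
}


def check_assignments(all_states, assigned):
    """Return (duplicates, unknown, unassigned) state codes."""
    counts = Counter(code for codes in assigned.values() for code in codes)
    duplicates = sorted(code for code, n in counts.items() if n > 1)
    unknown = sorted(code for code in counts if code not in all_states)
    unassigned = sorted(code for code in all_states if code not in counts)
    return duplicates, unknown, unassigned


duplicates, unknown, unassigned = check_assignments(ALL_STATES, ASSIGNED)
print("Assigned twice:", duplicates)
print("Not in the folder list:", unknown)
print("Not yet assigned:", unassigned)
```

On the lists as posted, this reports MZ in both assignments, RJ as a code that is not in the folder list (which has RK), and ten folders not yet assigned: DL, NL, OD, PB, RK, TN, TR, TS, UK, UP.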

GFDRR / rdl-standard

[DATA] MHRA INDIA (RMSI data) #40
