OasisLMF / ktools

In-memory simulation kernel for loss modelling.
BSD 3-Clause "New" or "Revised" License

The `vulnerability.bin` file can be written with the wrong data types #361

Closed by mtazzari 10 months ago

mtazzari commented 1 year ago

Issue Description

If the vulnerability function table contains vulnerability ids larger than the maximum value representable by the data type used to write the binary file (currently unsigned int 32), the ids overflow silently when written, and getmodel (and modelpy) have no way to detect this when reading the file back. A 32-bit value wraps modulo 2^32 = 4,294,967,296. Examples:

  1. if vulnerability_id = 5,000,000,000 is used in the vulnerability table, getmodel (and modelpy) read it back from the binary file as 5,000,000,000 % 4,294,967,296 = 705,032,704
  2. if vulnerability_id = 10,000,000,000 is used in the vulnerability table, getmodel (and modelpy) read it back from the binary file as 10,000,000,000 % 4,294,967,296 = 1,410,065,408
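The wrap-around can be illustrated without ktools. The sketch below is not the ktools writer; it only mimics what happens when a 64-bit value is truncated to a 4-byte unsigned field, which keeps the low 32 bits (the value modulo 2^32). The helper name `stored_as_uint32` is hypothetical.

```python
import struct


def stored_as_uint32(value: int) -> int:
    """Return what a 4-byte unsigned field holds after truncating `value`.

    Hypothetical helper for illustration only: pack the value as a
    little-endian 64-bit unsigned integer, keep the low four bytes, and
    read them back as a 32-bit unsigned integer.
    """
    low_four_bytes = struct.pack("<Q", value)[:4]
    return struct.unpack("<I", low_four_bytes)[0]


# The two ids from the examples in this issue.
for vulnerability_id in (5_000_000_000, 10_000_000_000):
    print(vulnerability_id, "->", stored_as_uint32(vulnerability_id))
```

Nothing in the stored value reveals that truncation happened, so the reader cannot tell a wrapped id from a genuine one. The check has to happen before the file is written.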

Steps to Reproduce (Bugs only)

  1. change the vulnerability file so that it contains a very large vulnerability id, larger than the unsigned int32 max value (4,294,967,295).
  2. run `eve 1 1 | modelpy | cdftocsv` and inspect the results

Version / Environment information

Linux. Affects all versions of getmodel and modelpy.

Example data / logs

Suggestions for solving this

A good approach would be to create a Python version of vulnerabilitytobin built on a Python API that technical users can also call directly, along the lines of:

def write_vulnerability_to_bin(df=None, vulnerability_id=None, intensity_bin=None,
                               damage_bin=None, probability=None):
    if df is not None:
        # assume the df contains all the columns
        ...
    else:
        # assume all the other columns are passed in input as numpy arrays
        ...

    # write to binary, checking that the input data use the right types
    ...

from oasis.pytools.api import write_vulnerability_bin

hchagani-oasislmf commented 10 months ago

Following discussion at today's developer meeting, it was noted that allowing vulnerability IDs that exceed the maximum value for a signed 4-byte integer may encourage the use of descriptive vulnerability IDs. To reduce memory use, it would be better to stick with a signed 4-byte integer, whose maximum value of 2,147,483,647 should be more than large enough to contain all vulnerability IDs.

A check for this should be introduced to the validation component validatevulnerability. These validation checks will be incorporated into the conversion tools by default at a later date (see issue https://github.com/OasisLMF/ktools/issues/356 for details).
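As a rough illustration of the kind of range check described above, here is a minimal sketch in Python. It is not the validatevulnerability implementation: the function name `find_out_of_range_ids` is hypothetical, and the lower bound of 1 is an assumption that vulnerability IDs are positive.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647, the signed 4-byte maximum


def find_out_of_range_ids(vulnerability_ids, min_id=1, max_id=INT32_MAX):
    """Return the sorted unique ids that fall outside [min_id, max_id].

    Hypothetical helper: min_id=1 assumes ids are positive, which this
    issue does not state.
    """
    return sorted({vid for vid in vulnerability_ids if not min_id <= vid <= max_id})


ids = [1, 2, 2_147_483_647, 2_147_483_648, 5_000_000_000, 5_000_000_000]
bad_ids = find_out_of_range_ids(ids)
if bad_ids:
    print("vulnerability ids out of range for a signed 4-byte integer:", bad_ids)
```

Running the check on plain Python integers (or a 64-bit column) before any cast to 32 bits is what makes it reliable. Once the values have been narrowed, the out-of-range ids can no longer be distinguished from valid ones.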