OasisLMF / ODS_OpenExposureData

Open data standards curated by Oasis.

Number of occupants - addition to loc file #40

Open stufraser1 opened 3 years ago

stufraser1 commented 3 years ago

Propose that OED has a field added to record the number of occupants at a location, to support use of OED to produce human-focussed outputs: number of people affected, number of fatalities. Ideally, this addition would have supporting type fields to define the age_group, sex, ethnicity, and nighttime/daytime occupancy, to support a more detailed breakdown of risk to people, but this may lead to multiple rows per location to reflect the breakdown.

Accepting that most OASIS models may not yet support the estimation of fatalities / number of people located in damaged buildings, there is demand for this, and at present the workaround is to place the number of occupants in BuildingTIV. OED should take the lead in enabling these data to be recorded in the exposure dataset, with the aim that this will enable these fields to be modelled explicitly rather than through the workaround.

MattDonovan82 commented 3 years ago

@stufraser1 Can you be specific as to exactly what fields/data you think are required? The easy, temporary approach could be to use the current 'Flexi' fields in OED, but I think more effort and development will be required to accommodate humanitarian exposure data permanently in OED.

Depending on the number of fields you think are required, it may be worth considering a separate input file that could link to the loc file (possibly via a unique ID). As you say, there are currently no models in Oasis that use this data, so how it's supported in OED depends on how you think it will be used.

Also, considering Parquet is the format we are moving to, could it be included in the one proposed 'package'?

@benhayes21 @johcarter any ideas?

johcarter commented 3 years ago

I understand some human loss models like influenza pandemic and terrorism use mortality rate as the measure of damage. If we were to have a number of occupants field then the number of deaths could be computed as mortality rate * number of occupants, which has a symmetry with the economic damage/loss method for property. This would be a good start I think. As regards the other fields, we could add them when we understand what vulnerability characteristics are required by different types of human loss models.
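A minimal Python sketch of the symmetry described above, assuming the mortality rate (expected deaths per occupant) has already been read from a vulnerability curve at the location's hazard intensity. The function names and figures are hypothetical and are not part of OasisLMF:

```python
def property_loss(damage_ratio: float, tiv: float) -> float:
    """Economic loss: mean damage ratio times total insured value."""
    return damage_ratio * tiv


def expected_deaths(mortality_rate: float, num_occupants: float) -> float:
    """Human loss: mortality rate (deaths per occupant) times number of occupants."""
    if not 0.0 <= mortality_rate <= 1.0:
        raise ValueError("mortality_rate must be between 0 and 1")
    return mortality_rate * num_occupants


# Hypothetical location: 200 occupants and a 2% mortality rate at this intensity,
# alongside a property with a 10% damage ratio on a TIV of 1,000,000.
print(expected_deaths(0.02, 200))        # 4.0
print(property_loss(0.1, 1_000_000))     # 100000.0
```

The only change from the property case is which exposure value is multiplied: a head count instead of a TIV.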

stufraser1 commented 3 years ago

As @johcarter says - the most straightforward and highest priority would be to add a field e.g. 'numOccupants' so that number of occupants can be assigned to a location.

Models using OED would then need to define a vulnerability curve that relates mortality to intensity (as per structural vulnerability curves) or estimate mortality from the level of building damage at that location (as is done for seismic risk). Either way, having numOccupants in OED would support the development of that.

@johcarter Does LMF allow vulnerability curves to be related to numOccupants, rather than being linked to TIV fields by default? This would be required.

If that single field was added without the identifiers (day/nighttime, sex, age, disability) then I expect a workaround for not having the identifiers could be to set up a different account for each group as required by certain analyses.

aiste-kalinauskaite commented 3 years ago

There is a NumberOfEmployees field in OED. Does NumberOfOccupants mean the same thing, @stufraser1 @johcarter @MattDonovan82? If the meaning is the same but the new name is clearer, then from my perspective it can be updated in OED. That field was included with terrorism models, or other models that require mortality, in mind. The other identifiers can be added, but there would need to be a standard for how they are defined. E.g. Shift: is it just two values, day / night? Age: is that a set of pre-defined age ranges, or can an individual age be entered? Disability: is this "yes" or "no", or a list of pre-defined disabilities (which perhaps becomes similar to an occupancy code)? I am assuming that anyone using those mortality-linked fields will use very few fields from traditional property modelling, so there should not be an issue in defining those fields in the loc file.

dsokol commented 3 years ago

Agree numberOfOccupants is clearer. I think the demographics of the occupants should probably be left out as far as cat modelling is concerned, but combining this field with the Occupancy Code should allow modelers to make good approximations of whether a building is full at night or during the day. I think there are too many potential contributing factors across occupancy types (especially for Industrial) to capture a value that would be meaningful.

Perhaps there could also be a policy-side component (MaximumAnyOneLife) used in combination with numOccupants?

johcarter commented 3 years ago

"Does LMF allow vulnerability curves to be related to numOccupants, rather than being linked to TIV fields by default? This would be required."

Not at present, @stufraser1: the TIV fields are used to produce loss by multiplying by damage, and this is hardcoded in OasisLMF. This part would need to be developed, but is not difficult.

Note that no core LMF development would be needed to match exposure data to a mortality vulnerability curve, as the model developer decides which OED fields define vulnerability and writes the key service code to plug into the system.

stufraser1 commented 3 years ago

@aiste-kalinauskaite you make a great point -- The field NumberOfEmployees does the same thing, but the meaning is clearer with 'occupants' because most modelling of human impact outside of the industry uses the nighttime resident population (derived from census data), not only employees in commercial properties. A change in name would mean the field is recognised as appropriate for both cases.

Modelling population in different age ranges / disability groups is not yet performed regularly enough to be sure which values would be most useful. For age, ranges would be most appropriate based on my experience, but those ranges could change in different contexts. For population I have generally seen day versus night (but some 'edge cases' exist where summer versus winter scenarios might be used, e.g. in tourist hotspots). Disability: I've seen able-bodied / disabled without further breakdown, but again not enough to know what useful values might emerge in this area.

Very happy to see the team recognises a way forward on this, and that it doesn't seem too difficult to achieve with what has already been coded.

MattDonovan82 commented 3 years ago

hi @stufraser1 - sorry just picking this up again. So are we all agreed that for a quick resolution on this a 'NumOccupants' field should be added to OED with a supporting description? We can include this in v2.0.0 unless any push back?

The inclusion of the extra fields for demographic data for humanitarian uses is a larger and separate topic.

stufraser1 commented 3 years ago

Yes, that would be a good first step, though I think the additions should follow not too far behind. Several research projects that aim to make use of additional demographic factors in OED, and possibly in the LMF modelling environment, are coming up.

(In reply, by email, to MattDonovan82's comment of Thu, 2 Sept 2021 at 11:43.)

MattDonovan82 commented 3 years ago

@stufraser1 It makes sense that we add the additional fields in v2.0.0 as well, then. Can you please send over the fields you think need to be included, and we will look at getting them added to the OED schema.

thanks.

MattDonovan82 commented 3 years ago

@stufraser1 after some discussion, we thought it might be better to get all the required data fields for humanitarian uses together in a separate input file? This might be cleaner and easier to use for those models/tools you mention rather than adding to the existing loc file in OED.

It would also be OK to repeat any needed fields that already exist in OED, such as the street, city and state fields used for location. Are you OK to put this file together for review and include all the fields you think will be required? We can then name it accordingly and house it in the ODS repo with some commentary.

Matt

stufraser1 commented 3 years ago

Initial proposal for review is contained in the attached xlsx, comprising the addition of an occupantPeriod field in the Location table to support NumOccupants, accompanying code values for occupantPeriod, and a new table to contain the optional, more detailed breakdown of occupant numbers. A readme is also included in the workbook.

HumanImpact_OEDAdditions_Sept2021.xlsx

stufraser1 commented 3 years ago

Small but crucial update in this version 2: an additional row UID is required.

HumanImpact_OEDAdditions_Sept2021_v2.xlsx

johcarter commented 3 years ago

The addition is the new 'LocPopNumber' field.

MattDonovan82 commented 3 years ago

@stufraser1 apologies, I've only just read your notes. To confirm, you're proposing to add one new field ('OccupantPeriod') to the current loc file and to change 'NumEmployees' to 'NumOccupants'? All other new fields will go into the new 'LocPopulation' file?

We may need the Steer Co to review the change of 'NumEmployees' to 'NumOccupants', as this could cause issues for those already using OED, although that is unlikely.

Is there a reason why the new 'OccupantPeriod' will only be in the 'Loc' file and not in the 'LocPopulation' file as well, like 'NumOccupants' will be?

stufraser1 commented 3 years ago

I think 'OccupantPeriod' should be in the same file as 'NumOccupants' because then you can define the number of occupants, and whether they are daytime employees or night-time residents with only the one location file. This seems the most likely use case for most insurance users.

The LocPopulation file then only comes into play when a more detailed breakdown is required. For completeness, you're probably correct that 'OccupantPeriod' could be repeated in LocPopulation, but I would not omit it from Location, since that would make LocPopulation a requirement just to define the OccupantPeriod.

A consideration in repeating those 1 or 2 fields in LocPopulation: does the number of fields repeated in both files put an extra burden on users to make sure they match? Does the model validate that the numbers match in both, or does one file take precedence over the other if there is a mismatch? Equally, how should we handle the potential issue of the total population per location not equalling the sum of all classification fields for that location?
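One way a tool could surface these mismatches is a simple consistency check. The Python sketch below is hypothetical: it assumes both files are keyed by LocNumber, that 'NumOccupants' and 'OccupantPeriod' are repeated in both, and it uses made-up age-band column names (only OccupantAge5to65 appears later in this thread). It reports problems rather than deciding which file takes precedence:

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]

# Hypothetical age-band columns, assumed exhaustive and non-overlapping.
AGE_FIELDS = ("OccupantAgeUnder5", "OccupantAge5to65", "OccupantAgeOver65")


def check_population(
    location: Dict[str, Row],
    loc_population: Dict[str, Row],
    breakdown_fields: Sequence[str] = AGE_FIELDS,
) -> List[str]:
    """Compare Location and LocPopulation rows keyed by LocNumber.

    Returns human-readable problems; an empty list means nothing was flagged.
    """
    problems: List[str] = []
    for loc_number, pop_row in loc_population.items():
        loc_row = location.get(loc_number)
        if loc_row is None:
            problems.append(f"{loc_number}: in LocPopulation but not in Location")
            continue
        # Fields repeated in both files must agree.
        for field in ("NumOccupants", "OccupantPeriod"):
            if field in loc_row and field in pop_row and loc_row[field] != pop_row[field]:
                problems.append(f"{loc_number}: {field} differs between files")
        # The breakdown fields that are present should add up to the total.
        parts = [pop_row[f] for f in breakdown_fields if f in pop_row]
        total = pop_row.get("NumOccupants")
        if parts and total is not None and sum(parts) != total:
            problems.append(
                f"{loc_number}: breakdown sums to {sum(parts)}, NumOccupants is {total}"
            )
    return problems


location = {"L1": {"NumOccupants": 100, "OccupantPeriod": 1}}
loc_population = {
    "L1": {"NumOccupants": 100, "OccupantPeriod": 1,
           "OccupantAgeUnder5": 20, "OccupantAge5to65": 50, "OccupantAgeOver65": 30},
}
print(check_population(location, loc_population))  # []
```

Whether a flagged mismatch should be an error, a warning, or resolved in favour of one file is left open here, as it is in the question above.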

MattDonovan82 commented 3 years ago

@stufraser1 I would add 'OccupantPeriod' and 'NumOccupants' to both files.

Repeating these in both files shouldn't be a problem, but in what case do you foresee the user needing both files? Please correct me if I have not understood this properly, but if a user is only populating and using the 'NumOccupants' and 'OccupantPeriod' fields, then they will only need the 'loc' file. If they are doing a more detailed analysis using the additional fields for population breakdown, then they will only need the LocPopulation file. Is it one or the other, not both?

For property modelling, the portfolio and account numbers appear in both the 'loc' and the 'account' files and need to be identical for Oasis to know what account details correspond to the correct exposures.

aiste-kalinauskaite commented 3 years ago

Could you please clarify why there is a need for two files rather than putting all fields in the Loc file? Currently, any OED field that is not needed doesn't have to be present in the file. Wouldn't that work for having all fields in the Loc file only?

MattDonovan82 commented 3 years ago

@aiste-kalinauskaite we thought having a separate file for population info/humanitarian use cases would be cleaner than overloading the current loc file with more fields. Oasis currently will not be using this information, as there are no models that would use this data.

Do you not agree?

MattDonovan82 commented 3 years ago

@aiste-kalinauskaite we will discuss this at the next Steering committee on 20th Sept.

MattDonovan82 commented 3 years ago

After more discussion, it makes sense to change the thinking around what OED is. OED should cover "all" exposure data and so can include several input files, i.e. the current 'loc' file is the property OED file, the LocPopulation file is another OED file, any other line of business (such as liability) is another OED file, etc.

stufraser1 commented 3 years ago

"Could you please clarify why there is a need for two files rather than putting all fields in the Loc file? Currently, any OED field that is not needed doesn't have to be present in the file. Wouldn't that work for having all fields in the Loc file only?"

Good question -- I tried to explain it in the readme, as I considered the case where one location might have multiple locPopulation rows, which could make location files unacceptably large (not clear how frequently it would occur though). Actually, the way the fields are set up doesn't require this at this point in time (but may become an issue in future).

Take the case where I have one location with 100 people: 20 are under 5 and 30 are over 65. This would be described in a single row, in multiple fields, since the available fields define these classifications. Similarly, take one location with 100 people, of whom 20 are classified as having a disability. This would also be described in a single row, in multiple fields. However, if at some point we wanted to add a code for disability, we might want to describe 10 people with a mobility-related disability and 10 people with a mental disability. Then we would have two rows for one location.

I acknowledge though that this level of data may be some way off in time and won't be the majority of use cases for a long time. Hopefully that makes clear the thinking for ODS to consider the best implementation.
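The difference between the two layouts discussed in this comment can be sketched in Python. All column names and code values here are hypothetical illustrations, not OED fields: in the wide layout each classification is its own field, so a location stays on one row, while in the long layout a finer breakdown (such as the disability split by type) adds rows for the same LocNumber:

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, object]

# Wide layout: one row per location; each classification is its own field.
wide_rows: List[Row] = [
    {"LocNumber": "L1", "NumOccupants": 100,
     "OccupantAgeUnder5": 20, "OccupantAgeOver65": 30},
]

# Long layout: one row per location and category code, so splitting the
# 20 people with a disability by type needs two rows for L1.
long_rows: List[Row] = [
    {"LocNumber": "L1", "DisabilityType": "Mobility", "Count": 10},
    {"LocNumber": "L1", "DisabilityType": "Mental", "Count": 10},
]


def wide_to_long(rows: List[Row], fields: List[str], code_column: str) -> List[Row]:
    """Reshape wide rows into long rows: one output row per (location, field)."""
    return [
        {"LocNumber": row["LocNumber"], code_column: field, "Count": row[field]}
        for row in rows
        for field in fields
        if field in row
    ]


def rows_per_location(rows: List[Row]) -> Counter:
    """Count how many rows each LocNumber occupies."""
    return Counter(row["LocNumber"] for row in rows)


print(rows_per_location(wide_rows))  # Counter({'L1': 1})
print(rows_per_location(long_rows))  # Counter({'L1': 2})
print(wide_to_long(wide_rows, ["OccupantAgeUnder5", "OccupantAgeOver65"], "AgeBand"))
```

The trade-off is the one raised above: the wide layout keeps one row per location but needs a new field for every new category, while the long layout absorbs new codes without schema changes at the cost of multiple rows per location.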

stufraser1 commented 3 years ago

"After more discussion, it makes sense to change the thinking around what OED is. OED should cover "all" exposure data and so can include several input files, i.e. the current 'loc' file is the property OED file, the LocPopulation file is another OED file, any other line of business (such as liability) is another OED file, etc."

So separate loc files could be included for transport infrastructure, energy infrastructure, communications infrastructure? Once the mechanism is defined, IDF RMSG could assist in developing these out with the assistance of public sector partners, for instance by leveraging previous development-sector consideration of exposure standards (Risk Data Library, GED4ALL) and promoting interoperability with those.

MattDonovan82 commented 3 years ago

Yes, I think so. For example, the liability standard is nearly ready and will be a separate OED file. I foresee these files being interoperable in the future if required, all under 'OED'. We will of course put this to the SC.

MattDonovan82 commented 2 years ago

@stufraser1 what do you suggest for data type and details when capturing OccupantPeriod?

'day/night'? 'time of day'?

stufraser1 commented 2 years ago

Day/night would be the main use case yes. Most analysis uses nighttime population, given by census statistics.

Further description would be useful in the longer term: I have previously used data describing tourist numbers (high, low and shoulder season; or summer/winter), but that is perhaps a more limited use case. Defining day/night as the only allowed values may be too restrictive long-term. Can a string definition be provided to allow more options, also thinking beyond OED data being used in cat models?

aiste-kalinauskaite commented 2 years ago

Perhaps it's worth treating the field in the same way as many secondary modifiers are? E.g. defining a numeric value and assigning what the description is? In the simplest case that would be, say 0 = No, 1 = Yes. Having numeric value with a predefined list of what it means would avoid typos in the text & whether lower/upper cases are accepted.

MattDonovan82 commented 2 years ago

This seems like a sensible approach and would cover most near and long term use cases.

@stufraser1 do you want to come up with a list of descriptions this week and then we can get this into v2 which is being released on 1st Nov?

stufraser1 commented 2 years ago

How about [Day, Night, Peak Season, Off-peak Season] (not using winter/summer, because for some locations peak will be summer, for others it will be winter)

MattDonovan82 commented 2 years ago

@stufraser1 do the options for OccupantPeriod below make sense? Do options 5-8 make sense or is this detail you wouldn't require?

1 - Day
2 - Night
3 - Peak Season
4 - Off-peak Season
5 - Day - peak season
6 - Day - Off-peak season
7 - Night - Peak season
8 - Night - Off-peak season

aiste-kalinauskaite commented 2 years ago

Do we need to have 0 - Unknown too?

MattDonovan82 commented 2 years ago

Yes, could do. Or blank as well for unknown?

stufraser1 commented 2 years ago

It would be worth including 1-8 for some cases where that level of detail may be more common in future.

What would unknown default to?

aiste-kalinauskaite commented 2 years ago

There needs to be an option to support data that contains the detail (options 1 to 8) and data that does not, in the same file, as the user may not have all the granular data available for all records. What "unknown" defaults to is the model vendor's decision, as it is with other fields in OED. "Unknown" could be 0 or blank, as we have for other secondary modifiers.

MattDonovan82 commented 2 years ago

@stufraser1 I will add what the 1-8 options mean in the data spec under the 'other values' tab, but could you add any more detail around these options?

Can you also please update the 'Loc_Pop' input file you have proposed, ready for release next week? I propose getting all the info updated into the 'Read Me' tab and removing anything entitled 'proposed'. Thanks.

stufraser1 commented 2 years ago

Final version, with cleaner readme and tables:

HumanImpact_OEDAdditions_Sept2021_v3.xlsx

MattDonovan82 commented 2 years ago

Thanks @stufraser1. To keep the options consistent, I will make everything start with 'night' rather than 'day' (i.e. 5 will now be 'Night-Peak' rather than 'Day-peak'). Is that OK? The confirmed options will be:

0 - Unknown/default
1 - Night
2 - Day
3 - Peak Season
4 - Off-peak Season
5 - Night - peak season
6 - Day - peak season
7 - Night - Off-peak season
8 - Day - Off-peak season

As you suggested in the file, it will be referred to as the 'LocPopulation' file within OED.
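A minimal Python sketch of how a tool might read the confirmed OccupantPeriod codes, following the earlier suggestion that numeric codes avoid typos and case issues. The enum member names are illustrative labels, not OED identifiers, and treating a blank cell as 0 (Unknown) is an assumption based on the discussion above; what Unknown actually defaults to remains the model vendor's decision:

```python
from enum import IntEnum
from typing import Optional, Union


class OccupantPeriod(IntEnum):
    """OccupantPeriod codes as confirmed in this thread (member names are illustrative)."""
    UNKNOWN = 0
    NIGHT = 1
    DAY = 2
    PEAK_SEASON = 3
    OFF_PEAK_SEASON = 4
    NIGHT_PEAK_SEASON = 5
    DAY_PEAK_SEASON = 6
    NIGHT_OFF_PEAK_SEASON = 7
    DAY_OFF_PEAK_SEASON = 8


def parse_occupant_period(value: Optional[Union[int, str]]) -> OccupantPeriod:
    """Map a raw cell to a code; a blank or missing cell is treated as Unknown (0)."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return OccupantPeriod.UNKNOWN
    # int() raises ValueError for non-numeric text; OccupantPeriod() raises
    # ValueError for integers outside 0-8.
    return OccupantPeriod(int(value))


print(parse_occupant_period("5").name)  # NIGHT_PEAK_SEASON
print(parse_occupant_period("").name)   # UNKNOWN
```

Because IntEnum members compare equal to their integer values, the parsed codes can still be written back to a file as plain numbers.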

stufraser1 commented 2 years ago

Your proposed order of options looks good

MattDonovan82 commented 2 years ago

closed as included in ODS v2.0.0

MattDonovan82 commented 2 years ago

After further discussion, it seems adding the occupancy fields to the current loc file is more appropriate and would make it easier to use for the public sector. @stufraser1

stufraser1 commented 2 years ago

Further discussion concluded that population exposure could be described in the main Location table, instead of having a separate LocPopulation table. Attached are the field details to add into Location table. Fields have also been added to previous proposed LocPopulation table to reflect a fuller age breakdown and gender categories than previously. OED_LocPopulation.xlsx

On the discussion of potential one-to-many relationships between locnumber and types: the structure currently maintains a single row per locnumber. We only apply a code to occupancyperiod (in theory allowing the daytime and nighttime populations for the same locnumber to be described in one file, but across 2 rows). We didn't consider this in our conversation about bringing these data into the Location table.

Please also clarify whether this structure, which splits age across fields (instead of using a code across multiple rows), allows us to apply a different vulnerability curve to each age group, or to ethnic minorities versus refugees, if we wanted to model a different fatality rate per group. This is not commonly required, but it may be a reason to take a multiple-row, code-based approach instead of splitting by fields.

MattDonovan82 commented 1 year ago

Closed as included in OED v3.0.0

aiste-kalinauskaite commented 7 months ago

There are 5 fields missing when comparing OED_LocPopulation.xlsx with the OED v3 spec. The 2 user-defined fields are unnecessary, as there are already 5 user-defined fields in the loc file. However, the other three should be added: OccupantAge5to65, OccupantMale, OccupantOther.

MattDonovan82 commented 6 months ago

I will add these missing fields into the minor release v3.2. We can remove the user-defined fields in the major update for v4.

MattDonovan82 commented 5 months ago

Correction: the OccupantUserDefined fields are not in the spec and so don't need to be added.

MattDonovan82 commented 5 months ago

closed as implemented in v3.2.0

MattDonovan82 commented 4 months ago

@stufraser1 can you confirm whether the 'LocPopNumber' field still needs to be included in the spec? Now that the LocPopulation file is part of the OED loc file, this looks like a hangover from when the files were separate, used to link the population to the property data. Now that they are integrated, does the existing LocNumber field suffice?

stufraser1 commented 4 months ago

"@stufraser1 can you confirm whether the 'LocPopNumber' field still needs to be included in the spec? Now that the LocPopulation file is part of the OED loc file, this looks like a hangover from when the files were separate, used to link the population to the property data. Now that they are integrated, does the existing LocNumber field suffice?"

'LocPopNumber' field can be removed - you're correct it was used because the file was separate but no longer needed.