globaldothealth / outbreak-schema

Global.health Day Zero Outbreak schema

Format of age as a range #23

Open sadiekelly opened 7 months ago

sadiekelly commented 7 months ago

WHO 5-year age groupings, code AGEGROUP (Global Health Observatory): https://apps.who.int/gho/data/node.metadata.AGEGROUP?lang=en

AGE0-4 | AGE5-9 | AGE10-14 | AGE15-19 | AGE20-24 | AGE25-29 | AGE30-34 | AGE35-39 | AGE40-44 | AGE45-49 | AGE50-54 | AGE55-59 | AGE60-64 | AGE65-69 | AGE70-74 | AGE75-79 | AGE80-84 | AGE85-89 | AGE90-94 | AGE95-99 | AGE100+

Is additional granularity required for the 0-4 age group? For example: 0 to <6 months, 6 months to <1 year, 1 to <2 years, and 2-4 years?
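As a minimal sketch, assuming ages are recorded in years and that each AGEx-y code is inclusive at both ends, an exact age can be mapped onto the 5-year codes listed above like this. The function name `who_age_group` is hypothetical and not part of the schema.

```python
def who_age_group(age_years: float) -> str:
    """Map an exact age in years to a WHO 5-year AGEGROUP code.

    Assumes the code pattern AGE0-4 ... AGE95-99, AGE100+ listed above.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years >= 100:
        return "AGE100+"  # open-ended top group
    low = int(age_years // 5) * 5  # lower bound of the containing 5-year group
    return f"AGE{low}-{low + 4}"
```

For example, `who_age_group(34)` returns `"AGE30-34"` and `who_age_group(0.5)` returns `"AGE0-4"`; any finer split of AGE0-4 would need its own codes.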

abhidg commented 6 months ago

We will need criteria for mapping other age bins (such as 10-20) to our age groups.

sadiekelly commented 6 months ago

I suggest staying with 5-year ranges where possible, categorised per the WHO-defined terms above. Other WHO-defined categories (e.g. AGE15-24, AGE25-34) could be used where the source data does not map to the 5-year age groups; however, data can then only be aggregated at the least specific category (AGE15-19 and AGE20-24 would be combined under AGE15-29). Otherwise, map to the lowest or highest 5-year age category. A variable could be added to indicate an infant (<1 year) where AGE0-4 applies, or AGE0year and AGE1-4 could be used to further subdivide the AGE0-4 grouping.
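The aggregation rule above (mixed-granularity data can only be combined at the coarser category) could be sketched as follows. This assumes case counts keyed by WHO 5-year codes; `aggregate_counts` is a hypothetical helper, and it rejects a target range that does not start and end on 5-year boundaries, since 5-year counts cannot be split.

```python
import re
from typing import Mapping, Tuple

FIVE_YEAR = re.compile(r"AGE(\d+)-(\d+)")


def aggregate_counts(counts: Mapping[str, int], low: int, high: int) -> Tuple[str, int]:
    """Sum counts keyed by WHO 5-year codes into the wider inclusive range [low, high].

    The wider range must start and end on 5-year boundaries; otherwise some
    5-year group would have to be split, which the counts cannot support.
    """
    if high < low or low % 5 != 0 or (high + 1) % 5 != 0:
        raise ValueError(f"AGE{low}-{high} does not align with 5-year groups")
    total = 0
    for code, n in counts.items():
        m = FIVE_YEAR.fullmatch(code)
        if m is None:
            continue  # e.g. AGE100+ or 'NK'; ignored in this sketch
        g_low, g_high = int(m.group(1)), int(m.group(2))
        # Only groups lying entirely inside the target range are counted.
        if low <= g_low and g_high <= high:
            total += n
    return f"AGE{low}-{high}", total
```

For instance, `aggregate_counts({"AGE15-19": 3, "AGE20-24": 5, "AGE25-29": 2}, 15, 29)` returns `("AGE15-29", 10)`.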

abhidg commented 5 months ago

Questions to consider: how do we ingest datasets that have wider age buckets, such as 10-year buckets, or slightly different buckets (off by one, as in our previous line lists, which had buckets for 0, 1-5, 6-10, ...)?
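One way to approach this is to check each source bucket against the WHO 5-year boundaries. The sketch below assumes inclusive bounds in whole years; `classify_bucket` is hypothetical. It separates buckets that fall inside a single group, buckets that are an exact run of whole groups, and buckets that straddle a boundary and cannot be mapped without loss. Note that the reading of a label like "10-20" matters: taken as inclusive it straddles AGE20-24, whereas [10, 19] is exactly AGE10-14 plus AGE15-19.

```python
from typing import List, Tuple


def classify_bucket(low: int, high: int) -> Tuple[str, List[str]]:
    """Classify an inclusive source age bucket [low, high] against WHO 5-year groups.

    Returns (kind, groups), where kind is:
      "within"    - the bucket lies inside a single 5-year group
      "union"     - the bucket is exactly a run of whole 5-year groups
      "straddles" - the bucket cuts across a 5-year boundary
    """
    if low < 0 or high < low:
        raise ValueError(f"invalid bucket {low}-{high}")
    groups: List[str] = []
    start = (low // 5) * 5
    while start <= high:
        if start >= 100:
            groups.append("AGE100+")  # open-ended top group
            break
        groups.append(f"AGE{start}-{start + 4}")
        start += 5
    if len(groups) == 1:
        return "within", groups
    if low % 5 == 0 and (high + 1) % 5 == 0 and high < 100:
        return "union", groups
    return "straddles", groups
```

Applied to the old line-list buckets, `0` is "within" AGE0-4, while `1-5` and `6-10` both "straddle" two WHO groups.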

sadiekelly commented 5 months ago

@aimeehan1 @JacqSauer @kelseytoups

JacqSauer commented 5 months ago

In some outbreaks, reported age information is inconsistent. Some reports provide an exact age, in which case curators can place the case(s) into age buckets, usually following the WHO 5-year grouping suggestion. For example, some of the earliest reports from the Ministry of Health in the 2023 Marburg virus disease (MVD) outbreak in Equatorial Guinea give the age of each case (see health alert no. 3, page 7). When we had this information, we placed individuals into the corresponding 5-year age buckets (see the Eq Guinea Marburg linelist, case IDs 1-30, columns N and O).

However, later reports from the Ministry of Health (same source reporter) only provided aggregate case information, grouping cases into larger 15-year age buckets (see MVD Epi Update, slide 6). For case IDs 31-43 in the linelist, most age buckets are listed as 'NK', because it was hard to tell which bucket the new cases fell into. When we found supporting sources with more information on individual cases, we went back and placed those cases into the appropriate age bucket (see case IDs 38-40).
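The curation workflow described above can be summarised as a rule: use an exact age when one is reported, keep a reported range only if it fits inside a single 5-year group, and otherwise record 'NK' until a better source turns up. This is a minimal sketch; `curate_age_bucket` and its parameters are hypothetical and not part of the schema.

```python
from typing import Optional

NOT_KNOWN = "NK"


def curate_age_bucket(exact_age: Optional[int] = None,
                      reported_low: Optional[int] = None,
                      reported_high: Optional[int] = None) -> str:
    """Return a WHO 5-year code for a case, or 'NK' if it cannot be determined."""
    if exact_age is not None:
        low = (exact_age // 5) * 5
        return "AGE100+" if low >= 100 else f"AGE{low}-{low + 4}"
    if reported_low is not None and reported_high is not None:
        low = (reported_low // 5) * 5
        if low >= 100:
            return "AGE100+"
        # A reported range is only usable if it fits inside one 5-year group.
        if reported_high <= low + 4:
            return f"AGE{low}-{low + 4}"
    # e.g. a 15-year aggregate bucket spans three groups: leave as not known.
    return NOT_KNOWN
```

A case reported only within a 15-year bucket such as 15-29 comes out as `'NK'`, and can be re-curated with `exact_age` once a supporting source gives it.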

abhidg commented 5 months ago

I suggest representing ageRange.low and ageRange.high as a FHIR Range datatype (https://build.fhir.org/datatypes.html#range); that way we can capture arbitrary age brackets. We can still standardise to 5- or 10-year buckets, but this gives us flexibility when a particular country only reports in non-standard buckets. Buckets of <18, 18-65 and 65+ are also common in many analyses.
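A minimal sketch of an age bracket as a FHIR-style Range, written as plain Python dicts. In FHIR, `Range.low` and `Range.high` are SimpleQuantity values, the bounds are inclusive, and either one may be omitted; the UCUM code `a` (annum) is used here for years. The helper names and the `ageRange` key are assumptions for illustration, and the <18 example assumes ages in completed whole years.

```python
from typing import Dict, Optional

UCUM_SYSTEM = "http://unitsofmeasure.org"


def age_quantity(years: float) -> Dict[str, object]:
    """A FHIR SimpleQuantity-style dict for an age in years (UCUM code 'a')."""
    return {"value": years, "unit": "years", "system": UCUM_SYSTEM, "code": "a"}


def age_range(low: Optional[float] = None, high: Optional[float] = None) -> Dict[str, object]:
    """Build a FHIR-style Range; both bounds are treated as inclusive.

    A bound passed as None is omitted, e.g. age_range(low=65) for "65+".
    """
    if low is not None and high is not None and low > high:
        raise ValueError("low must not exceed high")
    rng: Dict[str, object] = {}
    if low is not None:
        rng["low"] = age_quantity(low)
    if high is not None:
        rng["high"] = age_quantity(high)
    return rng


# Hypothetical record fragments for the common <18 / 18-65 / 65+ split.
under_18 = {"ageRange": age_range(high=17)}  # completed whole years
adult = {"ageRange": age_range(18, 65)}
older = {"ageRange": age_range(low=65)}
```

Standard 5-year buckets fit the same shape, e.g. `age_range(30, 34)` for AGE30-34.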

sadiekelly commented 5 months ago

<18, 18-65 and 65+ can be captured with the WHO age categories YEARSLESS18 (<18 years), AGE18-65 and AGE65-100 (there is no category beyond that upper bound).
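To use these categories alongside a low/high representation, the codes can be parsed into bounds. The sketch below is hypothetical: it only handles the code shapes mentioned in this thread (`AGEx-y`, `AGEx+`, `YEARSLESSn`) and assumes `AGEx-y` is inclusive at both ends, like the 5-year codes. Read that way, AGE18-65 and AGE65-100 both contain age 65, which the overlap check makes visible.

```python
import re
from typing import Optional, Tuple

# (low, high) in whole years, inclusive; high=None means no upper bound.
Bounds = Tuple[int, Optional[int]]


def parse_who_age_code(code: str) -> Bounds:
    """Parse a WHO-style age code into inclusive (low, high) bounds."""
    if m := re.fullmatch(r"AGE(\d+)-(\d+)", code):
        return int(m.group(1)), int(m.group(2))
    if m := re.fullmatch(r"AGE(\d+)\+", code):
        return int(m.group(1)), None
    if m := re.fullmatch(r"YEARSLESS(\d+)", code):
        # "<n years" in completed whole years means at most n - 1.
        return 0, int(m.group(1)) - 1
    raise ValueError(f"unrecognised age code: {code!r}")


def overlaps(a: Bounds, b: Bounds) -> bool:
    """True if two inclusive age ranges share at least one year."""
    a_high = float("inf") if a[1] is None else a[1]
    b_high = float("inf") if b[1] is None else b[1]
    return a[0] <= b_high and b[0] <= a_high
```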

aimeehan1 commented 5 months ago

Interesting idea to use FHIR low/high range.

Let's think about exceptions to the rule.

Also, should we be capturing other categories/descriptors for age, for example MEAN and MEDIAN statistics?

Here's an mpox example that covers two age-range groups, where the source provides the MEDIAN age: "All confirmed cases as of 20 June 2022 at 14:00 are male, aged between 19 and 71 years (median age: 34 years)." (Report)
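If summary statistics are captured, one possible shape is a record holding the reported range alongside an optional median or mean, with a check that each statistic lies inside the range. The class and field names here are hypothetical; the values come from the mpox example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AggregateAge:
    """Hypothetical record for ages reported only as summary statistics."""
    low: Optional[float] = None     # youngest reported age (years)
    high: Optional[float] = None    # oldest reported age (years)
    median: Optional[float] = None
    mean: Optional[float] = None

    def __post_init__(self) -> None:
        if self.low is not None and self.high is not None and self.low > self.high:
            raise ValueError("low must not exceed high")
        # A median or mean of the cases cannot fall outside their min/max.
        for name in ("median", "mean"):
            value = getattr(self, name)
            if value is None:
                continue
            if (self.low is not None and value < self.low) or (
                self.high is not None and value > self.high
            ):
                raise ValueError(f"{name} lies outside the reported range")


mpox_june_2022 = AggregateAge(low=19, high=71, median=34)
```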