HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

Add smoking related fields in medical_history module #1565

Closed: arschat closed this 3 weeks ago

arschat commented 2 months ago

For which schema is a change/update being suggested?

I would like to request an update to the medical_history.json schema.

What should the change/update be?

I would like to add three new fields (smoking_status, smoking_pack_years and years_since_smoking_cessation) to this schema so that data contributors can record precise, measurable information about a donor's smoking history. The existing field smoking_history should be removed because it overlaps with smoking_pack_years.

This update constitutes a major change to the schema(s) it affects.

What new field(s) need to be changed/added?

  1. smoking_status

    • Field name: smoking_status
    • Field description: Whether the individual currently consumes, formerly consumed, or has never consumed smoking or tobacco products such as cigarettes, cigars, pipes, betel nut chewing, etc.
    • Field type: string
    • Required: no
    • Examples: active; former; never
    • CV or enum: enum (active; former; never)
  2. smoking_pack_years

    • Field name: smoking_pack_years
    • Field description: Estimated number of packs smoked per day (one pack = 20 cigarettes) multiplied by the number of years the individual smoked.
    • Field type: number
    • Required: no
    • Examples: 4.55; 12; 49.5
    • CV or enum: no
  3. years_since_smoking_cessation

    • Field name: years_since_smoking_cessation
    • Field description: If smoking status is "former", specify the number of years since smoking cessation.
    • Field type: integer
    • Required: no
    • Examples: 1; 4; 12
    • CV or enum: no
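
For illustration, the three fields above can be sketched as JSON-Schema-style properties together with a small check. This is only a sketch: the types and enum values come from the list above, but the dict layout, the helper name `check_smoking_fields`, and the rule that years_since_smoking_cessation is only expected when the status is "former" are assumptions made here, not the merged schema or any HCA validation code.

```python
# Sketch of the three proposed fields as JSON-Schema-style properties.
# Types and enum values come from the proposal; the dict layout and the
# helper functions below are illustrative assumptions.
PROPOSED_PROPERTIES = {
    "smoking_status": {
        "type": "string",
        "enum": ["active", "former", "never"],
    },
    "smoking_pack_years": {"type": "number"},
    "years_since_smoking_cessation": {"type": "integer"},
}


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_integer(value):
    return isinstance(value, int) and not isinstance(value, bool)


def check_smoking_fields(record):
    """Return a list of problems found in a medical_history-like dict.

    Hypothetical helper: all three fields are optional, so a missing
    field is never reported as a problem.
    """
    problems = []

    status = record.get("smoking_status")
    allowed = PROPOSED_PROPERTIES["smoking_status"]["enum"]
    if status is not None and status not in allowed:
        problems.append(f"smoking_status must be one of {allowed}")

    pack_years = record.get("smoking_pack_years")
    if pack_years is not None and not is_number(pack_years):
        problems.append("smoking_pack_years must be a number")

    years = record.get("years_since_smoking_cessation")
    if years is not None:
        if not is_integer(years):
            problems.append("years_since_smoking_cessation must be an integer")
        # Assumption: the field description ties this value to former smokers.
        elif status != "former":
            problems.append(
                "years_since_smoking_cessation is only expected when "
                "smoking_status is 'former'"
            )
    return problems
```

With these assumptions, a record such as `{"smoking_status": "former", "smoking_pack_years": 12, "years_since_smoking_cessation": 4}` produces an empty problem list, while `{"smoking_status": "active", "years_since_smoking_cessation": 2}` is flagged by the cross-field rule.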

Why is the change requested?

Several bionetworks (currently five) want to record donor smoking history in a standardised way as part of their Tier 2 metadata.

The schema already has the medical_history.smoking_history field, which records the "Estimated number of cigarettes smoked per day." However, it is a free-text field, so its values cannot be standardised or used as a measurable quantity, and the suggested measure does not take into account the number of years the individual smoked.
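
To make the difference concrete, here is a minimal sketch of the pack-years calculation as defined in the smoking_pack_years description. The function name `pack_years` and the two donors are hypothetical, chosen only to show that the same daily consumption can correspond to very different exposure.

```python
CIGARETTES_PER_PACK = 20  # pack size stated in the field description


def pack_years(cigarettes_per_day, years_smoked):
    """Packs smoked per day multiplied by the number of years smoked."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


# Hypothetical donors: identical cigarettes per day, different durations.
short_term = pack_years(cigarettes_per_day=20, years_smoked=2)
long_term = pack_years(cigarettes_per_day=20, years_smoked=30)
print(short_term, long_term)  # 2.0 30.0
```

A per-day count alone would record both donors as "20", whereas pack-years separates them.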

The Lung bionetwork has proposed this three-field approach, and we expect most bionetworks to request these fields (or at least smoking_status) for Tier 2. Since smoking_pack_years is very similar to the existing smoking_history field, we also suggest removing smoking_history.

arschat commented 2 months ago

PR #1567