Medical-Event-Data-Standard / meds

Schema definitions and Python types for Medical Event Data Standard, a standard for medical event data such as EHR and claims data
Apache License 2.0

Static Measurements #12

Closed by mmcdermott 9 months ago

mmcdermott commented 9 months ago

I recommend we move static measurements into a separate measurements list within patients, rather than storing them within events.

This would make the schema look more like it did originally, like this: [image: screenshot of the original schema]

This static_measurements field would hold variables recorded at the per-patient level without a timestamp. Separating them makes temporal operations on the data easier and better reflects the conceptual division of data in the dataset, and it remains trivial to transform the data to put static measurements into an event if a modeler prefers that.
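
To illustrate the "trivial to transform" point, here is a minimal sketch in plain Python. It assumes each patient record is a dict with `patient_id`, `static_measurements`, and `events` keys, and that a static event is represented by a `time` of `None`. The helper name `fold_static_into_events`, the codes `GENDER` and `HR`, and the sample values are made up for illustration and are not part of the schema.

```python
from datetime import datetime
from typing import Any

# Hypothetical record layout: a patient is a dict with "patient_id",
# "static_measurements" (no timestamps) and "events" (each with a "time").
Patient = dict[str, Any]


def fold_static_into_events(patient: Patient) -> Patient:
    """Return a new record whose static measurements live in one leading
    event with time=None (an assumed convention for "static")."""
    static = patient.get("static_measurements") or []
    events = list(patient.get("events") or [])
    if static:
        # Put the static event first so timed events keep their order.
        events.insert(0, {"time": None, "measurements": list(static)})
    return {"patient_id": patient["patient_id"], "events": events}


# Illustrative data only.
patient = {
    "patient_id": 1,
    "static_measurements": [{"code": "GENDER", "text_value": "F"}],
    "events": [
        {
            "time": datetime(2024, 2, 15, 12, 8),
            "measurements": [{"code": "HR", "numeric_value": 72.0}],
        }
    ],
}

folded = fold_static_into_events(patient)
```

The input record is left untouched, so a modeler could keep both representations side by side.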

rvandewater commented 9 months ago

I think this is a good suggestion, as it lets users have it "both ways".

tompollard commented 9 months ago

It would be good to document the [edit: other] options that we've considered here. I think these include:

1. Events with a null timestamp are considered "static events".

import pyarrow as pa

measurement = pa.struct([
    ("code", pa.string()),
    ("numeric_value", pa.float32()),
    ("text_value", pa.string()),
    ("datetime_value", pa.timestamp("us")),
])

event = pa.struct([
    ("time", pa.timestamp("us")),
    ("measurements", pa.list_(measurement)),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("events", pa.list_(event)),
])

2. Add an is_static flag to the event schema:

measurement = pa.struct([
    ("code", pa.string()),
    ("numeric_value", pa.float32()),
    ("text_value", pa.string()),
    ("datetime_value", pa.timestamp("us")),
])

event = pa.struct([
    ("time", pa.timestamp("us")),
    ("is_static", pa.bool_()), 
    ("measurements", pa.list_(measurement)),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("events", pa.list_(event)),
])

3. Add a metadata field to the patient schema that supports key-value pairs:

metadata_value = pa.struct([
    ("text_value", pa.string()),
    ("numeric_value", pa.float32()),
    ("datetime_value", pa.timestamp("us")),
])

metadata = pa.map_(
    pa.string(),
    metadata_value
)

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("metadata", metadata),
    ("events", pa.list_(event)),
])

4. Define the kinds of static_measurements we support directly in the data structure:

e.g. if static_measurements just means demographics, then:

demographics = pa.struct([
    ("gender", pa.string()),
    ("race", pa.string()),
    ("birth_date", pa.date32()),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("demographics", demographics),
    ("events", pa.list_(event)),
])
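
Options 1 and 2 above differ mainly in how a consumer tells static events apart from timed ones. As a rough sketch over plain dict records (the helper names `split_by_null_time` and `split_by_flag` and the sample codes are hypothetical, not part of any proposed API), the two conventions could be read like this:

```python
from datetime import datetime
from typing import Any

Event = dict[str, Any]


def split_by_null_time(events: list[Event]) -> tuple[list[Event], list[Event]]:
    """Option 1: an event counts as static when its time is None."""
    static = [e for e in events if e["time"] is None]
    timed = [e for e in events if e["time"] is not None]
    return static, timed


def split_by_flag(events: list[Event]) -> tuple[list[Event], list[Event]]:
    """Option 2: an event counts as static when is_static is true."""
    static = [e for e in events if e.get("is_static")]
    timed = [e for e in events if not e.get("is_static")]
    return static, timed


# Illustrative data only; it carries both conventions so they can be compared.
events = [
    {"time": None, "is_static": True, "measurements": [{"code": "GENDER"}]},
    {
        "time": datetime(2024, 2, 15),
        "is_static": False,
        "measurements": [{"code": "HR"}],
    },
]

static_1, timed_1 = split_by_null_time(events)
static_2, timed_2 = split_by_flag(events)
```

Either way, every temporal operation has to filter first, which is the overhead that a separate static list would avoid.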

mmcdermott commented 9 months ago

What about the approach in the earlier screenshot: just have static_measurements (or some other name) be a list of measurements, not a separate typed struct?

tompollard commented 9 months ago

Sorry, the options that I listed were intended to be "other options".

mmcdermott commented 9 months ago

Ahh, makes more sense, sounds good.

tompollard commented 9 months ago

I kind of like option 3 (the metadata option), though I think at some point there was a metadata field on the measurements schema? If this is still there, it would get confusing.

mmcdermott commented 9 months ago

For simplicity, I think the prior approach (just have static_measurements) makes the most sense. Static data is generally also a set of codes and values that simply lacks timestamps, so this reflects that without introducing more schema bloat. We also already have metadata within the measurements that can be defined on a per-dataset basis (or at least that is my understanding), so I don't think we want to go that route for static data too.

tompollard commented 9 months ago

Ok, vote cast on Slack!