Static Measurements - Githubissues

mmcdermott commented 9 months ago

I recommend we move static measurements into a separate measurements list within patients, rather than relying on them within events.

This would make the schema look more like it did originally, like this:

This static_measurements field would reflect variables recorded at a per-patient level in the data without a timestamp. This makes it easier to do any temporal operations on the data, better reflects the conceptual division of data in the dataset, and it is trivial to transform the data to put static measurements into an event if that is preferred by a modeler.
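To illustrate the last point, here is a minimal sketch of folding static measurements back into the event list. It assumes a patient row has been loaded as a plain Python dict with a `static_measurements` list next to `events`; the helper name `fold_static_into_events`, the codes, and the sample values are hypothetical, and representing the static event with `time=None` is just one possible convention.

```python
from copy import deepcopy
from datetime import datetime


def fold_static_into_events(patient):
    """Return a copy of `patient` whose static measurements live in a leading
    event with time=None (an assumed convention for 'no timestamp')."""
    result = deepcopy(patient)
    static = result.pop("static_measurements", None) or []
    events = result.get("events", [])
    if static:
        # Prepend a single untimed event holding all static measurements.
        events = [{"time": None, "measurements": static}] + events
    result["events"] = events
    return result


# Hypothetical patient row; codes and values are made up for illustration.
patient = {
    "patient_id": 1,
    "static_measurements": [{"code": "GENDER", "text_value": "F"}],
    "events": [
        {
            "time": datetime(2024, 2, 15, 12, 8),
            "measurements": [{"code": "HR", "numeric_value": 72.0}],
        }
    ],
}

folded = fold_static_into_events(patient)
```

The original `patient` is left untouched, so a modeler can keep either view.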

rvandewater commented 9 months ago

I find this a good suggestion as it allows users to have it "both ways".

tompollard commented 9 months ago

It would be good to document the [edit: other] options that we've considered here. I think these include:

1. Events with a null timestamp are considered "static events".

Allows unified structure for all events, regardless of whether they are static or dynamic.
Requires filtering of events to identify static measurements.
Not especially clear for users.

import pyarrow as pa

measurement = pa.struct([
    ("code", pa.string()),
    ("numeric_value", pa.float32()),
    ("text_value", pa.string()),
    ("datetime_value", pa.timestamp("us")),
])

event = pa.struct([
    ("time", pa.timestamp("us")),
    ("measurements", pa.list_(measurement)),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("events", pa.list_(event)),
])
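The filtering step that option 1 requires could look roughly like the sketch below. It assumes events have been loaded as plain Python dicts mirroring the schema above; the function name `split_static_events` and the sample codes are hypothetical.

```python
from datetime import datetime


def split_static_events(events):
    """Separate measurements from events with a null timestamp from the
    events that carry a real timestamp."""
    static_measurements = []
    dynamic_events = []
    for event in events:
        if event["time"] is None:
            # Null timestamp: treat every measurement in the event as static.
            static_measurements.extend(event["measurements"])
        else:
            dynamic_events.append(event)
    return static_measurements, dynamic_events


# Hypothetical events; codes and values are made up for illustration.
events = [
    {"time": None, "measurements": [{"code": "GENDER", "text_value": "F"}]},
    {
        "time": datetime(2024, 2, 15, 12, 8),
        "measurements": [{"code": "HR", "numeric_value": 72.0}],
    },
]

static_measurements, dynamic_events = split_static_events(events)
```

Every consumer that needs only timed data would have to run a pass like this first, which is the cost noted above.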

2. Add `is_static` flag to the event schema:

Allows unified structure for all events, regardless of whether they are static or dynamic.
Requires filtering of events to identify static measurements.
Typically there will only be a few static measurements, so there is a lot of redundancy.

measurement = pa.struct([
    ("code", pa.string()),
    ("numeric_value", pa.float32()),
    ("text_value", pa.string()),
    ("datetime_value", pa.timestamp("us")),
])

event = pa.struct([
    ("time", pa.timestamp("us")),
    ("is_static", pa.bool_()),
    ("measurements", pa.list_(measurement)),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("events", pa.list_(event)),
])

3. Add `metadata` field to the patient schema that supports key-value pairs:

Similar to the static_measurements approach in the original post
?

metadata_value = pa.struct([
    ("text_value", pa.string()),
    ("numeric_value", pa.float32()),
    ("datetime_value", pa.timestamp("us")),
])

metadata = pa.map_(
    pa.string(),
    metadata_value
)

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("metadata", metadata),
    ("events", pa.list_(event)),
])
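As a rough sketch of how option 3 might be consumed, the snippet below reads one static value by key. It assumes the `metadata` map has been loaded as a plain Python dict of key to value-struct; the helper `get_static_value` and the keys shown are hypothetical.

```python
from datetime import datetime


def get_static_value(patient, key):
    """Return the first non-null typed value stored under `key`, or None
    if the key is absent or all of its typed fields are null."""
    entry = patient["metadata"].get(key)
    if entry is None:
        return None
    for field in ("text_value", "numeric_value", "datetime_value"):
        if entry.get(field) is not None:
            return entry[field]
    return None


# Hypothetical patient row; keys and values are made up for illustration.
patient = {
    "patient_id": 1,
    "metadata": {
        "gender": {"text_value": "F", "numeric_value": None, "datetime_value": None},
        "birth_date": {
            "text_value": None,
            "numeric_value": None,
            "datetime_value": datetime(1980, 1, 1),
        },
    },
    "events": [],
}

gender = get_static_value(patient, "gender")
birth_date = get_static_value(patient, "birth_date")
missing = get_static_value(patient, "race")
```

Lookup by key is direct here, whereas a list of measurements would need a scan by `code`.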

4. Define the kind of `static_measurements` we support in the data structure

Simple for the user to understand
Inflexible

e.g. if static_measurements just means demographics, then:

demographics = pa.struct([
    ("gender", pa.string()),
    ("race", pa.string()),
    ("birth_date", pa.date32()),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("demographics", demographics),
    ("events", pa.list_(event)),
])

mmcdermott commented 9 months ago

What about the approach in the former screenshot: just have static_measurements (or some other name) be a list of measurements, not a separate typed struct?

tompollard commented 9 months ago

Sorry, the options that I listed were intended to be "other options".

mmcdermott commented 9 months ago

Ahh, makes more sense, sounds good.

tompollard commented 9 months ago

I kind of like option 3 (the metadata option), though I think at some point there was a metadata field on the measurements schema? If this is still there, it would get confusing.

mmcdermott commented 9 months ago

For simplicity I think the prior approach (just have static_measurements) makes the most sense -- static data generally is also a set of codes and values, it just lacks timestamps, so this reflects that without introducing more schema bloat. We also do have metadata within the measurements that can be defined on a per-dataset basis (or at least that is my understanding) so I don't think we want to go that route for static data too.

tompollard commented 9 months ago

Ok, vote cast on Slack!

Medical-Event-Data-Standard / meds

Static Measurements #12
