mmcdermott closed this issue 9 months ago
I find this a good suggestion as it allows users to have it "both ways".
It would be good to document the [edit: other] options that we've considered here. I think these include:

    import pyarrow as pa

    measurement = pa.struct([
        ("code", pa.string()),
        ("numeric_value", pa.float32()),
        ("text_value", pa.string()),
        ("datetime_value", pa.timestamp("us")),
    ])

    event = pa.struct([
        ("time", pa.timestamp("us")),
        ("measurements", pa.list_(measurement)),
    ])

    patient = pa.schema([
        ("patient_id", pa.int64()),
        ("events", pa.list_(event)),
    ])

Add an `is_static` flag to the event schema:

    measurement = pa.struct([
        ("code", pa.string()),
        ("numeric_value", pa.float32()),
        ("text_value", pa.string()),
        ("datetime_value", pa.timestamp("us")),
    ])

    event = pa.struct([
        ("time", pa.timestamp("us")),
        ("is_static", pa.bool_()),
        ("measurements", pa.list_(measurement)),
    ])

    patient = pa.schema([
        ("patient_id", pa.int64()),
        ("events", pa.list_(event)),
    ])

Add a `metadata` field to the patient schema that supports key-value pairs:

    metadata_value = pa.struct([
        ("text_value", pa.string()),
        ("numeric_value", pa.float32()),
        ("datetime_value", pa.timestamp("us")),
    ])

    metadata = pa.map_(
        pa.string(),
        metadata_value,
    )

    patient = pa.schema([
        ("patient_id", pa.int64()),
        ("metadata", metadata),
        ("events", pa.list_(event)),
    ])

Explicitly define the `static_measurements` we support in the data structure, e.g. if `static_measurements` just means demographics, then:

    demographics = pa.struct([
        ("gender", pa.string()),
        ("race", pa.string()),
        ("birth_date", pa.date32()),
    ])

    patient = pa.schema([
        ("patient_id", pa.int64()),
        ("demographics", demographics),
        ("events", pa.list_(event)),
    ])
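
To make the difference between the flag option and the key-value option more concrete, here is a minimal sketch in plain Python of how a reader might pull a patient's static values out under each layout. The rows are plain dicts shaped like the schemas above (roughly what `Table.to_pylist()` would return, with null fields omitted for brevity). The helper names, the sample codes, and the convention that static events have a null `time` are assumptions for illustration, not part of any of the options as proposed.

```python
from datetime import datetime


def static_from_flag(patient):
    """`is_static` option: collect measurements from the flagged events."""
    return [
        m
        for event in patient["events"]
        if event["is_static"]
        for m in event["measurements"]
    ]


def static_from_metadata(patient):
    """`metadata` option: the map is read as (key, value) pairs."""
    return dict(patient["metadata"])


# Hypothetical patient stored with the `is_static` flag.
flagged = {
    "patient_id": 1,
    "events": [
        {
            "time": None,  # assumption: static events carry no timestamp
            "is_static": True,
            "measurements": [{"code": "GENDER", "text_value": "F"}],
        },
        {
            "time": datetime(2024, 2, 15, 12, 8),
            "is_static": False,
            "measurements": [{"code": "HR", "numeric_value": 72.0}],
        },
    ],
}

# The same hypothetical patient stored with a key-value `metadata` field.
keyed = {
    "patient_id": 1,
    "metadata": [("GENDER", {"text_value": "F"})],
    "events": flagged["events"][1:],
}

print(static_from_flag(flagged))
print(static_from_metadata(keyed))
```

With the flag, static data keeps the code-and-value shape of every other measurement but has to be filtered out of `events`; with the map, it is separated up front but no longer looks like a measurement.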

What about the approach in the former screenshot: just have `static_measurements` (or some other name) be a list of measurements, not a separate typed struct?
Sorry, the options that I listed were intended to be "other options".
Ahh, makes more sense, sounds good.
On Thu, Feb 15, 2024, 12:08 PM Tom Pollard wrote:
> Sorry, the options that I listed were intended to be "other options".
I kind of like option 3 (the metadata option), though I think at some point there was a metadata field on the measurements schema? If this is still there, it would get confusing.
For simplicity I think the prior approach (just have `static_measurements`) makes the most sense -- static data generally is also a set of codes and values, it just lacks timestamps, so this reflects that without introducing more schema bloat. We also do have metadata within the measurements that can be defined on a per-dataset basis (or at least that is my understanding), so I don't think we want to go that route for static data too.
Ok, vote cast on Slack!
I recommend we move static measurements into a separate measurements list within patients, rather than relying on them being stored within events.
This would make the schema look more like it did originally, like this:
This `static_measurements` field would reflect variables recorded at a per-patient level in the data without a timestamp. This makes it easier to do any temporal operations on the data, better reflects the conceptual division of data in the dataset, and it is trivial to transform the data to put static measurements into an event if that is preferred by a modeler.
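
As a rough illustration of the last point, the transformation back to an events-only layout could be sketched as follows. This is plain Python over dict-shaped rows; the function name `static_to_event`, the sample data, and the choice of a null `time` for the static event are assumptions, not part of the proposal.

```python
from datetime import datetime


def static_to_event(patient, static_time=None):
    """Return a copy of `patient` with static measurements folded into events.

    The static measurements become one extra event placed ahead of the timed
    events. `static_time` is the timestamp given to that event; the default of
    None (a null time) is an assumption, not something the proposal specifies.
    """
    static_event = {
        "time": static_time,
        "measurements": list(patient["static_measurements"]),
    }
    return {
        "patient_id": patient["patient_id"],
        "events": [static_event] + list(patient["events"]),
    }


# Hypothetical patient in the proposed layout.
patient = {
    "patient_id": 1,
    "static_measurements": [{"code": "GENDER", "text_value": "F"}],
    "events": [
        {
            "time": datetime(2024, 2, 15, 12, 8),
            "measurements": [{"code": "HR", "numeric_value": 72.0}],
        }
    ],
}

converted = static_to_event(patient)
print(len(converted["events"]))
```

The input record is left unchanged, so a modeler can keep both views of the same patient if that is convenient.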