docsteveharris closed this 1 year ago
I'd recommend a single giant file, since it saves the user having to coordinate and manage different files. Perhaps not CSV though. Most people I know seem to be using Parquet now? It plays nicely with Spark/R/Python etc., and it stores column types with the data, which should save us tripping up over typing issues.
One big file is better than separate ones, but in long format: one row per observation, rather than one column per observation.
So:
Meas_Type  Patient  Visit  Age  Gender  Ethnicity  DateTime_measurement  Value
Temp       1        1      80   M       W          2023-01-20 14:00      37
Temp       1        1      80   M       W          2023-01-21 14:00      37
Temp       1        1      80   M       W          2023-01-22 14:00      37
pH         1        1      80   M       W          2023-01-20 14:00      6
pH         1        1      80   M       W          2023-01-23 12:00      5
Temp       2        1      60   F       B          2023-01-19 14:00      37
pH         2        1      60   F       B          2023-01-19 09:00      6
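To show why one row per observation is convenient, here is a minimal sketch that reads the example rows above and groups them by measurement type. It uses only the Python standard library; the comma-separated serialisation and the `group_by_type` helper are assumptions made for illustration, not part of the proposed format.

```python
import csv
import io
from collections import defaultdict

# The example rows from the table above, serialised as CSV for illustration.
LONG_CSV = """Meas_Type,Patient,Visit,Age,Gender,Ethnicity,DateTime_measurement,Value
Temp,1,1,80,M,W,2023-01-20 14:00,37
Temp,1,1,80,M,W,2023-01-21 14:00,37
Temp,1,1,80,M,W,2023-01-22 14:00,37
pH,1,1,80,M,W,2023-01-20 14:00,6
pH,1,1,80,M,W,2023-01-23 12:00,5
Temp,2,1,60,F,B,2023-01-19 14:00,37
pH,2,1,60,F,B,2023-01-19 09:00,6
"""


def group_by_type(text):
    """Return {measurement type: [row dicts]} from long-format CSV text."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        grouped[row["Meas_Type"]].append(row)
    return dict(grouped)


by_type = group_by_type(LONG_CSV)

# Every value arrives as a string from CSV, so it has to be cast by hand.
temps = [float(row["Value"]) for row in by_type["Temp"]]
```

Adding a new measurement type only adds rows, never columns, so the same reader keeps working. The manual `float` cast is the kind of typing issue a typed format would avoid.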
Presumably we'll need the standard arrangement of a separate value column for each of string/numeric/datetime measurements etc. (as per visit_observation in emap or any of the OMOP tables).
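A minimal sketch of that arrangement, assuming three hypothetical column names (`value_as_number`, `value_as_datetime`, `value_as_string`) and the timestamp format used in the example table. The real names should follow whichever schema is adopted; the `typed_value` helper is illustrative only.

```python
from datetime import datetime

# Hypothetical column names; check the chosen schema for the real ones.
VALUE_COLUMNS = ("value_as_number", "value_as_datetime", "value_as_string")


def typed_value(raw):
    """Place a raw string into exactly one typed value column."""
    row = dict.fromkeys(VALUE_COLUMNS)  # every column starts as None

    # Try numeric first. Note that float() also accepts "nan" and "inf".
    try:
        row["value_as_number"] = float(raw)
        return row
    except ValueError:
        pass

    # Then the timestamp format used in the example table (an assumption).
    try:
        row["value_as_datetime"] = datetime.strptime(raw, "%Y-%m-%d %H:%M")
        return row
    except ValueError:
        pass

    # Anything else is kept as free text.
    row["value_as_string"] = raw
    return row


numeric = typed_value("37")
stamped = typed_value("2023-01-20 14:00")
text = typed_value("positive")
```

Only one of the three columns is populated per row, so each column can keep a single type when the file is written out.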
Closing; the plan is to ship a readme alongside the data when it arrives in the DSH.
As per note from Sarah:
Option 1: GIANT CSV
Option 2: CSV PER TYPE
  e.g. temperature.csv
       pH.csv
Option 3: FOLDER PER PATIENT WITH
  patient.csv
    Patient  Visit  Age  Gender  Ethnicity
    1        1      80   M       W
  temperature.csv
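The options are interconvertible. As a rough sketch, per-type files can be stacked into a single long table by using each file name as the measurement type. The per-file columns shown here (`Patient`, `Visit`, `DateTime_measurement`, `Value`) and the `stack_per_type_files` helper are assumptions for illustration; the thread does not specify the layout of the per-type files.

```python
import csv
import tempfile
from pathlib import Path


def stack_per_type_files(folder):
    """Stack every <type>.csv in a folder into one list of long-format rows."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # The file stem ("temperature", "pH") becomes the type column.
                rows.append({"Meas_Type": path.stem, **row})
    return rows


# Demo with two small files written to a temporary folder (made-up layout).
HEADER = "Patient,Visit,DateTime_measurement,Value\n"
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "temperature.csv").write_text(
        HEADER + "1,1,2023-01-20 14:00,37\n2,1,2023-01-19 14:00,37\n"
    )
    Path(tmp, "pH.csv").write_text(HEADER + "1,1,2023-01-20 14:00,6\n")
    stacked = stack_per_type_files(tmp)
```

Going the other way (splitting one file by type) is equally short, so the choice mostly comes down to what is least work for the person receiving the data.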