Closed wistephens closed 7 years ago
Bill: Do you have the correct DDLs?
Yes, the current DDL for Postgres specifies varchar(10) for all time columns.
Specimen example: specimen_time VARCHAR(10)
Oh. I meant to say, do you have DDLs the way you want them, with the type "time"?
George had proposed that we consider making the time fields to have datetime type . The downside is the date portion would be redundant with the date field, but the good part is this would allow a recognized data format that could work cross platform. It seems a reasonable proposal to me and probably better than having a varchar field that isn't strictly regulated. Bill, would that meet your needs if adopted in a future cdm release? We haven't yet seen anyone with a real analytical use case for the time fields yet. On Feb 10, 2015 11:05 PM, "Christian Reich" notifications@github.com wrote:
Oh. I meant to say, do you have DDLs the way you want them, with the type "time"?
— Reply to this email directly or view it on GitHub https://github.com/OHDSI/CommonDataModel/issues/6#issuecomment-73831287.
We've updated our local version of the Postgres DDL to use time data type. This is working with our current ETL process.
Patrick, the specification of separate time field on some tables seems inconsistent. I think using datetime for instead of date on the tables (visit, observation, specimen, measurement, note) allows us to support our scenarios without the need to perform a bunch of cast/converts in the extract.
+1 for datetime values across the model. That's what we've decided (after long conversation) to go with for PEDSnet.
Just a cautionary note on using datetime
everywhere: a lot of our applications rely on the fact that if you subtract two dates you get an integer (representing the number of days). For datetimes the result is an interval
that needs further processing.
I'm sure we can come up with the right SQL and translation rules to make it work, but we're not there yet.
To clarify my comment: I believe we have to maintain all current date fields as date type. not only do all our analytics expect it, but many data sources only contain dates (without times). Many of our dates (without time) are required fields. I was only suggesting that the time fields in cdmv5 could be considered in future releases to be typecast as datetime type. This would not interfere with exisiting data or analytics, since the time fields are all optional, but may enable those with the information to have a standard representation. On Feb 11, 2015 4:37 PM, "Martijn Schuemie" notifications@github.com wrote:
Just a cautionary note on using datetime everywhere: a lot of our applications rely on the fact that if you subtract two dates you get an integer (representing the number of days). For datetimes the result is an interval that needs further processing.
I'm sure we can come up with the right SQL and translation rules to make it work, but we're not there yet.
— Reply to this email directly or view it on GitHub https://github.com/OHDSI/CommonDataModel/issues/6#issuecomment-73970001.
I'm under the impression that PEDSnet members ARE interested in doing time queries. The PCORnet CDM also has fields for storing time data, so I'm assuming they are expecting queries on that information.
We are certainly hoping to take advantage of OHDSI developed applications for the PEDSnet data, but I'm not sure we can give up the datetime
data type, either.
What has to be done to allow for this extension of the OMOP CDM without breaking compatibility with existing tools? I believe I ran into this before with Achilles and adding a line in the SqlRender replacement patterns fixed it.
If you want to store time, you should put it in the time field, that's why it's there (and in fact was added specifically for pedsnet and the other pcori cdrns who asked for it). I would not recommend changing the date field. If you change from the cdm standard to make the date fields to datetime, you will have to modify all analyses that expect dates only.
Cheers,
Patrick On Feb 12, 2015 2:29 PM, "Aaron Browne" notifications@github.com wrote:
I'm under the impression that PEDSnet members ARE interested in doing time queries. The PCORnet CDM also has fields for storing time data, so I'm assuming they are expecting queries on that information.
We are certainly hoping to take advantage of OHDSI developed applications for the PEDSnet data, but I'm not sure we can give up the datetime data type, either.
What has to be done to allow for this extension of the OMOP CDM without breaking compatibility with existing tools. I believe I ran into this before with Achilles and adding a line https://github.com/OHDSI/SqlRender/blob/master/inst/csv/replacementPatterns.csv#L56 in the SqlRender replacement patterns fixed it.
— Reply to this email directly or view it on GitHub https://github.com/OHDSI/CommonDataModel/issues/6#issuecomment-74135791.
Aaron: The CDM has the type "time" for time-containing fields in the specs (e.g. http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:observation). You are totally fine changing this from char() to something that can contain a proper time. Just don't touch the dates. And give us the DDL once you are done, so we can spread the wisdom.
So no time until V5 then (except for observation). I have already seen many identical rows come out of our ETL in visit_occurrence
, and possibly elsewhere, that represent the same event done repeatably within a day. Specifically, I saw multiple imaging visits in the same day for an inpatient that were indistinguishable without time information.
The problem with splitting date and time is with converting between timezones, or to UTC. There's no way to increment or decrement the date field when the time field moves across midnight. So we could store full datetime in the time field and date only in the date field, but that seems excessively redundant.
On the other hand, when you store dates without times in a datetime field, they are usually automatically assigned a time of midnight, which causes other problems for analysis (and again, for timezone shifting).
My preference would be to store everything as datetime (no timezone), store timezone names (olson, which allow for reliable conversion using the tz
database) on location records which can be linked to the clinical data, and then each analysis decides how it wants to cast the data for the intended purpose. But, I don't maintain the analysis packages ;)
Aaron: Wait: You want that for Version 4? We can't fix that one without a time machine. If you want to implement a local version with datetime, and then make sure that you fix it in all tools and applications that assume a date only, you should go ahead. But why not go to V5?
The time zones: Does that really happen within a patient? When does a patient move time zones and is in the midde of an acute treatment where time is of the essence?
As Patrick said: A diff between two dates is expected to be a number of days. And those clauses are all over the place in tons of different places. You don't want to go there. :)
We have a local version of V4 with datetime, so I'm very interested in evaluating what we would need to change. So far, the only OHDSI tool we are using is Achilles.
Not that I know of so far, although one of our sites does straddle a timezone boundary and have care sites on either side. But certainly across patients if you want to execute any sort of "instant in time" query like, I don't know, ER visits after the Super Bowl. I thought it was generally accepted that date and time should be stored together (see "epoch", e.g.)?
It seems to me like this is one place where the "type safety" of databases falls down (certainly between the "same" schema in different DBMSs), so the analysis applications should make sure to cast to what they want.
When I get a chance, I'll take a look at the Achilles code and see how much work it might take to do the casting. I'd be happy to contribute the casting modifications to the community, if you would accept.
Cloning the Achilles code base and using git grep 'end_date\|start_date\|procedure_date\|observation_date'
, I get 304 lines, all in Achilles_v4.sql
and Achilles_v5.sql
.
Most of those look like they wouldn't be affected by a datetime type because they are comparisons, order bys, or part (year, month) extractions.
Doing git grep 'DATEDIFF'
I get only 28 results, where I would have to add CAST(@a AS DATE)
to each of two arguments.
Does that sound right?
Can we make this official in the DDL? (if many sites are doing it anyway?)
Replace in DDL the varchar10 with DATETIME
For example here
CREATE TABLE measurement
(
measurement_id INTEGER NOT NULL ,
person_id INTEGER NOT NULL ,
measurement_concept_id INTEGER NOT NULL ,
measurement_date DATE NOT NULL ,
measurement_time VARCHAR(10) NULL ,
CDM v5.1 release will have both date and datetime fields and there is ongoing discussion in the workgroup meetings when to officially switch to datetime only.
This is related to issues #61 and #60
I'm ETLing data from a MSSQL data warehouse into CDM v5 on Postgres. When I cast the time portion of a timestamp into a TIME, I'm getting strings that are greater than the varchar(10) limit on several CDM v5 tables. The max string length for time appears to be 11 characters.
This caused errors when attempting to load time data.
If we're using the ANSI SQL time formats, as specified in the documentation, the max length of the time fields should be altered to account for the following specified ANSI format: "HH:MM.SS.MMMM", which assumes that we're all using 24 hour time.
Perhaps the time fields should use the "time" data type rather than strings?