Closed clairblacketer closed 4 days ago
In CDM v6.0 it has been proposed to remove the date fields and replace them with datetime fields.
Would v6 be released in the beginning of 2018? There are many downstream systems that reference the date field currently. We would want to include @fdefalco and others about it's impact on Atlas and ACHILLES.
That is the goal, I put the proposal here because there has been a lot of discussion around it in the workgroup though it is entirely possible that it will change. I just don't want to lose track of any proposals so this is certainly not set in stone.
That makes sense.
On another note, it was just pointed out to me that there are not datetime fields in the era tables. This was an oversight in the proposal even though it was discussed that all date fields would become datetime. I will add an issue to make that change.
Actually, we are not sure if that was an oversight or on purpose. Given the Oracle conflict, I wonder if we should hold off on those fields. I don’t think those fields would be use very much in the era tables. But I agree they need to get in there for the switchover to all datetime without date.
From: cukarthik [mailto:notifications@github.com] Sent: Wednesday, June 21, 2017 9:03 AM To: OHDSI/CommonDataModel Cc: Subscribed Subject: Re: [OHDSI/CommonDataModel] Replace Date fields with DateTime fields (#61)
That makes sense.
On another note, it was just pointed out to me that there are not datetime fields in the era tables. This was an oversight in the proposal even though it was discussed that all date fields would become datetime. I will add an issue to make that change.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/OHDSI/CommonDataModel/issues/61#issuecomment-310071468, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AGRVXtTb1GhJYKjmNWYz9eW7nmugKoxEks5sGRSIgaJpZM4OA406.
I agree with @hripcsa - Decision 2 above was the basis for the changes made to v5.1.0. It does not reference the ERAs so datetimes were not added to those tables. @cukarthik you can still log the issue as a proposal if you think it could be worthwhile and we can bring it up in discussion at the next workgroup meeting
It also did not include the OBSERVATION_PERIOD table, which was the source of the Oracle error. So we should remove _DATETIME fields from OBSERVATION_PERIOD from the DDL.
On Wed, Jun 21, 2017 at 9:16 AM, clairblacketer notifications@github.com wrote:
I agree with @hripcsa https://github.com/hripcsa - Decision 2 above was the basis for the changes made to v5.1.0. It does not reference the ERAs so datetimes were not added to those tables. @cukarthik https://github.com/cukarthik you can still log the issue as a proposal if you think it could be worthwhile and we can bring it up in discussion at the next workgroup meeting
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/OHDSI/CommonDataModel/issues/61#issuecomment-310074941, or mute the thread https://github.com/notifications/unsubscribe-auth/AAsrGns0VIVY0X2GejwyR9s-Upg4S9Y1ks5sGReugaJpZM4OA406 .
PR #64 created to address this, will be pulled in to v5.1.1
Version 6.0: We have no plan, yet. It's a huge deal. The whole community is going to start squeaking if we start using the S-word. :)
what is the status as of Dec 2016?
Sorry for being very late to the party. I'm trying to understand the motivation for why we've now effectively dropped DATEs everywhere in v6 in favor of DATETIME.
The reason I bring this up is because I have to implement this in our R packages. For various reasons I can't just use DATETIME instead of DATE everywhere, because a lot of logic revolves around dates being the day level. What I'll need to do is cast all the DATETIME fields back to DATE.
So we have 99.99% of our data at the DATE level, and once I'm done our analytics packages will ignore any finer granularity, but we just added a lot of compute (and disk space) overhead, as well as invest a significant amount of developer resource.
I'm just trying to understand the reasoning why. Anyone?
The reasoning is the following: There really is a datetime continuum, so operating on the date level or datetime level is the same thing. In the backend, it's integers.
Reason people want it is some use cases have a sub-day granularity. Everything in the ER is within a day, and then you want to know if you got the cardiac arrest before that injection of a drug, or is the arrest the result of the injection. On the date level you cannot do that.
Bottom line: No harm and a lot of use. The fact that the transition is not free had been considered through a looooong transition time where we have both date and datetime. Of course, that time had to be used. :):)
As a compromise we can easily extend the date fields for a while longer, and the casting happens at the ETL level. But eventually you need to be able to support the sub-day use cases anyway.
I agree that developer concern is important factor to consider. In favor of minutes/hours is the fact that some procedures/medication (events) in medicine are on hourly scale.
We can have two modes of analysis - my study is on day level vs my study needs sub day granularity. And based on mode, the cheaper computation is invoked
Being late to the party also, coming back on what you said Martjin "I'm trying to understand the motivation for why we've now effectively dropped DATEs everywhere in v6 in favor of DATETIME."
For my part I truly see a value in bringing DateTime within the CDM giving that all costs study associated with procedures events are analysed through direct costs which are attributed with specific value per event and then implemented with additional modifier (Which can be on an hourly basis). In that. case date is sufficient. But pushing a bit the potential analysis forward, we can think that we would at one point bring up indirect cost of a treatment sequence (OR Time for example) which require DATETIME.
Do not known if it makes sense or if I am totally out of subject.
Thank! Not yet fully convinced though. Disagreeing with @cgreich on
so operating on the date level or datetime level is the same thing
It is not. For example, if a patient starts a drug on day 1, with length of exposure = 7 days, my convention is currently to consider them exposed on days 1 - 7, so 7 days in total and including day 1 completely. In contrast, at a DATETIME level if the exposure is coded to start at noon, then adding 7*24 hours brings me to noon of day 8. If an outcome occurs in the morning of day 8, is it during exposure or not?
One other disadvantage of dropping DATEs is paradoxically a loss of information. We used to know that most of our data was at DATE level, now it seems everything is at DATETIME level.
An alternative approach could be:
That way, date-level analyses use the DATE fields, and hour/minute/second-level analyses use the DATETIME fields. The latter won't be confused by all the date-level data that pretend to be DATETIME.
@schuemie:
Not a lot of disagreement there. In fact, the two issues were discussed at length. Both the day one issue as well as the hardship with conversion. The first one should be fixed by using midnight of the day, and the second through an extensive overlap period to give the community time to adjust. Which is what we are talking here.
I know it sounds like a glib "too late, buddy", but this really had been noodled a lot with a whole bunch of people, @schuemie. So, let's see how we can figure this out.
An alternative approach could be
When this alternative was discussed the counter arguments were:
1) You want to avoid duplication of information in RDBS, because you will inevitably run into contradictions. 2) It takes up silly space and makes the model kludgy.
Would you consider updating, maybe over time, your logic to also work with less then daily resolution? Right now it wouldn't work.
@cgreich , I didn't hear you address my loss-of-information concern though. If we're going to start to use the DATETIME information, for example to determine sequence of events within a single day, we'll have to somehow filter out the date-level information that was coded to occur at midnight, because those things didn't really happen at midnight. Talk about kludgy!
Also, we wouldn't be duplicating a lot of information, just the information that originally is at the DATETIME level. And yes, in those cases they could start to contradict each other, but that is true for a lot of our data elements (e.g any events happen after death?). We just need good internal consistency checks.
I am sorry for reopening this discussion. But I'm also just now starting to realize it might not be a good idea. Surely we're allowed to question past decisions?
here is the another use case for the date time field. https://github.com/OHDSI/Atlas/issues/1980#issuecomment-646363396
I actually do not see an issue of a bit of a duplication and keep both date and date time fields. It is better to do it during ETL than take a hit during analytics. It is a simple logic and this is what data marts are built for. The only thing we need to think about is multiple similar events occurring on the same day (date)
Surely we're allowed to question past decisions?
Absolutely. Let's bring it on. I personally have no horse in this race at all.
sequence of events within a single day, we'll have to somehow filter out the date-level information that was coded to occur at midnight
That's true. I think what folks assumed was that the data are either datetime throughout, or date (with kludgy witching hour), and figuring out things in a mixed situation wouldn't be something to worry about. But you are right, there is no guarantee.
We'll put it on and invite you.
multiple similar events occurring on the same day (date)
Why? They'd have the same date, and different datetimes. All good. No?
We'll put it on and invite you.
Could we have this discussion? This is the most impactful change in v6 for HADES, and I have my concerns.
@schuemie the remit I have proposed for the CDM WG this year is specifically related to this issue. I would like to revisit CDM v6 and make necessary changes to address your concern(s) and increase adoption.
In 2023 this is still an issue to the point where CDM v5.4 was introduced in 2020 to continue to move forward with the model while avoiding the required DATETIME fields.
As a follow-up to this, we need to clarify the documentation to say "if you do not have time, leave the datetime blank"
We are closing this - we tried to replace dates with datetimes in v6.0 and it did not work well with our methods. We will keep BOTH dates and datetimes for the foreseeable future.
While v5 visit_occurrence table supports both date and time of event, other occurrence tables and exposure tables such as drug, condition, and procedure support only the date level. It is desirable to have the option to include specific time data for such occurrences. Our goal is to allow temporal operations finer than day without disrupting OHDSI by requiring major recoding.
Here is the current use of date and time fields in CDM v5.01:
PERSON
SPECIMEN
DEATH
VISIT_OCCURRENCE
PROCEDURE_OCCURRENCE
DRUG_EXPOSURE
DEVICE_EXPOSURE
CONDITION_OCCURRENCE
MEASUREMENT
NOTE
OBSERVATION
And the ERA tables, COHORT tables, and PAYER_PLAN_PERIOD table.
DECISION 1
We have a choice between adding fields of type time, which require significant processing to determine durations (join the date and time, and then operate on that) versus defining a datetime field as timestamp, which allows fast operations but produces redundancy.
[Suggest datetime = timestamp.]
DECISION 2
We propose adding a datetime field or fields to the following tables
We propose replacing the current time fields with datetime fields in the following tables
A datetime field (also known as timestamp field) would be added to each date field. Date fields would not be changed.
This would support handling data from ICU, Emergency Department, infusions, post-procedure care, etc. where multiple events occur on the same day and sequence matters. This granularity would also support the incorporation of data generated from tracking devices.
In addition, current _TIME fields would be removed. At this point, all current software will continue working with the _DATE fields, and over time we will develop extensions to the software to accommodate DATETIME in different database management systems.
[Suggest moving forward with fields called _DATETIME, removing the _TIME fields. The *_DATE fields will remain required.]
DECISION 3
Are the datetime fields required. Making them required allows developers to begin to use them with a potential migration from date+time or date+timestamp to timesteamp in the future, but it forces CDM builders to enter unknown times or timestamps. If required, the default time will be the first instant in the allowable period. E.g., 1990-12-01-00:00:00.000000 is the correct entry for December, 1990.
[Suggest optional for now.]
DECISION 4
Should a date time (timestamp) field be added to BIRTH (or alternatively should PERSON.time_of_birth be changed to timestamp). This will allow more rapid calculation of age but will be redundant with the current information.
[Suggest add BIRTH_DATETIME. Remove TIME_OF_BIRTH.]
DECISION 5
Should we add a *_TIME_GRAN granularity field for each new timestamp field, which would indicate year, month, day, hour, minute, or second as the timestamp's granularity. When a timestamp is filled in, it should be set to the first valid time in that indicated interval.
[Suggest defer decision.]