dmfay / massive-js

A data mapper for Node.js and PostgreSQL.
2.49k stars 158 forks source link

Support and test date/time datatypes #51

Closed mattjohnsonpint closed 8 years ago

mattjohnsonpint commented 9 years ago

Great work on this library.

I see some tests around testing a timestamptz field with a js Date object in the where clause. But there should also be some testing around the other date/time types supported by Postgres.

In particular, recognize that the JS Date object is internally based on a UTC values (ms since 1970-01-01 UTC), but most of the functions work with time in the local time zone of the computer where the code is running. That means it doesn't map well to Postgres's date type - which might better be expressed as in JS as just an ISO8601 string such as '2015-12-31'.

Also, in the test, you use the value new Date(1980,1,1) - which is a valid date, but it's February first, due to the Date constructor months running 0-11 rather than 1-12. I think you may have meant new Date(1980,0,1)

Other than the where clause, I also recommend testing the results when these types are expressed in the select clause.

"Thar be dragons" applies to date/time like nothing else, and JavaScript is a bad-apple in general in this area. I recommend some strong focus on this now, as it will ward off a flood of user complaints down the road.

robconery commented 9 years ago

Thanks Matt - I don't know if I follow what the issue is here. Postgres does indeed have a number of ways of storing dates and time (timestamptz is what we used for our columns, we could have used UTC).

People have different ways of writing code and handling dates with Javascript.

Massive just hands one thing to the other - your code to Postgres. I don't control how you store your date values, nor how you represent that in code.

What, specifically, is your issue with Massive? Yes, dates are difficult to deal with. At the end of the day we're simply handing Postgres a parameterized value to compare against your data. I don't see anything else we need to do here.

mattjohnsonpint commented 9 years ago

I'll see if I can give you a concrete example shortly. But yes - I understand that Massive is just handing things from one to another. In doing so, I assume it's responsible for mapping types from one system to the other - and that's where the trouble usually lies.

Unless the mapping is already being handled by a dependency?

robconery commented 9 years ago

Yes exactly. The PG driver handles that and Postgres coerces as it needs. For instance you can send in a date in a string, like "Feb 10, 2011" ang PG will parse it with GMT, though I would need to confirm this. I'll add a test when I can, but if you could wrap your thoughts in PR I would really appreciate it.

On Apr 25, 2015, at 00:17, Matt Johnson notifications@github.com wrote:

I'll see if I can give you a concrete example shortly. But yes - I understand that Massive is just handing things from one to another. In doing so, I assume it's responsible for mapping types from one system to the other - and that's where the trouble usually lies.

Unless the mapping is already being handled by a dependency?

— Reply to this email directly or view it on GitHub.

rasantiago commented 9 years ago

I have a good example of this problem. Struggling mightily with it right now.

I have a date field called start_date which is of data type date. When someone creates a new record through our app the date ends up getting set properly. But, when the date is retrieved massive ends up tacking on the default timezone (in our case UTC) so technically it comes back as Midnight UTC of the date. When someone in Pacific timezone pulls up the date it is set to 8 hours prior (the difference between UTC and Pacific) which now makes it look like the date has been moved to the prior day.

In short, Massive is translating data type date into datetime and pushes the date around based on default timezone from the server which can be very different from timezones of the end user.

Not sure what the correct solution to this problem is.

xivSolutions commented 9 years ago

I believe his is a JS problem, and not a Massive-specific issue. I think you need to EITHER perist the data/time with a timezone/offset, OR make sure to save all dates/times as UTC, and then allow JS to apply the local offset.

We had a similar issue at work recently, where we were pushing jsut a date into the backing DB ( in this case, SQL Server) wof course was represented as '2015-11-19 00:00.000'. When the data was retreived via AJAX, and then coerced using new Date(data.date) JS was assuming UTC because there was no TZ/Offset. Therefore, subtracting 6 hours from UTC< and giving us the day prior to the actual date.

See http://stackoverflow.com/questions/7556591/javascript-date-object-always-one-day-off

Not sure if there is a one-size-fits-all solution here that should be implemented within Massive itself - I think no matter which way we were to go, certain use cases would run into some version of this issue.

I could be missing something here, but will dig deeper myself over the weekend....

rasantiago commented 9 years ago

Thanks for the response! So I think the core issue here may be that there is not a pure Date construct to map the Date data type from postgres into. I wonder if we could have an option to choose to map to JS Date (default), JS Date with explicit locale declaration or simply to text. In my case the last one (text) would resolve the issues instantly but that is just the case I am dealing with and do not know how broadly that would apply.

robconery commented 9 years ago

I think there is some confusion about dates here. First, @xivSolutions is correct. Massive doesn't translate anything - it just hands you your data.

When you get a Date from Postgres, a new Javascript Date is created by the driver and in Javascript a Date represents a moment in time. So if you don't specify which moment that is (by using a simple Date type in Postgres) then you will end up with Midnight UTC.

My suggestion for you would be to do whatever translation you need with Postgres, not with Massive or Javascript. Postgres is pretty good with dates and if you need a certain representation, you can do that with a simple query or date function.

swissspidy commented 8 years ago

Just as a follow-up question:

I have a database with a date column using the timestamp data type (i.e. no time zone information stored) and a row with the value 2016-03-20 14:10:01.000+00.

Massive returns that value as Sun Mar 20 2016 14:10:01 GMT+0100 (CET), which results in Sun Mar 20 2016 13:10:01 UTC, i.e. one hour earlier. Isn't that wrong?

Looks like that's a bug (or feature, rather) in the pg module, which always uses the system time zone for dates.

However, you can change this behaviour by setting the parseInputDatesAsUTC option to true. I just don't see a way how to modify pg.defaults via Massive.

See https://github.com/brianc/node-postgres/wiki/pg#pgdefaults for the defaults documentation.

robconery commented 8 years ago

Interesting issue - I would suggest there's nothing wrong at all with the way Massive/Postgres is behaving. Your have a naked date in there with no timezone - the system has no idea what this date represents so it defaults to UTC rather than the server timezone. Yes, you can override this interpretation, but that solves the problem with Massive and Node, not with a reporting tool or some other data access tool you decide to use because Massive is buggy :).

The problem isn't the driver and it's not Massive - you have incorrect data. Honestly mate if the timezone is important to you (and I think you're finding out that it is) - it should be recorded properly (or adjust your input to UTC).

rasantiago commented 8 years ago

So I really love Massive and am very much a fan of the minimalist approach to most things DB. But, I have to respectfully disagree with @robconery. The correct answer, IMHO and with the benefit of 25 years of software engineering, is that Postgres has it correct and the PG module has it wrong (because in some ways JS has it wrong) and this error is simply passed along through Massive.

Here is my reasoning. Postgres has a DATE data type which is different from TIMESTAMP data type and for that matter TIME data type. Each of these data types within Postgres is handled consistently and elegantly. DATE simply holds a date with no care for time or timezone. TIME by default holds a time with no concern for date or timezone. TIMESTAMP data type by default only cares about date and time with no concern for timezone. DATE, TIME and TIMESTAMP can optionally support timezone but it is not required.

The problem happens when we try to map all these Postgres data types into JS objects. All of these data types from Postgres get mapped into the Date Object in JS. The Date object in JS is improperly named since it requires date, time and timezone. Thus, while Postgres will allow us to handle DATE and TIME cleanly without care for timezone, the JS Date object forces us to tack on time and timezone which then causes a number of problems when we put the data back into Postgres. All this makes engineering (and architecting) an application more complicated than needed in my opinion.

The real answer, again IMHO, is for JS to support date and time objects without requirement for timezone. Short of that there are a lot of ways to engineer around the issue. I make use of moment.js and moment-timezone.js to clean up a lot of the issues along with a few custom filters in Angular. This discussion thread may finally give me the last push I need to engineer lightweight Date and Time objects and create an augmentation for PG and/or Massive to cleanly answer these issues.

Lastly, and again with all due respect to you, I must disagree with you again @robconery. @swissspidy is completely correct and in fact is trying to consistently handle the timezone issue in a way that does not corrupt his data. Pushing everything into UTC is one approach to dealing with this issue and it would be nice to set this through Massive.

xivSolutions commented 8 years ago

@rasantiago - IMHO you are correct in stating that "is that Postgres has it correct and the PG module has it wrong (because in some ways JS has it wrong) and this error is simply passed along through Massive."

Beyond this, I disagree. Postgres has it right, in the context of it's own type system. JS most definitely has a funky way of dealing with dates/times/timezones. And, at the end of the day there is a gross impedance mis-match there. The PG driver doesn't "have it wrong" - there was clearly a design decision made which recognizes that there is no one good way to reconcile all this between Postgres and JS (well, that one good way is a whole 'nonther lib, but I digress). So the driver gets out of the way, and hands the data to the client to deal with it.

This is also the approach Massive takes.

"Pushing everything into UTC is one approach to dealing with this issue and it would be nice to set this through Massive."

Agreed, to the extent that we will likely be adding a pgDefaults option to Massive, so that things like this (and connection pooling!) can be set at load.

Note, however, the JS date/time BS and attendant issues are not unique to the PG driver, or Massive. It is my experience that this type of thing rears its ugly head between JS and just about any other back-end which has more advanced/complex date time handling. I've had to deal with it in .NET/SQL Server, Elixir + Postgres, Node + Postgres, etc. Managing Dates/Times/Timezones is already hard, and JS makes it even harder.

Look for an optional pgDefaults object at load coming soon (I think...).

Thanks for the feedback, nice discussion all. :-)

rasantiago commented 8 years ago

@xivSolutions All good points and I agree. My caveat about the PG driver getting it wrong because JS gets it wrong was my ineloquent way of saying what you said much more clearly (i.e. its an impedance mismatch between JS and Postgres)

Totally appreciate the discussion. Many thanks!!

robconery commented 8 years ago

Such civility - how nice :). Disagreements are lovely and always welcome! I just wrote a really long response to this with some nice citations to the docs and I then went back to grab some code from @swissspidy's example when I realized I completely misunderstood the question he was asking.

Duh.

This date (from his question): 2016-03-20 14:10:01.000+00 is a perfectly valid GMT date and his assumption that the return Sun Mar 20 2016 14:10:01 GMT+0100 (CET) was incorrect is understandable. It is incorrect.

When I read the question at first I thought it was "I'm storing a date without a timezone and I'm getting back a date with a timezone what's happening" - completely my fault. Sorry.

Yes - this is the pg module overriding the timestamp, which is bad. Sorry for being thick-headed... I agree we should provide the override in #240.

robconery commented 8 years ago

BTW @rasantiago it's worth noting that....

DATE simply holds a date with no care for time or timezone. TIME by default holds a time with no concern for date or timezone. TIMESTAMP data type by default only cares about date and time with no concern for timezone. DATE, TIME and TIMESTAMP can optionally support timezone but it is not required.

Dates and times are meaningless without some notion of a timezone. Postgres defaults all of these to the local server's timezone (or the TimeZone conf setting). A timezone is required for the storage of the data, but yes you can get away with just using now() and Postgres will figure one out for you.

rasantiago commented 8 years ago

@robconery thank you so much for reopening this and I wish all online technical discussions were this civil.

So there are many applications and use cases where timezone is not necessary. In those cases time and date are meaningful without timezone and it would be nice to not have to deal with timezones for those use cases. Back in the day (dang i'm getting old) I loved Oracle for this reason. I never had to deal with timezone unless I explicitly wanted to.

With respect to Postgres, I think it is important to look at the bits that get written to memory and disk. In the case of the DATE data type, it only uses 4 bytes and has a resolution of 1 day. So there are not enough bits there to store time or timezone. Date is simply just a date and does not have an implied timezone to its definition. But, you are correct that there are a number of situations where Postgres will infer a timezone (e.g. Date/Time math).

robconery commented 8 years ago

If you're doing any calculations/rollups by date, then timezone is never meaningless. You might think it's meaningless now, but in a few years time I would offer that you'll kick yourself.

I actually spent some time yesterday thinking about when I would just want a date without a timezone. Even if I just wanted YEAR, MONTH or DAY - I'd still calculate that from a timestamp with a timezone.

I'll go so far as to say recording a date without a timezone in all cases is invalid. As I write this it's tomorrow in Australia - so on which date did I reply to you? Ask @shiftkey and he'll say "Tuesday the 26th" whereas it's Monday morning the 25th to me.

When would you suggest a date is OK without a timezone?

swissspidy commented 8 years ago

That's a valid point. I'm dealing with various timezone issues daily and know how helpful timezone information can be.

My plan was to store the date in UTC and then display the value in the end user's time zone, i.e. doing the conversion as late as possible.

After my last comment I switched to using the timestamp with time zone data type in Postgres to always store the date with the timezone (even if it's just UTC). Now, the returned date no longer differs from the expected value.

It now seems that I don't need #240 for my use case at the moment, but you never know.

rasantiago commented 8 years ago

@robconery Timezones are generally unnecessary when the scope of concern is inherently local.

There are countless examples. Years ago I helped build a system for a city to track servicing of municipal controlled public areas (e.g. the median strips that need mowing, the traffic circles that have bushes that need trimming, etc.). There is nothing about that system or about the reporting from that system that is not local (i.e. Australians are not a concern). They just want to make sure every area gets serviced every N number of days and to know when any area hasn't.

By way of another example, right now I am working with organizations that offer kids day camps. The dates and hours of the camp are inherently a local concern (i.e. no one is flying their kids in from Australia to attend these camps). All the users of that system are local and always will be. Thus the date and time of camps are not recorded with timezone.

Interestingly, when parents register online we do record the full time/date and tz because we have to coordinate with a credit card processor who is not local. In that latter case, the scope of concern is not local because of the credit card processor.

But, I understand your concern and I do think its worth thinking through when date, time, datetime and timezones are needed.

robconery commented 8 years ago

@rasantiago OK let's go with that :) - the concern is local. Two years from now the union that does the municipal work files a class action suit against your local utility. They hire lawyers from New York to represent them, and these lawyers subpoena the county for all of the scheduling files so they reconcile against payment.

They send the data they receive to their analysis house in Bangalore. It's raw data because no lawyer would ever trust simple reports, and they've directed their analysis team to crunch the data for discrepancies.

You and I know the data is local, but do we know when it was recorded. Your phone rings:

So Roberto, remember that app you wrote for us some years back? Was the server always at the same facility? Did it ever get moved? Can we prove the scheduling dates are accurate?.

Your answer: Nope.

Here's the problem with your supposition: you're basing it on an assumption and it's likely that you'll remain correct in your assumption, until you're not correct and then you're in trouble.

I've been burned by this 3 times in my life - the first was when I did this exact thing with lab samples (back when I was a geologist that was also a DBA). The sample results came to us in a CSV and I recorded the data in MS Access. I did this for a number of years. And then our clients got sued for contaminating the water under a neighboring property.

The first thing we needed to do was to establish When Where and How with respect to all of the readings we took. We had to gather every bit of information including rainfall records, tidal movements etc to show a reasonable gradient map (and subsequent flow) of contaminant that did not intersect the neighboring property (they also managed to contaminate the ground on on their own, but were trying to offset costs by suing our clients).

Our data was attacked mercilessly and they tried to show our recordings were inaccurate because the lab we used was local but they sent the samples cross country for verification - which meant the dates could possibly be confused in our readings and were therefore inadmissable because the data would be suspect. Seriously - this happened. We got out of this because we were able to confirm the labs readings with our samples - so it was OK.

In my sale to Pluralsight I had to reconcile some discrepancies in the data - specifically in December of 2011 when I released a new video and also dropped the price of annual subs. I was using MySQL and didn't think about using :date in my Rails migration - which didn't timestamp the data. The sale ran for 10 days, right over the New Year.

When Pluralsight ran the numbers, they were off from mine but $2200 or so (I sold a lot that month). They're in Mountain time, I was in HST (Hawaiian Standard), my servers were in New York. It took me quite a few days to sort that one and a call to my CPA so I could adjust my tax return that year (thankfully I got him just in time).

That's not a local concern I suppose - but when I setup Tekpub I didn't think much about using a simple date column; I thought it was obvious.

All the users of that system are local and always will be. Thus the date and time of camps are not recorded with timezone.

In this case I think you're reasoning is sound - but I'd like you to consider something as a fellow coder: you've built your solution based on an assumption. I maintain the date is only valid if your assumption holds - that no lawsuits will ever need it, no event in the life of this day camp will ever require that information for anything other than people in the local area.

You could be right, or you could be very wrong. You have a lot of experience - what do you tell your developers who don't test their code or take some kind of shortcut? Why would you ever choose to leave a hole in your app?.

To me, that's just plain lazy :).

rasantiago commented 8 years ago

@robconery All really good points and very realistic scenarios. I've had to work from the point of view of forensics and the point of view of prepping records for a client who was being sued.

When I engineer systems I always keep some standard fields like created_at and updated_at. Those never get seen by most users and are populated through triggers. They are there for auditing purposes. Those fields, of course, have full timestamps with tz info. Also, there are some change log tables I use for particularly sensitive data which also record tz. Again, this is not seen by the user but is there for a number of concerns. In addition, whenever I am contracted to architect or engineer a system I take special interest in asking about legal liabilities and of course make sure we have adequate tz coverage on any particularly sensitive data.

Still, when capturing fields like a birthdate or day of a kids camp or similar, I still stick with simple date. Interestingly, I did experience a legal situation where the choice of the simple DATE and simple TIME data types provided a stronger protection for a client. Specifically, the date and time as captured from the user could not be corrupted by shifts from differing time zones. Knowing the date and time as it was entered by the user became the critical question in a dispute a client of mine was having.

Anyway, we all do our best to engineer for the future but only get paid for answering the problems of the present, often to our chagrin.

mattjohnsonpint commented 8 years ago

Nice discussion here. The key to understanding this is that not all contexts are equal. The advice "always use UTC" or the equivalent we're discussing here of always having a time zone, is very context specific. It holds up for timestamping which is one of the most common things we do, but it doesn't apply to necessarily everything you might do with dates and times in computing.

For date-only values, the example of a birthdate is a good one. Mine is 1976-08-27. It's not 1976-08-27T00:00:00, nor is it 1976-08-27T00:00:00Z or 1976-08-27T00:00:00-07:00. The same thing applies to any kind of anniversary dates, such as work hire date. If you want to think about time/timezone context on a date like this, consider the arguments I made in this blog post.

For times without dates, consider business opening and closing hours. A business may be open from 8:00 AM to 6:00 PM, which is usually based on the local time zone of the business location. However, the offset from UTC may be different depending on whether or not daylight saving time is in effect, so storing a time+offset or a UTC time would be problematic.

For combined dates and times, consider that there's still value in timezoneless data, often referred to as a "floating time". There may be a time zone, but it comes from some other context which may vary. For example, NBC may have a schedule where tonight's evening news is aired across the US at 2016-04-27T18:00, but the exact instant depends on each station's local time zone. Another example of this is the alarm clock used by most mobile phones. Mine is scheduled to fire next at 2016-04-27T06:30. Even if I get on a plane and travel to another time zone, it will still go off at that time in whatever local time zone I'm in. There are many other cases, I'm just giving a few examples.

As it applies to this discussion, I have to agree with @rasantiago. With all due respect @robconery, the decision of what context should be in a date/time value is something the app developer should decide, on a case-by-case basis. The dev stack should enable all possibilities.

For a recent example of fail in this department, consider OData, which decided to drop support for DateTime in v4 due to similar arguments (no time zone). This has created quite a lot of pain for developers, as evidenced by OData/WebAPI#136 and http://stackoverflow.com/questions/25189557/how-to-get-web-api-odata-v4-to-use-datetime

robconery commented 8 years ago

Your birthday is an interesting example. Yes, your birthday is a timestamp (look at your birth record) but I will agree that recording someone's birthday is more about their age, not their time of birth - this doesn't refute my point.

Daylight savings is part of the time zone equation (PST vs PDT, for instance). I had to live with this for far too many years in Hawaii and yeah, business hours (which are intervals) have everything to do with time zones - I don't get your point on this.

There's nothing "timezoneless" about NBC's schedule and there's nothing varying about it - a broadcast happens at a point in time... again I don't get your point.

Even if I get on a plane and travel to another time zone, it will still go off at that time in whatever local time zone I'm in

Right. That's why you record dates with timezones.

the decision of what context should be in a date/time value is something the app developer should decide, on a case-by-case basis. The dev stack should enable all possibilities.

Nope. Never. Developers are horrible when it comes to understanding (and recording) data properly and the last damn thing I want is a developer pondering how to properly record a date and deciding that my business doesn't need to know the timezone. No thanks.

You go ahead and write your apps for your business the way you want - I've already learned the hard way.

robconery commented 8 years ago

This memory just hit me as I was on my way to bed. I was in Hawaii a few years back and my friend @samsaffron had a birthday - I knew it because Skype told me. I sent him a note saying "Happy Birthday" and he said "thanks mate - that was yesterday".

I woke up in Poland in September and freaked out - my iPad told me it was my anniversary and I had completely forgotten to call my wife! But then... it was the day before back in Seattle...

Great examples to dive into. If I was on the product team at Skype I think I'd very much want to know what timezone a person was in so I could relate it to birthday/anniversary notifications.

xivSolutions commented 8 years ago

@mj1856 - You make some interesting points, and they are (mostly) very applicable at an application/domain level. In my opinion, the original issue here, and most of your points, fall outside the scope of the database driver level, or the data access library level.

A. As Rob states, your "birthday" in data terms is indeed a moment in time, correctly represented as a datetime (Jimmy was born on 2/12/2016 at 2:08 AM US Central Time in Chicago for example). When most of us refer to our birthday as a simple date without a time component, this is a date time formatting issue, best left up to the application and/or developer. If someone really wants to store dates without time or timezone components, then datetime is the wrong data type in JS.

B. All of your points, and also Rob's, would be well-considered at the application level. With respect to Postgres' type system and the different flavors of date/time representations available, there is, ultimately, an impedance mis-match here. JS has very limited date/time support internally, and some annoying quirks with respect to how it handles interpretation of dates and times incoming from the backing store. It's not the driver's job, or Massive's job, to make assumptions about how the user want's to handle these quirks.

C. Apparently, Postgres makes available a type which represents a date, with a resolution of one day. I get that this is a convenient way to represent things like "my birthday' or "my anniversary", and would be, in fact, what we are often after when we retrieve a date from the database (time zone issues aside, per Rob's comment above).

However:

  1. JS does not have a corresponding type. So, what is the correct way to handle this? Do we assume that such dates should be returned as a JS Date? If so, what time zone, what offset? Just UTC? Do we force it to be 12:00 AM on 12/31/2016, or 11:59:59 PM on 12/31/2016? Or, maybe just a string? "12/31/2016" Seems like no matter which way we go, we would be introducing a behavior which would work for some cases out of the box, and be a source of endless annoyance for the rest. SO we just get out of the way.
  2. Given point 1, I submit that in the target language (JS), the notion of a "date" is represented in one way. Quirks or not, if you want something to be a "date" in JS, then you get to work with the idiosyncracies that brings. In JS, there is no notion of 12/31/2016 without a time component, If this is what you want, then you get to either format the date and return a string, or persist your date as a string, and live with the consequences. Or, deal with the time and timezone issues the target lang brings. Period.
  3. The minute you do new Date(myDate) in JS, JS itself will do it's annoying thing. If there is no timezone, it will "help" you figure that out by assuming UTC, and applying the local offset. Without introducing a custom type, or some non-standard JS behavior, this is what you get to deal with, working with JS.

As to the original issue at hand here - Massive undertakes to make it easier to query your data, providing some of the common query-type functions out of the box, and allowing you to add specific SQL queries as functions when needed.

Massive does not try to provide custom JS type implementations, nor any date/time library functions. It returns your data in the appropriate (approximate?) JS format. It is the job of the developer and the client application to deal with how dates/times are presented and manipulated.

That Postgres (the coolest relational Db in the world, in my opinion) offers up some types for which there is no direct counterpart in the target language, is simply one of those things to which we, as programmers, get to apply our skillz and problem-solve.

At the application level. Not at the Db driver lever, and not at the query tool level. If your Db thinks something is a "date" then that is what we're going to return.

As stated in another issue, though, we do plan to add a config object which will allow the client to specify parsing JS dates to UTC. But that will be an option, not standard behavior.

robconery commented 8 years ago

I hate to do this - but after reading this thread I'm sensing that it's a bit more about personal PR/branding than it is about Massive. The thread has gotten way off topic and I should have locked this off a while back as the OP is way too vague and theoretical.