amp1919 / google-refine

Automatically exported from code.google.com/p/google-refine
Other
0 stars 0 forks source link

Add support for Date only (ie w/o time) data type #533

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
When gathering user feedback on our extension for Refine the other day, a W3C 
member mentioned that when formatting a date (e.g. 07/05/2011) - information is 
created which isn't correct.

A formatted date gains the "00:00:00Z" (e.g. 2011-05-07T00:00:00Z) - stating 
that whatever this date is - happened at midnight - which is actually untrue.

Original issue reported on code.google.com by danpaulsmith on 14 Feb 2012 at 3:03

GoogleCodeExporter commented 9 years ago
Could you elaborate on where that date came from? Did it start out as a string 
"2011-05-07" in a text file, or a Date cell in an Excel file?... And what 
function is used to format the date? Or is this only what's displayed in the 
data table?

Original comment by dfhu...@gmail.com on 20 Feb 2012 at 11:05

GoogleCodeExporter commented 9 years ago
It can come from anywhere - I'm not talking about the import stage. This 
happens after I have transformed a column's cells "To Date" (Edit Cells > 
Common Transforms > To Date) on the project page.

Original comment by danpaulsmith on 21 Feb 2012 at 9:32

GoogleCodeExporter commented 9 years ago
So this is about the level of granularity being finer than it is. I'm not sure 
what the fix would be. If I'm not mistaken, in most if not all programming 
languages, unless with special date/time modeling, whenever you convert a 
string to a date/time representation, the level of granularity is lost and you 
end up with millisecond "accuracy". So ... is this feature request asking for 
special date/time modeling that would retain the level of granularity?

Original comment by dfhu...@gmail.com on 21 Feb 2012 at 5:36

GoogleCodeExporter commented 9 years ago
I think it's just a case of using a different date format when *only* a date is 
present.

If only a date ("12/10/2010") is being transformed using toDate(), it should 
use the xsd:date format ("YYYY-MM-DD") and not the xsd:dateTime format 
("YYYY-MM-DDThh:mm:ss") which incorrectly adds "00:00:00Z".

Can there not be a test in the toDate() function to check whether a date is 
*only* a date and not a date-time?

Original comment by danpaulsmith on 22 Feb 2012 at 9:35

GoogleCodeExporter commented 9 years ago
I have a different need from Dan, to have a PREFERENCE setting to keep dateTime 
formatting versus date formatting.  Hmm, maybe a checkbox on the importer 
wizard could be added to handle this ?

Force dateTime formatting to "YYY-MM-DDThh:mm:ss" TRUE/FALSE

Original comment by thadguidry on 22 Feb 2012 at 2:22

GoogleCodeExporter commented 9 years ago
I second Dan's request... this has also been an issue for me/my data within 
Refine. 

Thanks,

Eric Jarvies

Original comment by ericjarv...@gmail.com on 26 Feb 2012 at 2:34

GoogleCodeExporter commented 9 years ago
The problem is that, although the UI always refers to it as a "date" it's 
really a date/time construct.  When a date string without a time is parsed, the 
time is set to midnight (it has to be set to something).

One possible solution would be to switch to Joda-Time 
http://joda-time.sourceforge.net/ which does have the concept of a date without 
a time.  It's relatively small (500KB) and has a compatible license.

Note that this won't solve 100% of the problem since some of our import formats 
(e.g. Excel) have a similar problem in that they always assume that there's a 
time associated with a date (as David mentioned, this isn't uncommon in 
software).  One thing we could potentially do with Excel is look at the format 
string for cell and use that to determine whether the value should be a Date or 
DateTime based on whether the time is being displayed or not.

@danpaulsmith - There's no reliable way to distinguish a date only from a 
date/time with a time of midnight after the fact.

Original comment by tfmorris on 26 Feb 2012 at 4:35

GoogleCodeExporter commented 9 years ago
To carry this thinking further, if the original text is "2010" and it gets 
parsed into date/time or just date or whatever, will we want it to be displayed 
as "2010" or "2010/01/01" thereafter?

Original comment by dfhu...@gmail.com on 26 Feb 2012 at 10:37

GoogleCodeExporter commented 9 years ago
"2010". Adding "01/01" raises the same issue as adding "00:00:00Z" to a date.

Original comment by danpaulsmith on 26 Feb 2012 at 10:48

GoogleCodeExporter commented 9 years ago
Another example I noticed with some recent data (XLS import), which might 
contribute to this:

Original values (Excel file):

Jun-10
Jun-11
Jul-10
Jul-11
Aug-10
Aug-11

Imported values:

2010-06-01T00:00:00Z
2011-06-01T00:00:00Z
2010-07-01T00:00:00Z
2011-07-01T00:00:00Z
2010-08-01T00:00:00Z
2011-08-01T00:00:00Z

The imported values suggest the events occurred on the 1st of each month.

Dan

Original comment by danpaulsmith on 28 Mar 2012 at 10:19

GoogleCodeExporter commented 9 years ago
Those are the display renderings of the original values, not the original 
values themselves.  It's no different than if you imported a cell with a value 
of 3.1111 which had a format string that caused it to display as "3.1".  If you 
import that into Refine, you'll get 3.1111, not 3.10.

The actual value stored by Excel is 2010-06-01T00:00:00Z (well, actually a 
floating point value which is equivalent to that).  If you ask Excel what the 
day of the month is for the date formatted as "Jun-10" it'll tell you it's the 
first.  If you ask for the year, it'll give you 2010, not 10.

By changing the format string, you can get this single value to display as 
6/1/10, 1/6/10, 1-Jun-2010, or a myriad of other formats.  None of this changes 
the underlying value itself and all calculations will treat the value as full 
precision, even if not all the precision is displayed.

I'm clearly not explaining this adequately, but the rendered value and the 
value itself are two different things and both Excel and Refine operate on the 
values, not the renderings (although Excel has more flexibility in changing 
formats/renderings than Refine does).

Original comment by tfmorris on 28 Mar 2012 at 5:49

GoogleCodeExporter commented 9 years ago
Sorry, I understand the problem you have now. Excel is creating false 
information for dates as a result of the way it stores them. Ok, I'll mull this 
one over on our side. Cheers for the explanation!

Original comment by danpaulsmith on 30 Mar 2012 at 9:00

GoogleCodeExporter commented 9 years ago
Does the date/time output enhance to your benefit if you instead, from Excel 
export to CSV first...and then import that CSV into Refine ?  Curious.

Original comment by thadguidry on 30 Mar 2012 at 2:38

GoogleCodeExporter commented 9 years ago
Yes, if the data is saved in CSV format and then imported into Refine, the 
information is correct - unfortunately we're not the ones importing data into 
Refine.

Original comment by danpaulsmith on 2 Apr 2012 at 1:38