cfsimplicity / spreadsheet-cfml

Standalone library for working with spreadsheets and CSV in CFML
MIT License
126 stars 35 forks source link

Importing dates and timezone problems #283

Closed Daemach closed 2 years ago

Daemach commented 2 years ago

I was working through some import problems and noticed that dates on the first of the month were being imported as the last day of the previous month. My lucee app server is set to EST, the AWS server on which it is running is UTC, so off by -5 hours. When I switched the lucee timezone to UTC, the dates were imported properly. This threw the data off by quite a bit - not sure where the problem lies, but I suspect it's java date/time translation. Lucee stores date objects as datetime, which isn't good.

cfsimplicity commented 2 years ago

Are you definitely running the latest version?

There was a similar issue recently caused by the Lucee timezone being different from the OS and a fix in version 3.1.0 to address it.

If you are on the latest then can you provide more detail on how you're doing the import?

This simple test seems to work OK even with a deliberate TZ change/mismatch.

//stash TZ
stashedTZ = GetTimeZone() //My system is GMT/UTC
SetTimeZone( "US/Eastern" )
result.tempTZ = GetTimeZone()

data = QueryNew( "Date", "VarChar", [ [  "2022-01-01" ] ] )
datatypes = { date: [ "Date" ] }

spreadsheet = New Spreadsheet()

wb = spreadsheet.newChainable( "xlsx" ).addRows( data:data, datatypes:datatypes )
result.value = wb.getCellValue( 1, 1 )
result.excelInternalValue = wb.formatCell( { dataformat: "0.0" }, 1, 1 ).getCellValue( 1, 1 )

dump( result )
/* 
tempTZ: US/Eastern
value: 2022-01-01
excelInternalValue: 44562.0
*/
//reset
SetTimeZone( stashedTZ )
Daemach commented 2 years ago

Yes, I'm on 3.3.0 everywhere.

In my case, the system is GMT, the lucee instance is EST (-5) and I'm importing data from an xlsx file. These dates, when inserted into the database, end up as 10/31 instead of 11/01: https://www.screencast.com/t/6HAoklvq I've seen Lucee insert dates with timezone info (-5 in this case) so I tried inserting everything as ex: '2019-11-01 12:00', which, if the date had imported correctly should have inserted 12PM - 5, or 7AM, but still on 11/01. It did not work. Only changing Lucee to UTC temporarily did. This tells me that the conversion happened on import rather than export.

In the end, this is a Lucee issue - they should not be using datetime objects for dates. Something similar happened here: https://luceeserver.atlassian.net/browse/LDEV-3751.

cfsimplicity commented 2 years ago

Sorry I should have realised "import" meant you were using read().

I can reproduce the issue with format="query" which as you say points to the way Lucee handles date objects. Just reading the file into a workbook object and using getCellValue() seems to give the correct date regardless of TZ setting.

Could you try adding queryColumnTypes="auto" to your read() call? i.e.

importedData = spreadsheet.read( src=yourfilepath, format="query", queryColumnTypes="auto" );

Warning: it might be slower, depending on how big the file is.

Daemach commented 2 years ago

I could, but I don't use it like that. I do query.each( (row)=>{} ) then row['columname'] Does that work the same way? I used a workaround to get past this problem last night and it would take a lot of work to undo it in order to test.

cfsimplicity commented 2 years ago

Actually forget queryColumnTypes, it was a red herring, sorry. Thought I saw a result but then couldn't replicate it.

I think I've located the problem which is that the read() query handling extracts date values from cells in a different way to getCellValue(). They should do it the same way really but I'm a bit apprehensive in case it proves a breaking change. The test suite does break but that might just be the way I've set up the tests.

I'll need to give it some thought and get back to you.

Daemach commented 2 years ago

Sounds great. Thank you.

cfsimplicity commented 2 years ago

When I've looked at it, would you be help with testing/checking I'm not doing something silly?

Daemach commented 2 years ago

I'll find a way. It would be good to get this fixed.

cfsimplicity commented 2 years ago

Rather than risk breaking things, I think I've found a simple and safe way of addressing this by applying the same fix as the previous issue I mentioned.

So when evaluating a date value, the library will now check for a Lucee/POI timezone mismatch (POI uses the system TZ), and temporarily making POI match Lucee - not the other way round as you've been doing.

Can you give the develop branch a try and report back?

cfsimplicity commented 2 years ago

@Daemach Did this fix work for you?

Daemach commented 2 years ago

I haven't had a chance to test it, and I don't think I'm going to be able to get to it soon :/ Deep in the middle of sorting out some bad accounting system data. I would say roll it, and if it comes up again we can deal with it then.

cfsimplicity commented 2 years ago

That's fine, I was just curious. I'll leave it on develop and won't cut another release specifically for it unless I hear from you.