dhoerl / DHlibxls

Framework to read Excel xls spreadsheets
271 stars 86 forks source link

Not able to read date in format dd/mm/yy #15

Closed msaad99 closed 9 years ago

msaad99 commented 10 years ago

I have an excel file which contains dates in this format dd/mm/yy in its cells. Im trying to read and store them as strings but when I read the cell and output its contents it shows a weird number. For example: 18/02/14 is producing an output of 41682.0000 and in the very next line the same date 18/02/14 is being shown as 41688.0000

Everything else is being read properly. Any idea what's causing this?

dhoerl commented 10 years ago

On 7/9/14, 1:04 PM, msaad99 wrote:

I have an excel file which contains dates in this format dd/mm/yy in its cells. Im trying to read and store them as strings but when I read the cell and output its contents it shows a weird number. For example: 18/02/14 is producing an output of 41682.0000 and in the very next line the same date 18/02/14 is being shown as 41688.0000

Everything else is being read properly. Any idea what's causing this?

— Reply to this email directly or view it on GitHub https://github.com/dhoerl/DHlibxls/issues/15.

Excel does not support "dates" in a native format - they are stored as integers with a format with a format of "date in format of dd/mm/yy". I suspect that the current libxls library does not propagate the format, and even if it did not sure how easy it would be fix it.

In any case can you upload a sample file, or post the URL to one in a public place like dropbox, and I'll take a look.

msaad99 commented 10 years ago

This is the file that I'm using: https://www.dropbox.com/s/dj6m3n6zq6v9gvs/daud.xls

msaad99 commented 10 years ago

So any idea on how to fix this?

dhoerl commented 10 years ago

On 7/10/14, 9:26 AM, msaad99 wrote:

So any idea how to fix this?

— Reply to this email directly or view it on GitHub https://github.com/dhoerl/DHlibxls/issues/15#issuecomment-48603590.

Excel saves dates as a double. The only way to know that a given cell is in a date format is to examine the format record, and see if its one of the date formats (see OpenOffice Excel file format, 5.49 FORMAT). This involves looking at each cell's XF record, getting the FORMAT reference, then seeing if its a date (or time) or not.

This is obviously a lot of effort and not something I plan on doing any time soon.

Knowing your self that some cells are dates, you can convert the date into a string in the format you wish using the Foundation date formatting utilities. Note that the date in Excel is the number of days since 1900: http://www.clear-lines.com/blog/post/Converting-Excel-date-format-to-SystemDateTime.aspx

dhoerl commented 10 years ago

On 7/10/14, 9:26 AM, msaad99 wrote:

So any idea how to fix this?

— Reply to this email directly or view it on GitHub https://github.com/dhoerl/DHlibxls/issues/15#issuecomment-48603590.

Looking further, the format you applied to this cell is number 164, the first format after Excel's default formats (that is, a person created the format, then applied it to that cell):

libxls : BOF ID: 041Eh FORMAT (Number Format) Size: 13 libxls : xls_addFormat index : 164 value: m/d/yy;@

The only way to know this is a date format is to look at the string, and use some heuristic to see that what its doing is creating a date. Its not like there is a bit that says "this is a date".

Ditto for time and other special formats like currency.

Unfortunately, this is what Excel does, and trying to re-create its formatting is a major undertaking.

msaad99 commented 10 years ago

Okay so then in what way should the person making the excel file add a date so that it can be read properly?

dhoerl commented 10 years ago

On 7/10/14, 11:05 AM, msaad99 wrote:

Okay so then in what way should the person making the excel file add a date so that it can be read properly?

— Reply to this email directly or view it on GitHub https://github.com/dhoerl/DHlibxls/issues/15#issuecomment-48617007.

If you can enter the date as a string - and not have Excel reformat it into an internal number representation - that should work. For instance, if you try something like "July 4 2014", or "7 - 4 - 14", or "8-4-14" - I imagine that would work.

Unfortunately I don't have a copy of Excel at this time to even play with (I've got Numbers on my Mac, that's it).

The strategy is to get Excel to accept the input as a string - you will then see that string in the output.