Open severinh opened 4 months ago
Yup the issue is that it's detected as datetime
by calamine. We can try to do a as_date()
afterwards but it would be better if calamine could handle directly dates.
I'll dig into into it this weekend but it won't be a quick fix like the bool to string. I'll probably need to open a PR on calamine side
Thanks @PrettyWood! I suspected that this one would not be as straight-forward.
In any case, no pressure on this. In contrast to #250, this issue here is not blocking our adoption of fastexcel. We can work around this for now by trimming 00:00:00
as a post-processing step. Something like:
sheet_df.with_columns(pl.all().str.replace_all(r"^([0-9]{4}-[0-9]{2}-[0-9]{2}) 00:00:00$", "$1"))
How to reproduce
Suppose you have an Excel file with the following mixed-type data:
=DATE(2024, 7, 1)
=DATE(2024, 7, 2)
"some string"
Now lets read this Excel file into a Polars dataframe, which coerces the column to strings:
excel_reader.load_sheet(0).to_polars()
This produces the following data frame:
"2024-07-01 00:00:00"
"2024-07-02 00:00:00"
"some string"
Expected behavior
What I would have expected fastexcel to not include the time
00:00:00
for these dates. That is,"2024-07-01"
"2024-07-02"
"some string"
Test case
Excel sheet: sheet-date.xlsx