Hello, I'm following up on https://github.com/pola-rs/polars/issues/14495 as you requested. After a bit more digging, I figured out where part of the issue is.
Attached are three short example files, containing:
- 6 columns with 10 records each plus a header, each column formatted as the data type it is supposed to represent (integer, date, timestamp, float), but with "NULL" used as the value for empty cells.
- Same as the prior one, but with two additional columns of mixed data types (integer plus string, date plus string), and no "NULL" string values this time.
As it stands, Fastexcel (via polars) cannot infer the mixed column types in the first and third example files, but it does load the second file and infers its data types correctly. This indicates that cells containing "NULL" are read as strings instead of as empty values.
I hope this helps. The string fallback conversion would be good, but given that "NULL" values are commonplace, especially in massive CSV files created from SQL dumps, I think addressing this first would fix a lot of loading issues.
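To illustrate the behaviour I'd expect, here is a minimal, hypothetical Python sketch of column type inference over string cells. It is not Fastexcel's (or calamine's) actual inference code; `infer_column_type` and `NULL_TOKENS` are made-up names. The only idea it shows is that a configurable set of sentinel strings such as "NULL" is treated as missing before a type is chosen.

```python
from datetime import date, datetime

# Hypothetical sentinel set: strings that should be read as missing values.
NULL_TOKENS = {"NULL"}

# Candidate types, tried from most to least specific.
CANDIDATES = [
    ("integer", int),
    ("float", float),
    ("date", date.fromisoformat),
    ("timestamp", datetime.fromisoformat),
]


def _parses(value, parser):
    """Return True if `parser` accepts `value` without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_column_type(cells, null_tokens=NULL_TOKENS):
    """Infer a type name for a column, skipping empty cells and null tokens."""
    present = [c for c in cells if c is not None and c not in null_tokens]
    if not present:
        # Every cell was empty or a null token: nothing to infer from.
        return "null"
    for name, parser in CANDIDATES:
        if all(_parses(c, parser) for c in present):
            return name
    # No candidate type fits every remaining cell.
    return "string"


print(infer_column_type(["1", "NULL", "3"]))                   # integer
print(infer_column_type(["1", "NULL", "3"], null_tokens=set()))  # string
```

With "NULL" treated as missing, the first column is inferred as an integer column. With an empty `null_tokens` set, the "NULL" cells are just strings and the column can no longer be inferred as integers. That is the situation described above.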
Thank you for working on this wrapper!