ToucanToco / fastexcel

A Python wrapper around calamine
http://fastexcel.toucantoco.dev/
MIT License

ExcelReader parses "NULL" as string values, instead of empty/null values #181

Closed: adrivn closed this issue 4 months ago

adrivn commented 5 months ago

Hello, I'm following up on https://github.com/pola-rs/polars/issues/14495 as you requested. After a bit more digging, I've found where part of the issue lies.

Attached are three short example files, containing:

As it stands, fastexcel (via polars) cannot infer the types of the mixed-type columns in the first and third files, but it loads the second file and infers its data types correctly. This indicates that cells containing "NULL" are read as strings instead of as empty (null) values.
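To make the failure mode concrete, here is a minimal sketch of column type inference over raw cell values. It is not fastexcel's or calamine's code: `infer_column_type`, `MixedTypesError` and the `null_values` parameter are hypothetical names chosen for illustration (the idea is similar in spirit to the `null_values` option of polars' `read_csv`). A "NULL" string sitting in a numeric column makes the column look mixed, while treating it as missing, the same way as an empty cell, yields a clean integer column.

```python
from typing import Iterable


class MixedTypesError(ValueError):
    """Hypothetical error: a column mixes incompatible cell types."""


def infer_column_type(cells: Iterable[object], null_values: Iterable[str] = ()) -> str:
    """Infer one dtype name for a column of raw cell values (hypothetical sketch).

    Cells that are None (empty) or equal to one of `null_values` are skipped,
    so they never influence the inferred type.
    """
    nulls = set(null_values)
    seen: set = set()
    for cell in cells:
        if cell is None or (isinstance(cell, str) and cell in nulls):
            continue  # missing value: ignored for type inference
        if isinstance(cell, bool):  # check bool before int (bool subclasses int)
            seen.add("bool")
        elif isinstance(cell, int):
            seen.add("int")
        elif isinstance(cell, float):
            seen.add("float")
        elif isinstance(cell, str):
            seen.add("string")
        else:
            seen.add(type(cell).__name__)
    if not seen:
        return "null"  # every cell was missing
    if seen == {"int", "float"}:
        return "float"  # ints widen to float
    if len(seen) > 1:
        raise MixedTypesError(f"cannot infer a single type from {sorted(seen)}")
    return seen.pop()


column = [1, 2, "NULL", 4]  # a numeric column with a literal "NULL" cell
try:
    infer_column_type(column)  # "NULL" counts as a string -> mixed column
except MixedTypesError as exc:
    print(exc)
print(infer_column_type(column, null_values=["NULL"]))  # "NULL" treated as missing
print(infer_column_type([1, 2, None, 4]))  # empty cell, like the second file
```

The empty-cell case and the `null_values=["NULL"]` case infer the same type, which matches the observation that only the files containing "NULL" fail.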

I hope this helps. A fallback conversion to string would be nice, but given how common "NULL" values are, especially in massive CSV files created from SQL dumps, I think addressing this first would fix a lot of loading issues.
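The two ideas above can be combined in order: first map "NULL" sentinels to missing values, and only if a column still has more than one type, fall back to converting every non-missing cell to a string. The sketch below assumes this ordering; `load_column` and its `null_values` default are hypothetical and do not describe how fastexcel actually resolved the issue.

```python
from typing import Iterable, List, Optional, Tuple


def load_column(
    cells: Iterable[object], null_values: Iterable[str] = ("NULL",)
) -> Tuple[str, List[Optional[object]]]:
    """Hypothetical sketch: return (dtype name, values) for one column."""
    nulls = set(null_values)
    # Step 1: treat sentinel strings such as "NULL" as missing values.
    values = [None if isinstance(c, str) and c in nulls else c for c in cells]
    kinds = {type(v) for v in values if v is not None}
    if not kinds:
        return "null", values  # nothing but missing values
    if len(kinds) == 1:
        return next(iter(kinds)).__name__, values  # one consistent type
    # Step 2: string fallback for columns that are still mixed.
    return "str", [None if v is None else str(v) for v in values]


print(load_column([1, "NULL", 3]))   # sentinel removed, column stays int
print(load_column([1, "abc", "NULL"]))  # genuinely mixed, falls back to str
```

Handling the sentinel first matters: if the string fallback ran on its own, a numeric column with a single "NULL" cell would silently become a string column.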

Thank you for working on this wrapper!