Closed cschloer closed 4 years ago
Example file: cast_string_none.xlsx
Added tests
Thanks @cschloer
After #115, I wonder if this would still be a problem (now released in dataflows@0.0.64).
This PR changes the behavior so that infer_strings also implies force_strings=True.
In tabulator, this causes None values to be interpreted as the empty string:
>>> import tabulator
>>> s=tabulator.Stream('https://github.com/datahq/dataflows/files/3851313/cast_string_none.xlsx', force_strings=True).open()
>>> list(iter(s))
[['Species', 'Age (days post hatch)', 'Size (mm total length or standard length)', 'Individual'],
['E. lori', '0', '3', '']]
So dataflows will never get these Nones.
On a side note, in these cases you could simply use 'cast=nothing' to keep the types as-is with a strings-only schema.
This is indeed fixed in the most recent version :) Thanks @akariv @roll
Currently, when cast_strategy and infer_strategy are set to string in the load flow, it also converts None values to the string 'None'. This isn't a problem for CSVs, where empty values are communicated through the empty string, but for Excel files with empty cells the openpyxl parser returns the value None. I added the condition 'and v is not None' to the stringer function in load so that values of type None remain so.
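To illustrate the change, here is a minimal sketch of a stringer-style generator with the extra 'and v is not None' check. The function body and the sample row are assumptions written for this example and are not copied from the dataflows source; only the added condition comes from the description above.

```python
def stringer(rows):
    """Cast every non-string value to str, but leave None untouched.

    Hypothetical stand-in for the stringer function in load; the real
    implementation may differ in structure.
    """
    for row in rows:
        yield {
            key: str(value)
            if not isinstance(value, str) and value is not None
            else value
            for key, value in row.items()
        }


# Sample row modelled on the example file: the last cell is empty,
# which an Excel parser reports as None rather than ''.
rows = [{"Species": "E. lori", "Age": 0, "Size": 3, "Individual": None}]
result = list(stringer(rows))
print(result)
```

Without the extra check, the empty cell would come out as the four-character string 'None', which is indistinguishable from a cell that really contains that text.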