rad-pat opened this issue 2 weeks ago
@rad-pat thanks for the feedback, this is a known problem.

rust-csv, which Databend depends on, sticks to the spec: https://github.com/BurntSushi/rust-csv/issues/114. Spark does not distinguish them after 2.0.1: https://mrpowers.medium.com/sparks-treatment-of-empty-strings-and-null-values-in-csv-files-80748893451f (interestingly, the article states that Spark <= 2.0.0 read a blank/missing field as an empty string and a quoted one as null, the reverse of what we expected).
A workaround is to dump the CSV with a special string for null instead of "".
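A minimal sketch of that workaround in Python (the `\N` sentinel here is only an illustrative choice; any string guaranteed absent from the real data works, and it could then be matched on load, e.g. via Databend's `NULL_DISPLAY` option):

```python
import csv
import io

# Hypothetical sentinel: any string that never occurs in the real data will do.
NULL_SENTINEL = r"\N"

def dump_rows(rows, fh):
    """Write rows so that Python None (NULL) becomes the sentinel, while ""
    stays a genuinely empty field; the two remain distinguishable on load."""
    writer = csv.writer(fh)
    for row in rows:
        writer.writerow(NULL_SENTINEL if v is None else v for v in row)

buf = io.StringIO()
dump_rows([("a", None), ("b", "")], buf)
print(buf.getvalue())  # first row: a,\N   second row: b,
```

On the loading side the sentinel maps back to NULL and the empty field to an empty string, which restores the distinction the bare CSV format loses.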
We may consider supporting the distinction too; Python 3.13 appears to support it via some new options: https://github.com/python/cpython/issues/113732
BTW, why do you need to distinguish null from an empty string in your application?
We receive CSVs from upstream customer systems that we have no control over. For the most part NULL is the correct default, but there are real-world situations where a blank string has a different meaning than null.
We are migrating from Postgres/Greenplum where, when importing from a CSV file, we can specify that a blank entry represents NULL while a quoted blank string is an empty string. I cannot replicate this behaviour with Databend, but would very much like to. I have tried many combinations of the `EMPTY_FIELD_AS` and `NULL_DISPLAY` options, but I cannot get the same result as the Postgres import. The ability to differentiate between a blank entry and an empty string would be very useful to us.

CSV Data:
Postgres Import Result:

Databend Import Result:
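For reference, the Postgres-style rule being asked for (a bare empty field means NULL, a quoted empty field means an empty string) can be sketched as a toy splitter in Python. This only illustrates the semantics; it is not how Databend or rust-csv actually parse:

```python
def split_pg_style(line, delim=",", quote='"'):
    """Split one CSV line, mapping a bare empty field to None (NULL) and a
    quoted empty field to "" -- mirroring Postgres COPY ... CSV semantics."""
    fields, i, n = [], 0, len(line)
    while i <= n:
        if i < n and line[i] == quote:
            # Quoted field: always a string, even when empty.
            i += 1
            buf = []
            while i < n:
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        buf.append(quote)  # doubled quote -> literal quote char
                        i += 2
                    else:
                        i += 1  # closing quote
                        break
                else:
                    buf.append(line[i])
                    i += 1
            fields.append("".join(buf))
            i += 1  # skip the delimiter after the closing quote
        else:
            # Unquoted field: an empty one becomes NULL.
            j = line.find(delim, i)
            if j == -1:
                j = n
            raw = line[i:j]
            fields.append(None if raw == "" else raw)
            i = j + 1
    return fields

print(split_pg_style('id,,""'))  # ['id', None, '']
```

The interesting case is the middle field of `id,,""`: it is bare and empty, so it becomes NULL, while the trailing `""` survives as an empty string.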