JuliaData / CSV.jl

Utility library for working with CSV and other delimited files in the Julia programming language
https://csv.juliadata.org/

"Missing" Values #1107

Open m-knopp opened 10 months ago

m-knopp commented 10 months ago

Hi,

I have a hard time understanding why empty strings are interpreted as missing by default. missing represents a value that exists but that we don't have access to. Why would we assume that when we have no semantic information about the data we are parsing? "" is not a missing value; it is just an empty String and should be treated as such.

This gets really awkward when further operations fail because the type of a column is now Union{Missing, String}. Also, I found that CSV.read("myfancytable.csv", DataFrame, missingstring="") does not replace the "missing" values with empty Strings; they are still missing values. Likewise, CSV.read("myfancytable.csv", DataFrame, missingstring="abc") does not replace "missing" values with String("abc"), but with nothing values.

My suggestion is to use String("") or nothing as the default value for an empty table cell.
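
For a column that has already been read as Union{Missing, String}, one possible workaround is to replace the missing entries after the fact. This is only a minimal sketch using Base Julia; the vector col is a hypothetical stand-in for a column returned by CSV.read, not data from this issue:

```julia
# Hypothetical stand-in for a column that CSV.read returned with missing entries.
col = Union{Missing, String}["a", missing, "c"]

# coalesce returns its first non-missing argument, so broadcasting it over the
# column swaps every missing entry for an empty string.
filled = coalesce.(col, "")

println(filled)          # the missing entry is now ""
println(eltype(filled))  # the element type no longer includes Missing
```

This leaves the parsing defaults alone and only changes the column afterwards, so it applies whatever missingstring setting was used when reading.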

hhaensel commented 9 months ago

You have two options:

hhaensel commented 9 months ago

I agree that it's a bit strange that

CSV.read("test.csv", DataFrame, missingstring = String[])

doesn't provide the same result as

CSV.read("test.csv", DataFrame, missingstring = nothing)
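
For comparison, the CSV.jl documentation describes missingstring = nothing as turning off missing-value detection altogether. Below is a small sketch of how that differs from the default, assuming CSV.jl and DataFrames.jl are installed; the inline data and column names are made up for illustration and are not the test.csv mentioned above:

```julia
using CSV, DataFrames

# Made-up data: the first data row has an empty field in column b.
data = "a,b\n1,\n2,x\n"

# Default (missingstring = ""): the empty field is parsed as missing.
df_default = CSV.read(IOBuffer(data), DataFrame)

# missingstring = nothing: no sentinel is checked, so the empty field is
# expected to come back as an empty string rather than missing.
df_keep = CSV.read(IOBuffer(data), DataFrame; missingstring = nothing)

println(ismissing(df_default.b[1]))
println(df_keep.b[1] == "")
```

The sketch says nothing about how missingstring = String[] behaves; it only contrasts the default with the documented nothing setting.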