JuliaData / JSONTables.jl

JSON3.jl + Tables.jl
MIT License

Problem with missing/nothing and copying #4

Closed: bkamins closed this issue 5 years ago

bkamins commented 5 years ago

This is best shown with an example:

julia> using DataFrames, JSONTables

julia> x = DataFrame(A=[true, false, true], B=[1, 2, missing],
                     C=[missing, "b", "c"], D=['a', missing, 'c'])
3×4 DataFrame
│ Row │ A     │ B       │ C       │ D       │
│     │ Bool  │ Int64⍰  │ String⍰ │ Char⍰   │
├─────┼───────┼─────────┼─────────┼─────────┤
│ 1   │ true  │ 1       │ missing │ 'a'     │
│ 2   │ false │ 2       │ b       │ missing │
│ 3   │ true  │ missing │ c       │ 'c'     │

julia> s1 = arraytable(x)
"[{\"A\":true,\"B\":1,\"C\":null,\"D\":\"a\"},{\"A\":false,\"B\":2,\"C\":\"b\",\"D\":null},{\"A\":true,\"B\":null,\"C\":\"c\",\"D\":\"c\"}]"

julia> s2 = objecttable(x)
"{\"A\":[true,false,true],\"B\":[1,2,null],\"C\":[null,\"b\",\"c\"],\"D\":[\"a\",null,\"c\"]}"

julia> j1 = jsontable(s1)
JSONTables.Table{false,JSON3.Array{JSON3.Object,Base.CodeUnits{UInt8,String},Array{UInt64,1}}}(JSON3.Object[{
   "A": true,
   "B": 1,
   "C": nothing,
   "D": "a"
}, {
   "A": false,
   "B": 2,
   "C": "b",
   "D": nothing
}, {
   "A": true,
   "B": nothing,
   "C": "c",
   "D": "c"
}])

julia> j2 = jsontable(s2)
JSONTables.Table{true,JSON3.Object{Base.CodeUnits{UInt8,String},Array{UInt64,1}}}({
   "A": [
          true,
          false,
          true
        ],
   "B": [
          1,
          2,
          nothing
        ],
   "C": [
          nothing,
          "b",
          "c"
        ],
   "D": [
          "a",
          nothing,
          "c"
        ]
})

julia> DataFrame(j1)
3×4 DataFrame
│ Row │ A     │ B      │ C      │ D      │
│     │ Bool  │ Union… │ Union… │ Union… │
├─────┼───────┼────────┼────────┼────────┤
│ 1   │ true  │ 1      │        │ a      │
│ 2   │ false │ 2      │ b      │        │
│ 3   │ true  │        │ c      │ c      │

julia> DataFrame(j2)
ERROR: MethodError: no method matching copy(::Nothing)

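There are two issues:

  1. missing is converted to nothing in the write/read round trip: it is written out as JSON null, and null is read back as nothing.
  2. In the objecttable case, the result then cannot be materialized with the DataFrame constructor: the constructor internally attempts to copy the read-in vector, which contains nothing, and this fails with MethodError: no method matching copy(::Nothing). The arraytable case (DataFrame(j1)) does not error, but the affected columns end up with a Union… element type.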
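A possible user-side workaround, sketched in plain Julia: map every nothing back to missing before building the DataFrame. This assumes the parsed columns can first be collected into ordinary vectors; the helper name restore_missing and the hand-written columns are hypothetical and only mirror the objecttable output above. This is not part of the JSONTables.jl API.

```julia
# JSON has no `missing`: it is written as `null`, and `null` is parsed
# back as `nothing`. Hypothetical helper that restores `missing`.
restore_missing(v) = v === nothing ? missing : v

# Hand-written columns mirroring the objecttable round trip above.
# D holds strings because the Char values were written as JSON strings.
cols = (A = Any[true, false, true],
        B = Any[1, 2, nothing],
        C = Any[nothing, "b", "c"],
        D = Any["a", nothing, "c"])

# Broadcasting over a Vector{Any} narrows the result to the element
# types actually seen, e.g. B becomes a Vector{Union{Missing, Int}}.
fixed = map(col -> restore_missing.(col), cols)

for (name, col) in pairs(fixed)
    println(name, " => ", eltype(col), " ", col)
end
```

Since fixed is a NamedTuple of vectors, it is a Tables.jl column table, so passing it to the DataFrame constructor should yield columns that hold missing rather than nothing.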

CC @quinnj