JuliaData / TypedTables.jl

Simple, fast, column-based storage for data analysis in Julia
Other
145 stars 25 forks source link

Prevent #undef values in Tables due to attempted column mutation #82

Closed adigitoleo closed 3 years ago

adigitoleo commented 3 years ago

Julia v1.6.1, TypedTables v1.3.0

While discovering how to add data to a Table, I found that it is possible to perform push! operations that unexpectedly create #undef values in table cells, causing corruption of the table. In other words, Tables are not immutable enough :)

Hard to explain, let me show some examples. I've omitted full stack traces for clarity:

using TypedTables
tb = Table(col=[])
julia> push!(tb, (; not_a_col=4))
ERROR: MethodError: Cannot `convert` an object of type
  NamedTuple{(:not_a_col,),Tuple{Int64}} to an object of type
  NamedTuple{(:col,),Tuple{Any}}

OK, so I messed up. Show me the Table:

julia> tb
Table with 1 column and 1 rowError showing value of type Table{NamedTuple{(:col,), Tuple{Any}}, 1, NamedTuple{(:col,), Tuple{Vector{Any}}}}:
ERROR: UndefRefError: access to undefined reference

What is going on here?

julia> tb.col
1-element Vector{Any}:
 #undef

Perhaps this should instead result in an ArgumentError like for append!:

julia> tb = Table(col=[])
julia> append!(tb, Table(not_a_col=[4]))
ERROR: ArgumentError: Named tuple names do not match.

I'm not sure to what extent this affects other operations, but I'll report it here if I find something.

andyferris commented 3 years ago

Ah - good catch! Yeah we should definitely throw an error before the arrays are grown. Thanks

adigitoleo commented 3 years ago

I found one more (fairly contrived) example when (ab)using the splat operator:

julia> cols = (:this, :that, :and, :other, :things)
(:this, :that, :and, :other, :things)

julia> tb = Table(; zip(cols, fill(Int[], length(cols)))...)
Table with 5 columns and 0 rows:
     this  that  and  other  things
   ┌───────────────────────────────

julia> push!(tb, (; zip(cols, fill(1, length(cols))...)...))
ERROR: MethodError: Cannot `convert` an object of type
  NamedTuple{(:this,),Tuple{Int64{}}} to an object of type
  NamedTuple{(:this, :that, :and, :other, :things),Tuple{Int64{},Int64,Int64,Int64,Int64}}
  ...

julia> tb
Table with 5 columns and 1 row:
     this  that  and  other  things
   ┌───────────────────────────────
 1 │ 94    94    94   94     94

The only reason I tried that incantation was because I'm still a bit confused about how to add dynamically created rows:

julia> cols = (:this, :that, :and, :other, :things)
(:this, :that, :and, :other, :things)

julia> tb = Table(; zip(cols, fill(Int[], length(cols)))...)
Table with 5 columns and 0 rows:
     this  that  and  other  things
   ┌───────────────────────────────

julia> new_row = (; zip(cols, fill(1, length(cols)))...)
(this = 1, that = 1, and = 1, other = 1, things = 1)

julia> push!(tb, new_row)
Table with 5 columns and 5 rows:
     this  that  and  other  things
   ┌───────────────────────────────
 1 │ 1     1     1    1      1
 2 │ 1     1     1    1      1
 3 │ 1     1     1    1      1
 4 │ 1     1     1    1      1
 5 │ 1     1     1    1      1

To me, it looks like I got 5 rows for the price of 1. Am I missing something?

adigitoleo commented 3 years ago

Looks like the issue in that last snippet was with the table creation not with push, since it works fine for this...

julia> tb = Table(; this=[0], that=[0])
Table with 2 columns and 1 row:
     this  that
   ┌───────────
 1 │ 0     0

julia> cols = (:this, :that)
(:this, :that)

julia> new_row = (; zip(cols, fill(1, length(cols)))...)
(this = 1, that = 1)

julia> push!(tb, new_row)
Table with 2 columns and 2 rows:
     this  that
   ┌───────────
 1 │ 0     0
 2 │ 1     1

...but not for this:

julia> tb = Table(; zip(cols, fill([0], length(cols)))...)
Table with 2 columns and 1 row:
     this  that
   ┌───────────
 1 │ 0     0

julia> new_row = (; zip(cols, fill(1, length(cols)))...)
(this = 1, that = 1)

julia> push!(tb, new_row)
Table with 2 columns and 3 rows:
     this  that
   ┌───────────
 1 │ 0     0
 2 │ 1     1
 3 │ 1     1
andyferris commented 3 years ago

To me, it looks like I got 5 rows for the price of 1. Am I missing something?

The problem is with fill. Every column is the exact same Int[]. You can confirm this with tb.this === tb.that, for example. You can use a map or generator function for this:

julia> cols = (:this, :that)
(:this, :that)

julia> data = (; (col => Int[] for col in cols)...)
(this = Int64[], that = Int64[])

julia> data.this === data.that
false

julia> t = Table(data)
Table with 2 columns and 0 rows:
     this  that
   ┌───────────

julia> push!(t, (; (col => 1 for col in cols)...))
Table with 2 columns and 1 row:
     this  that
   ┌───────────
 1 │ 1     1
andyferris commented 3 years ago

@adigitoleo I have fixed

We are releasing version 1.3.1 shortly, so you should be able to ]up soon. Thanks again for the bug report :)