JuliaData / TypedTables.jl

Simple, fast, column-based storage for data analysis in Julia
Other
147 stars 25 forks source link

push! by Tuple, not NamedTuple, fails and creates a dummy row #84

Open jaemolihm opened 3 years ago

jaemolihm commented 3 years ago

As shown below, a failed push! (due to using Tuple or Array, not NamedTuple) adds a row with dummy data.

Also, would it be possible to support push! by Tuple or Array (based on the argument order), like in DataFrames.jl?

using TypedTables
tbl = Table(A=[1,], B=[1.,])
push!(tbl, (A=2, B=2.)) # success
push!(tbl, (3, 3.)) # fail
println(tbl) # 3th row is added

Output:

julia> using TypedTables

julia> tbl = Table(A=[1,], B=[1.,])
Table with 2 columns and 1 row:
     A  B
   ┌───────
 1 │ 1  1.0

julia> push!(tbl, (A=2, B=2.)) # success
Table with 2 columns and 2 rows:
     A  B
   ┌───────
 1 │ 1  1.0
 2 │ 2  2.0

julia> push!(tbl, (3, 3.)) # fail
ERROR: MethodError: Cannot `convert` an object of type Tuple{Int64, Float64} to an object of type NamedTuple{(:A, :B), Tuple{Int64, Float64}}
Closest candidates are:
  convert(::Type{NamedTuple{names, T}}, ::NamedTuple{names, T}) where {names, T<:Tuple} at namedtuple.jl:127
  convert(::Type{NamedTuple{names, T}}, ::NamedTuple{names, T} where T<:Tuple) where {names, T<:Tuple} at namedtuple.jl:130
  convert(::Type{T}, ::T) where T at essentials.jl:205
  ...
Stacktrace:
 [1] setindex!
   @ ~/.julia/packages/TypedTables/Dz4Kv/src/Table.jl:144 [inlined]
 [2] _append!
   @ ./array.jl:991 [inlined]
 [3] append!
   @ ./array.jl:981 [inlined]
 [4] push!(a::Table{NamedTuple{(:A, :B), Tuple{Int64, Float64}}, 1, NamedTuple{(:A, :B), Tuple{Vector{Int64}, Vector{Float64}}}}, iter::Tuple{Int64, Float64})
   @ Base ./array.jl:982
 [5] top-level scope
   @ REPL[4]:1

julia> println(tbl) # 3th row is added
Table with 2 columns and 3 rows:
     A               B
   ┌─────────────────────────────
 1 │ 1               1.0
 2 │ 2               2.0
 3 │ 47802621481152  2.36176e-310

Here's the versioninfo.

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)
Environment:
  JULIA = /home/jmlim/appl/julia-1.6.1/bin/julia
  JULIA_EDITOR = "/home/jmlim/.vscode-server/bin/2d23c42a936db1c7b3b06f918cde29561cc47cd6/node"
  JULIA_NUM_THREADS = 1

(@v1.6) pkg> status TypedTables
      Status `~/.julia/environments/v1.6/Project.toml`
  [9d95f2ec] TypedTables v1.3.1
andyferris commented 3 years ago

This is an interesting request.

Currently Table is simply an AbstractArray. We can try to convert things to the correct NamedTuple but going beyond this would change the philosophy a little (part of the "philosophy" here is to see if Base types like arrays and named tuples are sufficient for doing relational algebra problems in a convenient and performant manner). Though I can totally undertand that something more flexible and user friendly could be useful...

Ideally we'd try to make operations like push! atomic so that if they fail then the vector is restored to the original state. I think we need to have more comprehensive checks that what was given was valid before the arrays are grown - not corrupting the table and printing a useful error message would be a lot better than the current behavior.

RossBoylan commented 1 year ago

Just ran into this with the latest version. It seems to me that the enhancement label is a bit misleading, since the behavior described in the title is a bug. Allowing pushes of non-Named Tuples is an enhancement that might cure the problem (or not, since dimensional mismatch would still be possible), but "simply" assuring that an error doesn't stuff junk into the table would fix the bug.

julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e90 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, broadwell)
  Threads: 16 on 20 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

(MSEP) pkg> status TypedTables
Project MSEP v0.2.1-pre
Status `C:\Users\rdboylan\Documents\BP\MSEP\Project.toml`
  [9d95f2ec] TypedTables v1.4.3
andyferris commented 1 year ago

Yes, I suppose there are two fixes here.

One it to try make the updates "atomic" on typed tables, to fix the bug of the table being corrupted. I worry that in general this is basically impossible to fix, but we should make a best effort where practical.

The second is to actually support conversion from tuples.