JuliaData / DataTables.jl

(DEPRECATED) A rewrite of DataFrames.jl based on Nullable
Other
29 stars 11 forks source link

Update subdatatable.jl #46

Closed Evizero closed 7 years ago

Evizero commented 7 years ago

Fixes issue when trying to view(::SubDataTable, ::Int). I think so at least. I edited this patch on github

julia> dt = DataTable(x1 = rand(4), x2 = rand(4))
4×2 DataTables.DataTable
│ Row │ x1       │ x2       │
├─────┼──────────┼──────────┤
│ 1   │ 0.163479 │ 0.469325 │
│ 2   │ 0.821491 │ 0.939964 │
│ 3   │ 0.121741 │ 0.985533 │
│ 4   │ 0.613799 │ 0.194209 │

julia> v = view(dt, 1) #works
1×2 DataTables.SubDataTable{Array{Int64,1}}
│ Row │ x1       │ x2       │
├─────┼──────────┼──────────┤
│ 1   │ 0.163479 │ 0.469325 │

julia> v = view(dt, 1:2)
2×2 DataTables.SubDataTable{UnitRange{Int64}}
│ Row │ x1       │ x2       │
├─────┼──────────┼──────────┤
│ 1   │ 0.163479 │ 0.469325 │
│ 2   │ 0.821491 │ 0.939964 │

julia> view(v, 1)
ERROR: MethodError: no method matching DataTables.SubDataTable{T<:AbstractArray{Int64,1}}(::DataTables.SubDataTable{UnitRange{Int64}}, ::Array{Int64,1})
ararslan commented 7 years ago

Great, thanks! Could you add a test?

cjprybol commented 7 years ago

Thanks @Evizero! Your fix may be correct, but looking through the code for view I think the whole thing might need an overhaul. Cross referencing https://github.com/JuliaData/DataTables.jl/issues/38#issuecomment-288366830, there could be several functions in DataTables where returning a copy-free view could be used in-place of copying the datatable. We should check that everything about view works as expected, this example of view(::SubDataTable, ::Int) included.

A few issues that jumped out at me:

  1. sometimes SubDataTable calls view here, while other times view calls SubDataTable here, here and here, while other views call view again here, here, here, and here. It might be useful to have everything call in the the same direction i.e. view -> SubDataTable functions to normalize inputs -> SubDataTable constructor.
  2. view on columns will only work if the column is a scalar/single value like a symbol or Int here, but probably should work on Vector{Bool|Int|Symbol} too.
  3. view is using a custom Index that is only used by the view calls here, here, and here. We should probably get rid of that and the associated SimpleIndex here
  4. view doesn't have a full set of tests. There are 14 lines in the tests that I could find that use view in some way, but there are several combinations of arguments that are missed.

I gave these ideas a shot w/ new SubDataTable/view code here and tests here. If any of those changes are desirable we can merge the useful bits into this PR.

nalimilan commented 7 years ago

@cjprybol These further improvements make sense, but we should probably merge the simple change from this PR (with a test) since it fixes a bug, and make the larger cleanup later.

Evizero commented 7 years ago

Added a test that would fail on master