JuliaData / IndexedTables.jl

Flexible tables with ordered indices
https://juliadb.org
MIT License
120 stars 38 forks source link

filter doesn't return correct columns when select is `Bool` #292

Open chiraganand opened 2 years ago

chiraganand commented 2 years ago

I created a table using t = table(1:10, randn(10), rand(Bool, 10); names = [:x, :y, :z]) and tried to return Bool columns.

The following call should have returned only the z column but it is returning all three. Is this how it should be? Because this is not what select does.

julia> filter(row -> row.z == false, t, select=Bool)
Table with 3 rows, 3 columns:
x  y          z
───────────────────
1  -1.02158   false
2  -1.14494   false
9  -0.662703  false

The select function returns only one Bool column

julia> select(t, Bool)
Table with 10 rows, 1 columns:
z
─────
false
false
true
true
true
true
true
true
false
true

Similarly, with Int:

julia> filter(row -> row.x == 1, t, select=Int)
Table with 1 rows, 3 columns:
x  y         z
──────────────────
1  -1.02158  false

The section of code responsible:

https://github.com/JuliaData/IndexedTables.jl/blob/2e97b488e24b9069cca5449b6e95dc9690ab19a1/src/selection.jl#L241-L245

I would propose the following patch so that the behaviour is consistent with select(). Not sure if this is the most efficient way.

 function Base.filter(fn, t::Dataset; select=valuenames(t))
-     x = rows(t, select)
-     indxs = findall(fn, x)
-     subtable(t, indxs, presorted=true)
 end

 function Base.filter(fn, t::Dataset; select=valuenames(t))
+     x = select(t, select)
+     indxs = findall(fn, rows(x))
+     subtable(x, indxs, presorted=true)
 end
ViralBShah commented 2 years ago

Submit a PR even if just making a suggestion...