JuliaData / TypedTables.jl

Simple, fast, column-based storage for data analysis in Julia

convenience functions to select or drop columns #74

Closed EvertSchippers closed 2 years ago

EvertSchippers commented 3 years ago

We find these very handy. They could live in our "Utils" package, or should they just go into the latest TypedTables?

Did I miss a better, existing way to do something like basics = select(pointcloud, :position, :intensity)?

andyferris commented 3 years ago

Hey Evert,

Yes, there are undocumented macros @Compute and @Select for these kinds of manipulations. Each macro returns a function which you can call on a table or on each row.

For example: basics = @Select(position, intensity)(pointcloud).

This can be used with broadcasting, map, filter and so on. I'll get back to you.

The fundamental issue here is these macros aren't really documented well.

andyferris commented 3 years ago

To elaborate with some examples:

julia> t = Table(a = [1,2,3], b = [4,5,6])
Table with 2 columns and 3 rows:
     a  b
   ┌─────
 1 │ 1  4
 2 │ 2  5
 3 │ 3  6

julia> map(getproperty(:a), t)
3-element Vector{Int64}:
 1
 2
 3

julia> map(TypedTables.getproperties(:a, :b), t)
Table with 2 columns and 3 rows:
     a  b
   ┌─────
 1 │ 1  4
 2 │ 2  5
 3 │ 3  6

julia> map(@Compute($a + $b), t)
3-element Vector{Int64}:
 5
 7
 9

julia> map(@Select(a, b, c = $a + $b), t)
Table with 3 columns and 3 rows:
     a  b  c
   ┌────────
 1 │ 1  4  5
 2 │ 2  5  7
 3 │ 3  6  9

andyferris commented 3 years ago

All of the above should be pretty well optimized internally. For example, the last one is intended to be the same as Table(a = t.a, b = t.b, c = map(+, t.a, t.b)).
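
To illustrate why that equivalence matters, here is a minimal sketch in plain Base Julia. It uses a NamedTuple of vectors as a stand-in for a table and does not call TypedTables; the names cols, rows, rowwise and colwise are made up for this example. It only shows that the row-by-row form and the column-by-column form describe the same data, while the latter can reuse the existing columns.

```julia
# Stand-in for a column-based table: a NamedTuple of equal-length vectors.
# (Assumption for illustration only; this is not the TypedTables implementation.)
cols = (a = [1, 2, 3], b = [4, 5, 6])

# Row-wise view: one NamedTuple per row, which is what a mapped function sees.
rows = [(a = a, b = b) for (a, b) in zip(cols.a, cols.b)]
rowwise = map(r -> (a = r.a, b = r.b, c = r.a + r.b), rows)

# Column-wise form: keep the existing columns and compute only the new one.
colwise = (a = cols.a, b = cols.b, c = map(+, cols.a, cols.b))

# Both forms agree on every column.
@assert [r.a for r in rowwise] == colwise.a
@assert [r.b for r in rowwise] == colwise.b
@assert [r.c for r in rowwise] == colwise.c
```

In the column-wise form, colwise.a and colwise.b are the very same arrays as cols.a and cols.b, so no per-row NamedTuple has to be built or taken apart again.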

EvertSchippers commented 3 years ago

Most excellent :) Funny that the macro is also called "select"; I should have found it just by typing TypedTables. and pressing Tab at the julia> prompt. I just wasn't looking for a macro, I guess.

Thanks Andy!

Closing this PR obviously :)

andyferris commented 3 years ago

@Select:

[image not preserved]

> Just not looking for a macro I guess

Yeah I always thought it was a slightly oddball way of solving the problem but it's a super convenient way of creating these mini-functions. I kinda wish there was a more compact lambda syntax for this, too.

EvertSchippers commented 3 years ago

So, one more question. Imagine I have a vector of field names (read from some config file, or the result of some computation, so not known at compile time). Can @Select still work then? The macro requires hardcoded names, not an array whose content is not yet known...

I initially thought @Select would solve our problem, but I don't think I can use it like that... Did you look at my code? It's not hard to do without a macro, and I think it even compiles, as it's all Tuple work. So if the same non-hardcoded "select" happens often, it may actually be as fast. With my current implementation you would need to splat your array of column names, though.

Also, the "drop" is kind of convenient...

Thoughts? (@andyferris)

andyferris commented 3 years ago

True.

The unexported TypedTables.getproperties does some of this. The macro just uses it.

julia> t = Table(a = [1,2,3], b = [4,5,6], c = [7,8,9])
Table with 3 columns and 3 rows:
     a  b  c
   ┌────────
 1 │ 1  4  7
 2 │ 2  5  8
 3 │ 3  6  9

julia> TypedTables.getproperties(:a, :c)(t)
Table with 2 columns and 3 rows:
     a  c
   ┌─────
 1 │ 1  7
 2 │ 2  8
 3 │ 3  9

Thinking about this, it could be cleaned up a little. We'd want getproperties(table, fields) where fields is a Tuple of Symbols. And we'd want getproperties(fields) instead of getproperties(fields...) to create the function, as above.

Dropping columns is currently achieved this way:

julia> Table(t, b = nothing)
Table with 2 columns and 3 rows:
     a  c
   ┌─────
 1 │ 1  7
 2 │ 2  8
 3 │ 3  9

As you say, it's a pain to use this form if you don't know the column names. We could define a deleteproperties(t, names) = getproperties(t, setdiff(propertynames(t), names))?
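
As a rough sketch of the select and drop ideas discussed above, the following uses only Base Julia on a NamedTuple of column vectors. The helper names select_columns and drop_columns are hypothetical and are not part of TypedTables; whether the released getproperties and deleteproperties behave the same way is not something this sketch claims.

```julia
# Hypothetical helpers on a NamedTuple of column vectors (not the TypedTables API).
# The column names may come from a Vector or a Tuple that is only known at run time.

# Keep only the requested columns, in the requested order.
select_columns(cols::NamedTuple, names) = NamedTuple{Tuple(names)}(cols)

# Drop the listed columns; the remaining ones keep their original order.
function drop_columns(cols::NamedTuple, names)
    kept = filter(n -> !(n in names), keys(cols))  # keys(cols) is a Tuple of Symbols
    return NamedTuple{kept}(cols)
end

cols = (a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])

wanted = [:a, :c]                      # e.g. read from a config file
selected = select_columns(cols, wanted)
dropped = drop_columns(cols, [:b])

@assert keys(selected) == (:a, :c)
@assert keys(dropped) == (:a, :c)
```

Because the names are ordinary run-time values here, the compiler cannot infer the result type at the call site. Passing the result into another function (a function barrier) is one common way to keep the downstream code fast.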

andyferris commented 3 years ago

@EvertSchippers see #77

andyferris commented 3 years ago

@EvertSchippers is the 1.3.0 release satisfactory? Or does this need another look?

EvertSchippers commented 2 years ago

[image not preserved]

I guess we're happy :) nice exercise though!