JuliaData / Tables.jl

An interface for tables in Julia
https://juliadata.github.io/Tables.jl/stable/
MIT License

Better support for in-place operations on tables #116

Open rofinn opened 5 years ago

rofinn commented 5 years ago

Specifically, it'd be nice if I could use some traits to determine whether I can mutate the underlying data during row or column iteration (e.g., mutating values in DataFrameRow).

quinnj commented 5 years ago

Ok, I've been noodling on this for... 6 days (haha, actually longer, because people have brought it up on Slack and elsewhere). @rofinn can you talk a little more about the use case you have in mind for this? I have some ideas, but most of mine end in "oh, this actually wouldn't be useful for the most part", so I want to hear a solid case where someone wants to use it and how it would be helpful. Anyway, I can try to put some of my thoughts together, but in the meantime, I thought I'd ask for some more info from your side.

rofinn commented 5 years ago

My use case is in Impute.jl where I'm trying to mutate data in-place if possible by applying some operation over each column.

function impute!(table, imp::Imputor)
    Tables.istable(table) || throw(MethodError(impute!, (table, imp)))

    # Extract a columns iterator that we should be able to use to mutate the data.
    # NOTE: Mutation is not guaranteed for all table types, but it avoids copying the data
    columntable = Tables.columns(table)

    for cname in propertynames(columntable)
        impute!(getproperty(columntable, cname), imp)
    end

    return table
end

https://github.com/invenia/Impute.jl/blob/master/src/imputors.jl#L155

In this code, the data is mutated only for some of the table types that can be passed in. It'd be nice if I could check whether calling Tables.columns will allow me to mutate the underlying data, and throw a warning if it doesn't.
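
A minimal sketch of what such a check might look like, written as a Holy-style trait in plain Julia. Everything here is hypothetical: `MutationStyle`, `InPlace`, `CopyOnly`, `mutationstyle`, and `replace_missing!` are illustration names and are not part of Tables.jl or Impute.jl. The only table type treated as mutable is a NamedTuple of vectors; everything else falls back to a warning.

```julia
# Hypothetical trait types; nothing here is part of Tables.jl.
abstract type MutationStyle end
struct InPlace <: MutationStyle end    # columns are the table's own storage
struct CopyOnly <: MutationStyle end   # columns may be copies; writes can be lost

# Conservative default: assume we cannot mutate through the columns.
mutationstyle(::Any) = CopyOnly()

# A NamedTuple of vectors stores its columns directly, so writes are visible.
mutationstyle(::NamedTuple{names,T}) where {names,T<:Tuple{Vararg{AbstractVector}}} = InPlace()

# Replace missing values with `value`, dispatching on the trait.
replace_missing!(table, value) = replace_missing!(mutationstyle(table), table, value)

function replace_missing!(::InPlace, table, value)
    for col in values(table)           # each column vector of the NamedTuple
        for i in eachindex(col)
            ismissing(col[i]) && (col[i] = value)
        end
    end
    return table
end

function replace_missing!(::CopyOnly, table, value)
    @warn "Cannot guarantee in-place mutation for $(typeof(table)); table left unchanged"
    return table
end

table = (a = [1, missing, 3], b = [missing, 2.0, 3.0])
replace_missing!(table, 0)
```

With a trait like this, the caller can decide up front whether to mutate, warn, or copy, instead of finding out afterwards that the writes went to a temporary.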

Drvi commented 5 years ago

Hi,

For my use case -- trying to make Selections.jl easily available to the ecosystem -- I'd like to have select() and select!() functions. Both could de-select columns, and for mutable data sources I'd like to provide the in-place variant for efficiency. This would require some way of signaling mutability (Tables.ismutable?) and a way to delete columns (Tables.deleteat!) as well as reorder them (like permutecols!). What @rofinn describes also seems useful to me.

quinnj commented 4 years ago

With https://github.com/JuliaData/Tables.jl/pull/131, we're committing to enhancing the Tables.jl interface a bit, but also trying to keep it very minimal, to encourage adoption. As I've thought about this and a few other related issues, I think it would make sense to have a MutableTables.jl package (or maybe called InMemoryTables.jl). It turns out there are a lot of things like this that people want to do, but that really apply to a stricter subset of "table types" that allow mutation and can be manipulated (or indexed, or sorted, etc.). So in my mind, it's possible we could define something in Tables.jl, but it feels a bit off because Tables.jl is trying to be so generic (though admittedly not as generic as TableTraits.jl). That's why I think it'd be useful to have a separate package that could use Tables.jl, but also define additional interface requirements for various table manipulations. Thoughts, @Drvi, @bkamins, @nalimilan, @davidanthoff, @rofinn, @iamed2, @andyferris?

bkamins commented 4 years ago

I think that there are three levels of this mutability, and we should be explicit about which level we target:

  1. allowing some values in the table to be changed without resizing it (setindex!, sort!, ...)
  2. allowing the number of rows to be changed (but keeping the schema fixed)
  3. allowing the number of columns, the names of columns, and the eltype of columns to be changed

tpapp commented 4 years ago

I think a separate package, which enhances the interface, would be the best approach for now, since it would allow experimentation with the mutable interface without affecting the API defined in this package (cf #133).

andyferris commented 4 years ago

@bkamins I would be tempted to try to make these three separate/orthogonal interfaces for performing different mutations, rather than “levels” or layers with some on top of the others.

E.g. I’m imagining you could have 3 without 2 (data frame of static arrays) or 2 without 1 (functional programmers like to think of “append only” databases).
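
One way to picture the orthogonal version is as three independent capability queries that each default to `false`. This is only a sketch in plain Julia: the trait names (`can_setvalues`, `can_resize_rows`, `can_change_schema`) and the two toy table types are hypothetical and exist only to mirror the "2 without 1" and "3 without 2" cases mentioned above.

```julia
# Hypothetical, independent capability traits; none of these exist in Tables.jl.
can_setvalues(::Any) = false       # 1: overwrite values without resizing
can_resize_rows(::Any) = false     # 2: add or remove rows, schema fixed
can_change_schema(::Any) = false   # 3: add, remove, rename, or retype columns

# Toy "append only" store: rows can be added, existing rows are never edited.
struct AppendOnlyLog
    rows::Vector{Tuple{Int,Float64}}
end
can_resize_rows(::AppendOnlyLog) = true

# Toy frame of fixed-length columns: columns can be swapped, row count cannot change.
struct FixedLengthFrame
    columns::Dict{Symbol,NTuple{3,Float64}}
end
can_change_schema(::FixedLengthFrame) = true

# Collect the three answers for any object.
capabilities(t) = (setvalues = can_setvalues(t),
                   resize_rows = can_resize_rows(t),
                   change_schema = can_change_schema(t))

log = AppendOnlyLog(Tuple{Int,Float64}[])
frame = FixedLengthFrame(Dict(:x => (1.0, 2.0, 3.0)))
```

Because each query is a separate function, a type opts into any combination without implying the others.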

bkamins commented 4 years ago

Sure - they are largely orthogonal. I have this order in the back of my head, as it is natural in DataFrames.jl, but for other data structures clearly it is the way you say 😄.

A particular case is that 3 assumes allowing "replacing" of a column, which is not the same as 1, which mostly assumes updating a column in-place. (However, for some data structures 1 would imply replacement: when the column is immutable, setindex! would have to replace it, but 1 would guarantee that the eltype does not change after replacement.)

andyferris commented 4 years ago

Yes it’s very interesting how mutating a column behaves somewhat the same as mutating the rows. Of course, you can tell the difference when you have access to the column references.

The way I always imagined this playing out is (a) have two APIs/traits for mutation and insertion into data structures (and “upsert” for data structures that support both; this is the way it is done in Dictionaries.jl), and (b) have a table modelled as a nested data structure (a relation is a collection of rows). All the different cases you mention simply fall out naturally.
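
A rough sketch of the mutation / insertion / upsert split, using a Base `Dict` of rows as a stand-in for a keyed table. The function names `update!`, `insertnew!`, and `upsert!` are hypothetical and chosen for illustration; they are not the Dictionaries.jl or Tables.jl API.

```julia
# Hypothetical names, chosen for illustration only.

# Mutation: the key must already exist.
function update!(rows::AbstractDict, key, row)
    haskey(rows, key) || throw(KeyError(key))
    rows[key] = row
    return rows
end

# Insertion: the key must not exist yet.
function insertnew!(rows::AbstractDict, key, row)
    haskey(rows, key) && throw(ArgumentError("key $key is already present"))
    rows[key] = row
    return rows
end

# Upsert: available only when both of the above are supported.
upsert!(rows::AbstractDict, key, row) = (rows[key] = row; rows)

# A "table" modelled as a collection of rows, keyed by an integer id.
rows = Dict(1 => (name = "a", score = 1.0))
insertnew!(rows, 2, (name = "b", score = 2.0))
update!(rows, 1, (name = "a", score = 1.5))
upsert!(rows, 3, (name = "c", score = 3.0))
```

A container that supports only `update!` corresponds to case 1 above, and one that supports only `insertnew!` corresponds to the append-only case.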

rofinn commented 4 years ago

On a related note, should Tables.jl have a similar fallback like the arrays interface where folks can be guaranteed to be returned a mutable table? That would simplify the code posted above at the cost of potentially inconsistent return types.
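
As a sketch of what such a fallback could do, assuming the arrays-interface function meant here is Base's `similar`: columns that are already plain `Array`s are passed through, and anything else is copied into the array that `similar` allocates. `ensure_mutable` and `mutable_columns` are hypothetical helper names, not part of Tables.jl.

```julia
# Hypothetical helpers; not part of Tables.jl.
ensure_mutable(col::Array) = col                                 # already mutable: no copy
ensure_mutable(col::AbstractArray) = copyto!(similar(col), col)  # copy into a fresh array

# Apply the fallback to every column of a NamedTuple of columns.
mutable_columns(cols::NamedTuple) = map(ensure_mutable, cols)

source = (id = 1:3, score = [0.5, 1.5, 2.5])
cols = mutable_columns(source)

cols.id[1] = 10       # writes to a copy; `source.id` is untouched
cols.score[1] = 9.9   # writes through to `source.score`
```

This shows the trade-off in the question: the `id` column comes back as a `Vector{Int}` rather than a range, and only some writes reach the original table.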

juliohm commented 1 year ago

I wonder if there has been any progress on experimenting with a trait system for mutable tables? At this point Tables.jl is the de facto standard for tables in Julia, and we are reaching applications where mutability and a basic setindex! would be great.

juliohm commented 2 months ago

Any progress here? Or any draft somewhere?