fstpackage / fsttable

An interface to fast on-disk data tables stored with the fst format
GNU Affero General Public License v3.0

The fsttable object needs to contain a self reference #8

Closed MarcusKlik closed 6 years ago

MarcusKlik commented 6 years ago

A self-reference is needed to be able to process code like:

# identify fsttable with fst file
ft <- fsttable::fst_table("1.fst")

# add simulated column 'N'
ft[, N := E * 5 + B]

In this example, column N is added as a simulated (or perhaps better: virtual?) column to the table. That means that no data is generated yet, but the new column is kept as a tree structure of known methods (* and +) and data (5).
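As a rough illustration of what such a tree could look like, the expression can be captured unevaluated in base R and walked recursively. This is only a sketch: `walk_expr` is a hypothetical helper, not part of fsttable, and the real package may represent virtual columns differently.

```r
# capture the right-hand side of `N := E * 5 + B` without evaluating it
expr <- quote(E * 5 + B)

# hypothetical helper: turn the call into a nested list of
# methods (function names), columns (symbols) and data (constants)
walk_expr <- function(e) {
  if (is.call(e)) {
    list(
      method = as.character(e[[1]]),
      args   = lapply(as.list(e)[-1], walk_expr)
    )
  } else if (is.symbol(e)) {
    list(column = as.character(e))
  } else {
    list(data = e)
  }
}

tree <- walk_expr(expr)

# top-level method is `+`, its first argument is the sub-tree for `E * 5`
tree$method
tree$args[[1]]$method

# the stored expression can be evaluated later, once column data is available
result <- eval(expr, list(E = c(1, 2), B = c(10, 20)))
result
#> [1] 15 30
```

No column data is touched until `eval()` is called, which is the property the virtual column relies on.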

To store that information, ft needs to be updated. To do that, an fsttable needs to update itself, which requires an internal self-reference (like a data.table object has). Perhaps the relevant data in an fsttable can be encapsulated in a single-cell, list-type data.table to start with (that list element can be updated in-memory). Equivalent code:

# some object
obj <- list(Param1 = TRUE, Param2 = 1)

# store in a single cell data.table
x <- data.table::data.table(Data = list(obj))

# that creates a data.table column of type 'list'
typeof(x$Data)
#> [1] "list"

# example method for updating list element
update_obj <- function(x) {
  obj_current <- x[1, Data[[1]]]  # get obj (j evaluates to a list, so this is a one-row data.table)
  obj_current[["Param3"]] <- "value"  # update obj
  x[1, Data := list(list(obj_current))]  # rewrite to x
}

# update in-place
update_obj(x)

# the element in x now points to the updated obj
x$Data[[1]]
#>    Param1 Param2 Param3
#> 1:   TRUE      1  value
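
The printed result is a one-row data.table rather than a plain list, because `x[1, Data[[1]]]` evaluates to a list inside `[.data.table`. A variant that reads the cell with `x$Data[[1]]` should keep the stored object a plain list, and it also shows that the caller's table is modified without reassignment. This is a minimal sketch: `set_param` is a hypothetical helper name, not something from fsttable.

```r
library(data.table)

# single-cell data.table holding the object
x <- data.table(Data = list(list(Param1 = TRUE, Param2 = 1)))
addr_before <- address(x)

# hypothetical helper: update one entry of the stored object by reference
set_param <- function(tbl, name, value) {
  obj <- tbl$Data[[1]]                # read the cell as a plain list
  obj[[name]] <- value                # modify the local copy
  tbl[1, Data := list(list(obj))]     # write it back into the cell with :=
  invisible(tbl)
}

# no `x <- ...` needed: the update is visible through the caller's x
set_param(x, "Param3", "value")

names(x$Data[[1]])
is.list(x$Data[[1]]) && !is.data.table(x$Data[[1]])
identical(address(x), addr_before)
```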

MarcusKlik commented 6 years ago

The remote proxy object is now kept inside a data.table cell, as is a remote proxy state object. Both can be updated in-memory.

Closing for now. When we want to update the auto-completion of columns after an in-place modification (with :=), a self-reference is still needed, but for a newly generated datatableproxy object that's not necessary.