Closed martinblostein closed 6 years ago
Hi @martinblostein, thanks a lot! Yes, that's the idea: the table_proxy keeps a complete picture of the reads necessary to fulfill a request from the interface. For example, when the data.table interface requests the first and last 5 rows for printing, the table_proxy has to determine which rows actually have to be read from file, and in which order (not functional yet):
# reference to fst file and data.table interface, table_proxy and remote_table are created
ft <- fst_table("1.fst")
# interface requests update of proxy row- and column selection, new fst_table created
# the new fst_table contains a row-mask (the selection), 1 column reference and 1
# virtual column that holds the operator (>=) and the primitive (18) to be able to
# compute the contents of the whole column later.
ft2 <- ft[Year == 2016, .(Amount, Adult = Age >= 18)]
# data.table interface requests the first and last 5 rows for printing; table_proxy determines
# that for this request, because `>=` works per-element, only the first and last rows of Age
# are needed, so the print command will be very fast (no significant data has to be read)
print(ft2)
The whole column needs to be read (and stored in a separate file) only when a method is used
that does not work per-element, or when fsttable can't determine whether it does.
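To make the per-element argument concrete, here is a rough sketch in plain base R. It is not how fsttable implements this; the `full` data frame stands in for the fst file, and `rows_for_print`, `virtual_col` and `eval_virtual` are hypothetical names made up for the illustration. It only shows that, for an element-wise operator such as `>=`, evaluating the virtual column on the printed rows gives the same result as evaluating the whole column and subsetting afterwards.

```r
# Hypothetical stand-in for the on-disk table (fsttable would read an fst file).
full <- data.frame(Age = c(12, 25, 17, 40, 18, 9, 33, 61, 15, 22, 70, 5))

# Rows a print method needs when it shows only the first and last n rows.
rows_for_print <- function(nr_of_rows, n = 5) {
  if (nr_of_rows <= 2 * n) return(seq_len(nr_of_rows))
  c(seq_len(n), seq(nr_of_rows - n + 1, nr_of_rows))
}

# Virtual column: source column, operator and primitive (assumed layout).
virtual_col <- list(source = "Age", op = `>=`, value = 18)

# Because the operator works per-element, it can be applied to just the
# requested rows instead of the full column.
eval_virtual <- function(col, data, rows) {
  col$op(data[[col$source]][rows], col$value)
}

rows <- rows_for_print(nrow(full))
adult_preview <- eval_virtual(virtual_col, full, rows)
```

An operator that is not element-wise (for example `cumsum`, or anything depending on neighbouring rows) would break this shortcut, which is the case where the full column has to be materialised.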
Thanks for the pull request!
Another small commit to restore some data.table functionality.
Disregarding recursive indexing, I believe this is the complete implementation of [[.datatableinterface, as the rows to return are determined by the table_proxy.
(Sorry for the pull request spam, I was in a hurry and kept pushing the wrong version.)