Open akrun1 opened 4 years ago
I tried comparing read efficiency as well as select/subset performance between fsttable and tidyft. Both read the .fst dataset (10328208 x 35) very efficiently, but the later steps are costly in tidyft. It would be great if fsttable had a way to do these steps efficiently.
Hi @akrun1, thanks for your feature request!
At the moment, fsttable unfortunately does not have rbindlist or cbind functionality, as it is in its first experimental stages (and not actively developed at the moment). But it would certainly be a requirement for a fully functional data.table interface.
Thanks, I'll add your issue as a feature request!
@MarcusKlik Thank you for the reply. I tried some of the other packages (tidyft, arrow and disk.frame). One of the main advantages of your package fsttable is that it is so fast at slicing. With tidyft, as soon as I use select_fst and do some operations, it loses that advantage because it pulls the data into memory. With disk.frame, I split the data into multiple CSV files, but it still takes a lot of time to read the data and write it into .fst files.
I would like to rbind two fsttable objects, or a single fsttable with a data.frame. What would be the preferred method?
For creating a new column/updating, I tried
If I update using data.table methods, it results in an error
Is there a preferred method for modifying/updating columns? I did read some previous issues here and here. I just wonder if there are any updates for that. Thanks
PS: My objective is to update an already loaded fsttable object without converting it to a data.frame/data.table, add new rows, and write it back as an .fst file (after doing some join operations).
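For reference, one possible fallback while fsttable has no rbind or column-update support is to go through memory with the fst and data.table packages. The sketch below is only an illustration under stated assumptions: the file path, column names, and sample data are hypothetical, and it does convert to a data.table, so it does not meet the goal of staying on disk. It relies on read_fst() (whose columns, from and to arguments limit what gets loaded), rbindlist(), := and write_fst().

```r
library(fst)
library(data.table)

# Hypothetical file and sample data, used only for illustration
path <- tempfile(fileext = ".fst")
write_fst(
  data.frame(id = 1:5, value = letters[1:5], stringsAsFactors = FALSE),
  path
)

# Partial read: only the "id" column and rows 1 to 3 are loaded
head_part <- read_fst(path, columns = "id", from = 1, to = 3)

# Full read into memory as a data.table
dt <- read_fst(path, as.data.table = TRUE)

# Hypothetical rows to append
new_rows <- data.table(id = 6:7, value = c("f", "g"))

# Bind rows in memory, matching columns by name
combined <- rbindlist(list(dt, new_rows), use.names = TRUE)

# Add a new column by reference on the combined table
combined[, value_upper := toupper(value)]

# Write the result back to the same .fst file
write_fst(combined, path)
```

For a table of the size mentioned above, this only works if the full data fits in memory; the partial read can at least restrict the load to the columns and row range that are actually needed.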