fstpackage / fst

Lightning Fast Serialization of Data Frames for R
http://www.fstpackage.org/fst/
GNU Affero General Public License v3.0

Convert rds to fst #145

Open mshamer opened 6 years ago

mshamer commented 6 years ago

Hello,

I have a large number of rds files (each larger than 100 MB) that I'm trying to work with. I recently came across your fst package, which looks promising for my needs, since I need to read a single column at a time and not necessarily the whole file. Since this is not something I can do with the rds format, I wondered if there is a way to convert the files from rds format to fst format.

Looking forward to your response

Thank you!

shrektan commented 6 years ago

fst can only write data.frame-like data. If your rds files all contain data.frames, you can simply read each one into memory and write it to an fst file. Otherwise, I don't think it's possible. Some illustrative code:


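files <- list.files(pattern = "\\.rds$", full.names = TRUE)  # rds files to convert
for (file in files) {
    tmp <- readRDS(file)                      # loads the whole table into memory
    fst_file <- sub("\\.rds$", ".fst", file)  # same name, .fst extension
    fst::write_fst(tmp, fst_file)
}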
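
A slightly more defensive variant of the loop above, as a minimal sketch only: it skips any rds file that does not hold a data.frame, then reads a single column back from the converted file. The helper name `convert_rds_to_fst`, the `data` directory, and the `temperature` column are hypothetical; `fst::write_fst()` and the `columns` argument of `fst::read_fst()` are the package's own.

```r
# Hypothetical helper: convert one rds file to fst, skipping non-data.frames.
convert_rds_to_fst <- function(rds_path) {
  obj <- readRDS(rds_path)  # the whole object still has to fit in memory
  if (!is.data.frame(obj)) {
    warning("skipping ", rds_path, ": not a data.frame")
    return(NA_character_)
  }
  # Replace the file extension with .fst, keeping the directory and base name.
  fst_path <- paste0(tools::file_path_sans_ext(rds_path), ".fst")
  fst::write_fst(obj, fst_path)
  fst_path
}

# Example usage (directory and column name are made up for illustration):
# rds_files <- list.files("data", pattern = "\\.rds$", full.names = TRUE)
# fst_files <- vapply(rds_files, convert_rds_to_fst, character(1))
# one_col   <- fst::read_fst(fst_files[[1]], columns = "temperature")
```

After the one-off conversion, later reads only load the requested columns, which is what the original question was after.
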
MarcusKlik commented 6 years ago

Hi @mshamer, thanks for submitting your issue!

@shrektan is quite right: at the moment, the only option is to read your rds file as a whole before writing it to fst format. But it would certainly make an interesting feature to be able to serialize a table stored in the rds format to the fst format column by column.

The advantage would be that such a conversion would need only as much memory as the largest column in the table. We would have to study the rds format more closely to see if it's straightforward to read the table one column at a time.
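
To get a rough feel for that memory difference, here is a small sketch that compares the in-memory size of a whole table with the size of its largest column. The data frame is made-up toy data, and `object.size()` only gives an estimate, so treat the ratio as an illustration and not a measurement of what a column-wise converter would actually use.

```r
# Toy data frame standing in for a table loaded from an rds file (made-up data).
df <- data.frame(
  id    = seq_len(1e5),
  value = rnorm(1e5),
  label = sample(letters, 1e5, replace = TRUE),
  stringsAsFactors = FALSE
)

# Estimated size of each column in bytes.
col_bytes <- vapply(df, function(col) as.numeric(object.size(col)), numeric(1))

whole_table    <- as.numeric(object.size(df))  # what the read-everything approach holds
largest_column <- max(col_bytes)               # what a column-wise conversion would hold

# Fraction of the full table's memory needed when handling one column at a time.
largest_column / whole_table
```

The wider the table, the smaller this fraction becomes, since the peak is bounded by one column instead of all of them.
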

At first glance (see for example here: https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Serialization-Formats) the rds format is a recursive format, so reading one R object at a time should be possible through the R API.

Thanks for submitting your feature request!

mshamer commented 6 years ago

Thank you very much!

I appreciate your prompt responses

Best,

M

