fstpackage / fst

Lightning Fast Serialization of Data Frames for R
http://www.fstpackage.org/fst/
GNU Affero General Public License v3.0

How to extract contents from a fst file when R crashes reading it #264

Open gabowi opened 2 years ago

gabowi commented 2 years ago

Hey everyone,

First of all, thank you for this package, which is quite helpful in our work. After writing and reading a lot of files without issue, I am now experiencing a problem for the first time.

Trying to read a 12 GB fst file (using: read_fst(path_fstfile)), R crashes. The error message is: "R Session Aborted. R encountered a fatal error. The session was terminated."

This can be reproduced on different computers and from different sources (network, local drive). It is independent of whether data.table is loaded. It is also independent of whether the script is called through RStudio or through the command line using Rscript.exe. There is sufficient memory available (more than 100 GB RAM). Other fst files can be read successfully.

metadata_fst() works well on this file (see output below).

Is there any method to retrieve the contents of this file?

Thank you in advance for your help. Gabriel

> metadata_fst(path_fstfile)
<fst file>
120534568 rows, 43 columns (demandsimulationResult.fst)

* 'tripId'                   : integer
* 'legId'                    : integer
* 'personnumber'             : integer
* 'householdOid'             : integer
* 'personOid'                : integer
* ....

Note: other columns are of type character, double and logical.

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fst_0.9.4

loaded via a namespace (and not attached):
[1] compiler_4.1.0 parallel_4.1.0 tools_4.1.0    Rcpp_1.0.7 

Note: On another computer with R version 4.1.2 the error occurs as well.

fox34 commented 2 years ago

Have you tried incrementally reading parts of the file? E.g.

read_fst(path_fstfile, from=1, to=100)
read_fst(path_fstfile, from=100, to=1000)
read_fst(path_fstfile, from=120534468)
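
The incremental reads above could be automated to narrow down which row range triggers the crash. The following is only a rough sketch, not a tested recovery procedure. Because a fatal crash terminates the R process and cannot be caught with tryCatch, each chunk is read in a separate Rscript process and only the exit status is inspected. The helper names chunk_ranges, try_chunk and find_unreadable_chunks, and the chunk size of one million rows, are assumptions made for illustration; only read_fst (with from and to) and metadata_fst come from the fst package.

```r
# Hypothetical helper: split rows 1..n_rows into consecutive,
# non-overlapping ranges of at most chunk_size rows.
chunk_ranges <- function(n_rows, chunk_size) {
  from <- seq(1, n_rows, by = chunk_size)
  to <- pmin(from + chunk_size - 1, n_rows)
  data.frame(from = from, to = to)
}

# Hypothetical helper: read one row range in a fresh R process, so that a
# crash only kills the child process. Returns TRUE if the child exited
# with status 0.
try_chunk <- function(path, from, to) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script))
  writeLines(c(
    "args <- commandArgs(trailingOnly = TRUE)",
    "invisible(fst::read_fst(args[1],",
    "                        from = as.numeric(args[2]),",
    "                        to = as.numeric(args[3])))"
  ), script)
  status <- system2(
    file.path(R.home("bin"), "Rscript"),
    c(shQuote(script), shQuote(path),
      sprintf("%.0f", from), sprintf("%.0f", to)),
    stdout = FALSE, stderr = FALSE
  )
  status == 0
}

# Hypothetical driver: returns the row ranges whose read did not succeed.
find_unreadable_chunks <- function(path, chunk_size = 1e6) {
  n_rows <- fst::metadata_fst(path)$nrOfRows
  ranges <- chunk_ranges(n_rows, chunk_size)
  ranges$ok <- mapply(
    function(from, to) try_chunk(path, from, to),
    ranges$from, ranges$to
  )
  ranges[!ranges$ok, ]
}

# Example call (path_fstfile as in the original post):
# bad <- find_unreadable_chunks(path_fstfile)
```

If only a few ranges fail, the remaining ranges could be read with read_fst and combined, and the failing ranges examined further, for example by reading them one column at a time via the columns argument of read_fst.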

MarcusKlik commented 1 year ago

Hi @gabowi, did you check your memory consumption while the fst file is loading from disk? This sounds like your system doesn't have enough memory to read this file, although that shouldn't crash R. Were the partial reads suggested by @fox34 successful?