BiologicalRecordsCentre / sparta

Species Presence/Absence R Trends Analyses
http://biologicalrecordscentre.github.io/sparta/index.html
MIT License

switch from .rData to .fst #202

Open drnickisaac opened 3 years ago

drnickisaac commented 3 years ago

@AugustT has suggested we switch from saving sparta outputs as .rData to saving them as .fst: this would save time and disk space. It should be easily implemented with a single line of code.

AugustT commented 3 years ago

This is not a single line of code job! The implications are very far reaching and should be mapped out first!

drnickisaac commented 3 years ago

@AugustT I have started looking into this. The fst format is certainly fast for data frames, but is it really optimal for the list structure that comes out of sparta? Also, this blog post implies other potential drawbacks. I don't fully understand the details: http://svmiller.com/blog/2020/02/comparing-qs-fst-rds-for-bigger-datasets/

03rcooke commented 3 years ago

I'm also slightly concerned about this shift. For me I don't see an issue with the current read/write speed, and it seems like a lot of work!

I've always preferred .rds because you can formally assign the contents to an object name when you read them in, although you can get round this for .rdata with a small function. And .rdata potentially seems better for lists.
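For reference, the "small function" workaround could look something like the sketch below. It uses base R only; the name `load_rdata` is made up for illustration and is not part of sparta.

```r
# Hypothetical helper (not part of sparta): load an .rdata file into a
# private environment and return its contents, so the caller can choose
# the object name, much as with readRDS().
load_rdata <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the names it restored
  if (length(obj_names) == 1) {
    env[[obj_names]]                    # single object: return it directly
  } else {
    mget(obj_names, envir = env)        # several objects: return a named list
  }
}

# Usage: save a list under one name, read it back under another
tmp <- tempfile(fileext = ".rdata")
out <- list(species = "a", trend = c(0.1, 0.2))
save(out, file = tmp)
my_result <- load_rdata(tmp)
```

Loading into a fresh environment also avoids silently overwriting an existing object of the same name in the workspace, which is the main practical drawback of a bare `load()`.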

mlogie commented 3 years ago

I would be in favour of a switch to rds format (in fact, all my current occ mod functions work with rds files). If the implications of a change to a slightly faster format are potentially far reaching for minor benefit, I would be in favour of no change. Other users of sparta would also have to install a new package to get this to work if it became the default output.

03rcooke commented 1 year ago

I'd vote for a change to .rds files instead of .rdata files. We could add a `filetype` argument where users could specify .rds (the default) or .rdata (the previous method).
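A rough sketch of how such an argument might behave, using base R only. The function name `save_output`, its other arguments, and the object name `out` used for the .rdata branch are assumptions for illustration, not sparta's actual code.

```r
# Hypothetical sketch (not sparta's actual API): write a model output as
# .rds by default, or as .rdata for backwards compatibility.
save_output <- function(output, file_stem, filetype = c("rds", "rdata")) {
  filetype <- match.arg(filetype)        # defaults to the first choice, "rds"
  path <- paste0(file_stem, ".", filetype)
  if (filetype == "rds") {
    saveRDS(output, file = path)
  } else {
    out <- output                        # assumed object name inside the .rdata file
    save(out, file = path)
  }
  invisible(path)                        # return the path that was written
}

# Usage
stem <- tempfile()
res <- list(a = 1, b = "x")
rds_path <- save_output(res, stem)                        # default: .rds
rdata_path <- save_output(res, stem, filetype = "rdata")  # previous method
```

Using `match.arg()` keeps the default explicit in the function signature and rejects unsupported values with an error.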