Open drnickisaac opened 3 years ago
This is not a single line of code job! The implications are very far reaching and should be mapped out first!
@AugustT I have started looking into this. The fst format is certainly fast for data frames, but is it really optimal for the list structure that comes out of sparta? Also, this blog post implies other potential drawbacks, though I don't fully understand the details:
http://svmiller.com/blog/2020/02/comparing-qs-fst-rds-for-bigger-datasets/
I'm also slightly concerned about this shift. Personally, I don't see an issue with the current read/write speed, and it seems like a lot of work!
I've always preferred .rds, as you can formally assign objects to an object name when you read them in, but you can get round this with a small function. And .rdata potentially seems better for lists.
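For reference, here is a minimal sketch of the kind of small function meant above, in base R only. The name load_rdata is hypothetical and not part of sparta; it assumes the file was written with save().

```r
# Hypothetical helper (not part of sparta): load an .rdata file into a
# private environment and return its contents, so the result can be
# assigned to whatever name you like.
load_rdata <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # character vector of restored object names
  if (length(loaded) == 1) {
    env[[loaded]]                    # single object: return it directly
  } else {
    mget(loaded, envir = env)        # several objects: return a named list
  }
}

# Usage with a throwaway file
out <- list(a = 1:3, b = "x")
tmp <- tempfile(fileext = ".rdata")
save(out, file = tmp)
my_result <- load_rdata(tmp)
```

Loading into a fresh environment also avoids silently overwriting an existing object called out in the workspace.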
I would be in favour of a switch to rds format (and in fact all my current occ mod functions work with rds files). If the implications of a change to a slightly faster format are potentially far-reaching for only a minor benefit, I would be in favour of not making that change. Other users of sparta would also have to install a new package to get this to work if it became the default output.
I'd vote for a change to .rds files instead of .rdata files. We could add an argument, filetype, where users could specify .rds (the default) or .rdata (the previous method).
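A rough sketch of how such a filetype argument might look, using base R only. The function save_output and its file_stem parameter are made up for illustration and are not sparta's actual interface; only the filetype idea comes from the suggestion above.

```r
# Hypothetical sketch: save_output() and file_stem are illustrative names,
# not sparta's API.
save_output <- function(out, file_stem, filetype = c("rds", "rdata")) {
  filetype <- match.arg(filetype)          # defaults to "rds", rejects other values
  path <- paste0(file_stem, ".", filetype)
  if (filetype == "rds") {
    saveRDS(out, file = path)              # new default: one object per file
  } else {
    save(out, file = path)                 # previous method: stored under the name "out"
  }
  invisible(path)
}

# Usage with throwaway files
stem <- tempfile()
rds_path <- save_output(list(x = 1), stem)
rdata_path <- save_output(list(x = 1), stem, filetype = "rdata")
```

Using match.arg keeps the accepted values in one place, so adding another format later would only mean extending the vector and adding a branch.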
@AugustT has suggested we switch from saving sparta outputs as .rdata to saving them as .fst: this will save time and disk space. It should be easily implemented with a single line of code.