HenrikBengtsson opened this issue 8 years ago
On 2016-04-01, @gaborcsardi proposed that one could perhaps design a dedicated connection type that writes to a temporary file name and renames the file when done, e.g.
saveRDS(x, file=atomic("foo.rds"))
write.csv(data, file=atomic("data.csv"))
(Adapted from a Wiki entry.)
Background
When writing to a file, there is always a risk that the process is interrupted, which may leave an incomplete file behind. Depending on the file format, it can be extremely hard, or even impossible, to detect that the file is incomplete. For instance, if we write a data frame with 100,000 rows to a comma-delimited file using write.csv() and we're unlucky, the writing may be interrupted at the end of a row, e.g. after 98,953 complete rows have been written. If so, data <- read.csv() will happily read the 98,953 rows, and there is no way for us to know that the file is incomplete. Even when it is possible to detect incomplete and/or corrupt files, identifying them can be extremely tedious. This is a real problem when generating a large number of files, and especially large files, for which the risk of being hit by an interrupt increases.
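To make this failure mode concrete, the following sketch simulates an interrupted write.csv() by keeping only the header and the first 953 of 1,000 data rows. The sizes are scaled down from the example above and the files are throwaway temporary files, chosen only for illustration. read.csv() reads the truncated file without any error or warning.

```r
# Write a complete 1,000-row CSV file
data <- data.frame(id = 1:1000, value = sqrt(1:1000))
full <- tempfile(fileext = ".csv")
write.csv(data, file = full, row.names = FALSE)

# Simulate an interrupt after 953 complete rows: keep the header + 953 rows
lines <- readLines(full)
partial <- tempfile(fileext = ".csv")
writeLines(lines[seq_len(1 + 953)], partial)

# Reading the truncated file succeeds silently; nothing flags it as incomplete
data2 <- read.csv(partial)
nrow(data2)
# [1] 953
```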
Suggestion / Wish
If files are written atomically, that is, either the complete file exists at the end or none of it does, then the problem of knowing whether a file is complete would not exist. One approach to writing files atomically is to write to a temporary file name and then rename the file on completion.
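The write-then-rename approach can be sketched as a small helper. write_atomic() below is a hypothetical function, not part of base R or any package. It takes the target path plus a function that writes to a given path, so it works with both saveRDS() and write.csv() as in the examples at the top. The temporary file is created in the same directory as the target because, on POSIX systems, a rename is only atomic within a single file system. Behavior on other platforms, for instance when replacing an existing file on Windows, may differ.

```r
# Hypothetical helper, not part of base R: writes via `write_fun` to a
# temporary file next to `file`, then renames it to `file` on success.
write_atomic <- function(file, write_fun) {
  # Use the target's directory so the rename stays on the same file system
  tmp <- tempfile(pattern = paste0(basename(file), "."),
                  tmpdir = dirname(file), fileext = ".tmp")
  # If this fails, `file` is never created or modified
  # (though `tmp` may be left behind)
  write_fun(tmp)
  if (!file.rename(tmp, file)) {
    stop("Failed to rename ", sQuote(tmp), " to ", sQuote(file))
  }
  invisible(file)
}

# Usage, mirroring the two examples from the proposal
x <- list(a = 1:3, b = letters)
rds <- file.path(tempdir(), "foo.rds")
write_atomic(rds, function(path) saveRDS(x, file = path))

csv <- file.path(tempdir(), "data.csv")
write_atomic(csv, function(path) write.csv(iris, file = path, row.names = FALSE))
```

A wrapper function is used here rather than a connection because plain R code cannot hook into close() on a connection. A true atomic() connection type as proposed would presumably have to be implemented at the C level.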
Prototype / example
Assume we save the file using saveRDS(x, file="foo.rds", atomic=TRUE). This could in principle be done as:
saveRDS(x, file="foo.rds.tmp")
file.rename("foo.rds.tmp", "foo.rds")
If there is an interrupt, there will be a left-over *.rds.tmp file, but not the final *.rds file. There could be options for automatically cleaning up incomplete files, or for renaming the temporary file to, say, *.rds.error if an error was thrown while writing the file.
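The clean-up and rename-on-error options can be sketched by extending the same idea with on.exit(). write_atomic_safe() and its on_error argument are hypothetical names chosen for this illustration. The sketch uses the foo.rds.tmp naming from the prototype, and if the writer fails it either removes the temporary file, renames it to *.error, or keeps it as is. R runs on.exit() handlers on errors and on user interrupts, but not if the process is killed outright, in which case a *.tmp file would still be left behind.

```r
# Hypothetical helper (not base R): write-then-rename that also decides
# what to do with the temporary file if `write_fun` throws an error.
write_atomic_safe <- function(file, write_fun,
                              on_error = c("remove", "rename", "keep")) {
  on_error <- match.arg(on_error)
  tmp <- paste0(file, ".tmp")   # e.g. foo.rds.tmp, as in the prototype
  done <- FALSE
  on.exit({
    if (!done && file.exists(tmp)) {
      switch(on_error,
        remove = file.remove(tmp),                          # clean up
        rename = file.rename(tmp, paste0(file, ".error")),  # keep for inspection
        keep   = NULL)                                      # leave *.tmp as is
    }
  }, add = TRUE)
  write_fun(tmp)
  if (!file.rename(tmp, file)) stop("Failed to rename ", tmp, " to ", file)
  done <- TRUE
  invisible(file)
}

# Usage: a writer that fails midway leaves bar.rds.error, but no bar.rds
f <- file.path(tempdir(), "bar.rds")
res <- try(write_atomic_safe(f, function(path) {
  saveRDS(1:10, file = path)
  stop("simulated failure while writing")
}, on_error = "rename"), silent = TRUE)
file.exists(c(f, paste0(f, ".tmp"), paste0(f, ".error")))
```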