JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable
Other
109 stars 27 forks source link

added file checking and overwrite functionality to write #94

Closed ExpandingMan closed 6 years ago

ExpandingMan commented 6 years ago

This should fix #93 rather elegantly (thanks to @nalimilan for the suggestion).

Interesting bit of trivia: most of my code which uses Feather was actually already doing this outside of Feather. So, it was rather silly (or cruel) of me to use this as a matter of course but not include this in Feather.jl itself.

Note that this makes it so that by default, feather will not allow you to write over existing files, but you can pass overwrite=true to delete the file before writing.

nalimilan commented 6 years ago

Cool. I think one of the strategies often used by apps which save new versions of files (e.g. text processing apps) is to rename the original file to something else, write the new one, and only then remove the original one. That's the only way to ensure that if something goes wrong during the operation (crash, power issue...), you'll end up either with the old or the new file, and not with a single half-corrupt file. Maybe that' strategy could make sense here? I guess renaming an mmapped file and removing it afterwards is OK?

ExpandingMan commented 6 years ago

I think it's just a question of how much of this functionality we want to build into Feather.jl itself. Perhaps we could allow users to pass a filename to Feather.write that would be the name to move the old file to, or allow the user to provide a file extension (like .backup) which Feather would append to any files it would otherwise try to overwrite, but I sort of feel at that point it's probably just best for the user to do that outside of Feather.

I think ideally stuff like that would be handled by FileIO (perhaps with us accommodating and interface), though as far as I know it doesn't provide anything like that right now.

ExpandingMan commented 6 years ago

@nalimilan , if you have a firm idea of how you'd like to see this implemented, perhaps raise an issue on FileIO? That would potentially provide the functionality for both Feather.jl and CSV.jl all in one go. Though as I recall FileIO wasn't good at supporting all the additional options that are available for CSV.jl.

quinnj commented 6 years ago

The problem here is I don't think windows allows you to rm(file) when file is mmapped, or indeed this would be an easier problem.

quinnj commented 6 years ago

That's why you always see things in CSV/Feather tests like

obj = nothing; GC.gc(); GC.gc()
rm(filename)

For windows, you have to make sure that original mmap-ing gets finalized/GC-ed before rm.

ExpandingMan commented 6 years ago

It's taking great self-restraint for me not to share my opinion on windows at this moment.

I don't know, how about having this as a linux only option? On windows presumably you can't get the error that instigated this PR since it wouldn't let you write to the file in the first place.

ExpandingMan commented 6 years ago

Ok, how about this then? I think this should give pretty reasonable behavior in all 4 cases (whether file exists, OS).

ExpandingMan commented 6 years ago

By the way, can we drop 0.6 now? If so, say the word, and I'll take 0.6 stuff out of this PR.

quinnj commented 6 years ago

I say we drop 0.6.

codecov-io commented 6 years ago

Codecov Report

Merging #94 into master will decrease coverage by 1.85%. The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #94      +/-   ##
==========================================
- Coverage   97.22%   95.37%   -1.86%     
==========================================
  Files           4        4              
  Lines         108      108              
==========================================
- Hits          105      103       -2     
- Misses          3        5       +2
Impacted Files Coverage Δ
src/sink.jl 94.28% <33.33%> (-5.72%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 00099a1...48267f7. Read the comment docs.

ExpandingMan commented 6 years ago

I removed a bunch of 0.6 stuff. So, if we merge this, we can tag a 1.0-only 0.5 at some point.

ExpandingMan commented 6 years ago

The error on appveyor looks like the same 32-bit alignment error we'd been seeing.

@quinnj : it'll be up to your vote if we merge this. Yes, the windows behavior isn't great, but it seems to me that this will still be an overall improvement.