PSeitz / lz4_flex

Fastest pure Rust implementation of LZ4 compression/decompression.
MIT License

Handling missing lz4 terminator on graceless shutdown #110

Open l4l opened 1 year ago

l4l commented 1 year ago

I have a long-lived process that writes to a file with compression. The process may get killed, which leaves the file not properly flushed even with auto_finish (#95). Sometimes all the data has been written and only the lz4 ending (EndMark + Checksum) is missing. What's the preferred way of handling this case? Currently, reading such a block fails with an error like "failed to fill whole buffer", which comes from Read::read_exact.
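For reference, a minimal sketch of where that message comes from, using only the standard library. `read_u32_field` is a hypothetical helper, not part of lz4_flex; it stands in for any reader that expects a fixed 4-byte field (such as an EndMark) and hits end of input first. Whether lz4_flex passes the error through unchanged is an assumption not checked here.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads the next 4-byte little-endian field
/// (for example a block size or an EndMark) from `reader`.
fn read_u32_field<R: Read>(mut reader: R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    // `read_exact` fails with `ErrorKind::UnexpectedEof` when the
    // input ends before the buffer has been filled.
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}
```

With a complete field the call returns the value; with only two bytes left it returns an error whose kind is `io::ErrorKind::UnexpectedEof`, so matching on `err.kind()` is likely more robust than matching on the message text.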

PSeitz commented 1 year ago

What is the signal to kill the process, does it allow graceful shutdown like SIGTERM or is it SIGKILL?

l4l commented 1 year ago

No, for the graceful one auto_finish should work fine. I'm considering the case where a graceful shutdown is impossible: SIGKILL/SIGABRT/SIGSEGV etc.

PSeitz commented 1 year ago

Oh I just saw it says that in the title already.

You probably want some atomic writes, where it writes to a temp file and then renames it to the final file after the last flush.
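A minimal sketch of that pattern with the standard library only. The function name `write_atomically` and the `.tmp` suffix are assumptions made for illustration; with lz4_flex the closure would presumably wrap the file in a frame encoder and finish it before returning. The rename is only expected to be atomic when both paths are on the same filesystem.

```rust
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lets `fill` write into a temporary sibling file,
/// then renames it to `final_path` once everything is on disk.
fn write_atomically<F>(final_path: &Path, fill: F) -> io::Result<()>
where
    F: FnOnce(&mut File) -> io::Result<()>,
{
    // Assumed naming scheme: "<final name>.tmp" in the same directory,
    // so the rename stays on one filesystem.
    let mut tmp_name = final_path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp_path)?;
    fill(&mut file)?;
    // Flush file contents and metadata to disk before the rename.
    file.sync_all()?;
    drop(file);

    // Readers see either the old file (or none) or the complete new one.
    fs::rename(&tmp_path, final_path)
}
```

If the process is killed before the rename, only the `.tmp` file is left behind and the final path never holds a partially written file.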

l4l commented 1 year ago

Not really, I already have a mechanism for dropping "corrupted" files. But I still want to try to recover a partly written file after a failure. In particular, I'm interested in the case where all the data has been flushed but the lz4 terminator is missing. Ideally, I'd like to read all the data and then get an error like UnexpectedEof or similar.

PSeitz commented 1 year ago

How would you know if it's only missing an lz4 terminator and not more?

How does it behave currently before it fails with "failed to fill whole buffer", e.g. does the underlying reader in FrameDecoder get parts of the corrupted block?

l4l commented 1 year ago

> How would you know if it's only missing an lz4 terminator and not more?

Is there no way of determining that? I thought the length of the block is stored somewhere in the metadata as well.

> How does it behave currently before it fails with "failed to fill whole buffer", e.g. does the underlying reader in FrameDecoder get parts of the corrupted block?

I didn't really check the behavior of lz4-flex. I read and decode the whole file at once. It probably fails at the last block, but I'm not really sure.

PSeitz commented 1 year ago

> There's no way of determining that one? I thought length of the block is stored somewhere in the meta as well.

Yes, each block stores its length. The terminator marks the end of the blocks in the frame. Without it the format is invalid, and failing decompression is fine as long as it doesn't panic.

PSeitz commented 1 year ago

You could write multiple frames instead of a frame with multiple blocks. Then load the complete frames and discard the last corrupted one.
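A rough sketch of how the reading side could find the complete frames, assuming the layout described in the LZ4 frame format specification (magic number, FLG/BD descriptor, size-prefixed blocks, a zero EndMark, optional checksums). The names `scan_frames`, `Scan` and `Tail` are hypothetical and not part of lz4_flex. The sketch only walks the structure: it verifies no checksums, does not decompress, and treats skippable frames as invalid.

```rust
/// Magic number that starts every LZ4 frame (stored little-endian).
const MAGIC: u32 = 0x184D_2204;

/// What follows the last complete frame in the input.
#[derive(Debug, PartialEq)]
enum Tail {
    /// Nothing: the input ends exactly at a frame boundary.
    Clean,
    /// All blocks seen are whole, but the input ends where the next
    /// block size or the EndMark should start.
    EndsAtBlockBoundary,
    /// The input ends inside a header, block, or checksum.
    Truncated,
    /// The bytes do not start with the LZ4 frame magic number.
    NotAFrame,
}

#[derive(Debug, PartialEq)]
struct Scan {
    complete_frames: usize,
    /// Length of the prefix that consists of complete frames only.
    valid_len: usize,
    tail: Tail,
}

fn read_u32(data: &[u8], pos: usize) -> Option<u32> {
    let bytes = data.get(pos..pos.checked_add(4)?)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Some(u32::from_le_bytes(buf))
}

/// Walks one frame starting at `start`; returns the offset just past it.
fn scan_one(data: &[u8], start: usize) -> Result<usize, Tail> {
    match read_u32(data, start) {
        Some(MAGIC) => {}
        Some(_) => return Err(Tail::NotAFrame),
        None => return Err(Tail::Truncated),
    }
    let flg = *data.get(start + 4).ok_or(Tail::Truncated)?;
    let block_checksum = flg & 0x10 != 0;
    let content_size = flg & 0x08 != 0;
    let content_checksum = flg & 0x04 != 0;
    let dict_id = flg & 0x01 != 0;
    // magic (4) + FLG (1) + BD (1) + optional fields + header checksum (1)
    let mut pos = start + 7 + if content_size { 8 } else { 0 } + if dict_id { 4 } else { 0 };
    if pos > data.len() {
        return Err(Tail::Truncated);
    }
    loop {
        if pos == data.len() {
            return Err(Tail::EndsAtBlockBoundary);
        }
        let size_field = read_u32(data, pos).ok_or(Tail::Truncated)?;
        pos += 4;
        if size_field == 0 {
            // EndMark, optionally followed by the content checksum.
            if content_checksum {
                pos += 4;
            }
            return if pos <= data.len() { Ok(pos) } else { Err(Tail::Truncated) };
        }
        // The highest bit flags an uncompressed block; the rest is the length.
        let len = (size_field & 0x7FFF_FFFF) as usize;
        pos += len + if block_checksum { 4 } else { 0 };
        if pos > data.len() {
            return Err(Tail::Truncated);
        }
    }
}

/// Counts the complete frames at the start of `data` and classifies the rest.
fn scan_frames(data: &[u8]) -> Scan {
    let (mut pos, mut complete_frames) = (0, 0);
    while pos < data.len() {
        match scan_one(data, pos) {
            Ok(end) => {
                pos = end;
                complete_frames += 1;
            }
            Err(tail) => return Scan { complete_frames, valid_len: pos, tail },
        }
    }
    Scan { complete_frames, valid_len: pos, tail: Tail::Clean }
}
```

Everything up to `valid_len` can then be handed to the decoder, and the remainder discarded. Note that `EndsAtBlockBoundary` only says the file stops between blocks; without the EndMark there is no way to tell whether more blocks were still going to be written.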