animetosho / ParPar

High performance PAR2 create client for NodeJS

Enhancing Efficiency with File Ordering and Slice Alignment #57

Closed meemu77 closed 3 months ago

meemu77 commented 3 months ago

First off, thanks for the fantastic work on ParPar—it's truly a great implementation! I've been exploring how to optimize the recovery process, especially in scenarios involving files of varying sizes.

I have a question regarding the possibility of leveraging strict file ordering to align files more effectively within slices/blocks. Often, we deal with multiple files of similar size, but when smaller files are also part of the mix, they can end up spread across multiple slices, which potentially makes recovery less cost-effective.

Would it be feasible for ParPar to arrange the input so that all smaller files (and, in fact, any files not matching the common size) are grouped toward the last slices? The idea is that, by doing so, we might minimize the number of slices affected by small-file corruption, thereby reducing the amount of recovery data needed when such files are damaged or lost.

Any insights or suggestions on whether this approach could be implemented, or if it might indeed result in more efficient recovery processes, would be greatly appreciated!

Thanks for your time and looking forward to your thoughts!

animetosho commented 3 months ago

Appreciate the praise!

I suspect you might be misunderstanding how files are placed in PAR2. Files are always aligned to a block boundary - they cannot start in the middle of a block.
As such, ordering won't have any effect on the recoverability of completely missing/broken files. The PAR2 spec also strictly defines the ordering of input files, so clients can't reorder files or blocks anyway.
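To make the alignment point concrete, here's a small sketch (hypothetical file and block sizes, not ParPar's API): because every file starts on a block boundary, each file occupies `ceil(size / blockSize)` blocks on its own, so no permutation of the input changes the total.

```javascript
// Each PAR2 input file starts on a block boundary, so the number of
// blocks a file occupies depends only on its own size, never on where
// it sits in the ordering.
const blockSize = 1024 * 1024; // 1 MiB block size (example value)
const fileSizes = [5_000_000, 300_000, 120_000]; // hypothetical files

const blocksFor = (size) => Math.ceil(size / blockSize);

// The total is the same for every permutation of the input files.
const totalBlocks = fileSizes.reduce((sum, s) => sum + blocksFor(s), 0);
console.log(totalBlocks); // 5 + 1 + 1 = 7 blocks regardless of order
```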

If you're dealing with a lot of mismatched file sizes, it might be beneficial to put the smaller files into an archive before applying PAR2, as there'll be some efficiency gain from the data being concatenated.
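As a rough illustration of that gain (hypothetical sizes, and ignoring archive-format overhead): standalone, each small file is rounded up to a whole number of blocks, whereas a concatenated archive only pays that rounding once.

```javascript
// Hypothetical small files: kept separate, each one is rounded up to a
// whole number of blocks; concatenated into one archive, the rounding
// happens only once at the end.
const blockSize = 1024 * 1024; // 1 MiB block size (example value)
const smallFiles = [120_000, 80_000, 300_000, 50_000];

const blocksFor = (size) => Math.ceil(size / blockSize);

const separateBlocks = smallFiles.reduce((n, s) => n + blocksFor(s), 0);
const archivedBlocks = blocksFor(smallFiles.reduce((a, b) => a + b, 0));

console.log(separateBlocks, archivedBlocks); // 4 blocks vs 1 block
```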

meemu77 commented 3 months ago

Thank you for clarifying. I certainly seem to have misunderstood how the files are split into blocks. I thought all the files were essentially concatenated and then that big chunk was sliced into blocks of the defined size.