Parchive / par2cmdline

Official repo for par2cmdline and libpar2
http://parchive.sourceforge.net
GNU General Public License v2.0

Cannot recover one-bit flip #190

Status: open. Vladimir-Kondratiev opened this issue 11 months ago.

Vladimir-Kondratiev commented 11 months ago

I changed one bit in a file from 1 to 0, and par2cmdline cannot recover the file afterwards (even with -r 100). Code to reproduce it is attached: code.txt
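The attached code.txt is not reproduced here. As a hypothetical stand-in for the corruption step only, the sketch below clears a single bit (1 to 0) in place at a chosen byte offset. The function name `clear_bit` and its parameters are illustrative assumptions, not taken from the attachment.

```python
def clear_bit(path: str, byte_offset: int, bit: int) -> None:
    """Hypothetical helper: change bit `bit` (0 = least significant) of the
    byte at `byte_offset` from 1 to 0, modifying the file in place."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    with open(path, "r+b") as f:
        f.seek(byte_offset)
        original = f.read(1)
        if len(original) != 1:
            raise ValueError("byte_offset is past the end of the file")
        value = original[0]
        if not value & (1 << bit):
            raise ValueError("bit is already 0")
        # Write back the same byte with just that one bit cleared.
        f.seek(byte_offset)
        f.write(bytes([value & ~(1 << bit) & 0xFF]))
```

Usage would be something like clear_bit(path, offset, 0) on the test file after creating the PAR2 set, followed by a par2 verify/repair run.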

animetosho commented 11 months ago

If it's of any help, MultiPar seems to be able to recover from the generated PAR2 fine.

My baseless guess is that par2cmdline gets confused by all the duplicate blocks; in theory, this case should be recoverable even at -r0.
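To illustrate why duplicate data could make this recoverable with no recovery blocks at all: PAR2 records a checksum for every source block (MD5 plus CRC32), so if a damaged block's expected hash equals the hash of some intact block, the intact block's contents can simply be copied into place. The sketch below shows only that idea, assuming a fixed block size, MD5 per block and zero-padding of the last block; the function names are hypothetical and this is not par2cmdline's (or MultiPar's) actual code.

```python
from __future__ import annotations

import hashlib


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        blocks.append(block.ljust(block_size, b"\x00"))
    return blocks


def repair_from_duplicates(damaged: bytes, expected_md5: list[bytes],
                           block_size: int) -> tuple[bytes, list[int]]:
    """Replace each damaged block with an intact block whose MD5 equals the
    expected MD5 for that position. Returns (repaired data, indices that
    still cannot be repaired without recovery blocks)."""
    blocks = split_blocks(damaged, block_size)
    # Digest -> contents, for every block that still verifies at its own position.
    intact = {}
    for i, block in enumerate(blocks):
        digest = hashlib.md5(block).digest()
        if digest == expected_md5[i]:
            intact[digest] = block
    missing = []
    for i, block in enumerate(blocks):
        if hashlib.md5(block).digest() != expected_md5[i]:
            if expected_md5[i] in intact:
                blocks[i] = intact[expected_md5[i]]  # copy a verified duplicate
            else:
                missing.append(i)
    repaired = b"".join(blocks)[:len(damaged)]
    return repaired, missing
```

With the all-0xFF test file every block has the same expected hash, so a single damaged block always has thousands of verified duplicates to copy from.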

gbletr42 commented 6 months ago

Looking at the output of the command, it reports 2000 available source blocks, 600 excess repair blocks, and that none of the repair blocks will be used, despite needing at least one and reporting that the file was damaged, before proceeding to repair. animetosho's par2cmdline-turbo handles this case just fine, though.

I don't know much about par2cmdline's codebase, but I'd imagine the bug lies somewhere in the counting of missing blocks. Strangely, it counts correctly on the second verification, reporting only 1999 available blocks.

EDIT: The reason it succeeds the second time is that it has overwritten a different block of the output file (not the one where the bit flip happened) with zeroes. This zeroed-out block is deterministically located at offsets 899800000 to 900000000.
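One way to locate a zeroed-out region like this is to compare the repaired output against a freshly regenerated copy of the original and list the byte ranges where they differ. A minimal sketch; the function name `diff_ranges` and the 1 MiB chunk size are hypothetical choices.

```python
from __future__ import annotations

import sys


def diff_ranges(path_a: str, path_b: str,
                chunk_size: int = 1 << 20) -> list[tuple[int, int]]:
    """Return half-open [start, end) byte ranges where the two files differ.
    If one file is longer, its extra tail is reported as a differing range."""
    ranges: list[tuple[int, int]] = []
    start = None  # start offset of the differing run currently open, if any
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Identical chunk: close any open run at the chunk boundary.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
                offset += len(a)
                continue
            n = max(len(a), len(b))
            for i in range(n):
                differs = i >= len(a) or i >= len(b) or a[i] != b[i]
                if differs and start is None:
                    start = offset + i
                elif not differs and start is not None:
                    ranges.append((start, offset + i))
                    start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))
    return ranges


if __name__ == "__main__" and len(sys.argv) == 3:
    for s, e in diff_ranges(sys.argv[1], sys.argv[2]):
        print(f"{s}-{e} ({e - s} bytes)")
```

Identical chunks are skipped with a single comparison, so only the chunks containing damage are scanned byte by byte.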

It appears this happens for other runs of duplicate data, not just the one picked here, and for any change to a single byte, not just a bit flip. For example, piping in unmodified output from yes and then applying the same corruption leads to the same result.

par2 seems unable to recover from any burst corruption at any offset; it only corrupts more data. With larger bursts it does pull in recovery blocks, but it introduces corruption of equivalent size (the number of blocks found on the second verification is always 1 less than the number initially found). Interestingly, at larger burst sizes it does fix the corruption at the original offset, only to zero out blocks at another offset.
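Burst corruption like this can be produced with a helper that overwrites a run of bytes with zeroes in place, without changing the file size. A sketch with a hypothetical function name `zero_burst`; the offset and length are whatever a given test calls for.

```python
import os


def zero_burst(path: str, offset: int, length: int,
               chunk_size: int = 1 << 20) -> None:
    """Overwrite `length` bytes starting at `offset` with zeroes, in place.
    Writing in chunks keeps memory use bounded even for bursts of hundreds of MB."""
    size = os.path.getsize(path)
    if offset < 0 or length < 0 or offset + length > size:
        raise ValueError("burst must lie entirely inside the file")
    zeros = bytes(chunk_size)
    with open(path, "r+b") as f:
        f.seek(offset)
        remaining = length
        while remaining:
            n = min(remaining, chunk_size)
            f.write(zeros[:n])
            remaining -= n
```

Refusing bursts that extend past end of file keeps the file size fixed, so any out-of-range read later on comes from par2 rather than from a resized input.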

Also, when testing the maximum corruption with 270MB of zeroes inserted, the command fails with the message 'Could not read 2608 bytes from $PWD/900MBones_crc_ok.1 at offset 900197392: No such file or directory', indicating that par2 is trying to read from an invalid position 197392 bytes past the end of the backup file it created. This is likely a separate bug.
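Working through the numbers in that error message: the backup file is 900000000 bytes (GNU dd's MB unit is 10^6 bytes, so bs=1MB count=900 gives 900 × 10^6), so a 2608-byte read at offset 900197392 starts 197392 bytes past end of file and ends at 900200000, exactly 200000 bytes past it. That happens to equal the length of the zeroed range 899800000 to 900000000 mentioned in the EDIT above. A small check, with a hypothetical helper name:

```python
from __future__ import annotations


def read_overrun(offset: int, length: int, file_size: int) -> tuple[int, int]:
    """Return how far a read of `length` bytes at `offset` starts and ends past
    end of file. Negative values mean that point is still inside the file."""
    end = offset + length
    return offset - file_size, end - file_size


print(read_overrun(900197392, 2608, 900_000_000))  # (197392, 200000)
```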

Further testing with a binary search shows that this invalid read happens (with a different number of bytes requested each time) when 211013825 or more bytes starting at offset 200MB are overwritten with zeroes, for a file generated with 'dd if=/dev/zero bs=1MB count=900 conv=fdatasync | tr '\000' '\377' > ./900MBones_crc_ok' (I didn't bother with the single-byte replacement).
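The threshold search described above is a bisection over a monotone predicate. A minimal sketch, assuming a hypothetical callable `triggers_bad_read(n)` that regenerates the all-0xFF file, zeroes n bytes at offset 200MB, runs a par2 repair and reports whether the bad read occurred. It also assumes the failure is monotone in n (every larger burst fails too), which the report suggests but does not prove.

```python
from typing import Callable


def smallest_failing(lo: int, hi: int, fails: Callable[[int], bool]) -> int:
    """Binary search for the smallest n in [lo, hi] with fails(n) True.
    Assumes fails is monotone (False ... False True ... True) and fails(hi) is True."""
    if not fails(hi):
        raise ValueError("fails(hi) must be True")
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # mid fails: the threshold is mid or smaller
        else:
            lo = mid + 1  # mid passes: the threshold is above mid
    return lo
```

Searching up to 270MB this way takes roughly 28 repair runs, e.g. smallest_failing(1, 270_000_000, triggers_bad_read).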

All of this was done with v0.8.1 on Debian Bookworm. Neither bug seems to happen with animetosho's par2cmdline-turbo. Hopefully this is helpful to somebody ^_^

Eh, here's a link to the files that cause par2 to read at a bad offset. It occurred to me that the offsets it zeroes out are the last ones in the file, so this may be some kind of buffer underrun or similar in the RS (Reed-Solomon) processor. That would explain why par2cmdline-turbo is unaffected, since doesn't it use ParPar's RS code instead?

https://pixeldrain.com/u/25GjaYLb