Open apalazzi opened 3 years ago
Weird.
The only suggestion I have is to try giving it just 1 recovery block, that is, just the file Full-0005.vol000+001.par2. If it is a problem with the Reed-Solomon matrix inversion, that will make it very simple. If that doesn't work, try one of the other files.
If those do not work, there's not much to go on. You could try downloading a different PAR recovery program. They actually use different code to do the repair, so they may work if you're hitting a bug in this version.
Hi,
Still no luck:
andrea@atlante:~/par2-verify$ ls
Full-0005 Full-0005.par2 Full-0005.vol000+001.par2 tmp
andrea@atlante:~/par2-verify$ par2repair Full-0005
Loading "Full-0005.par2".
Loaded 4 new packets
Loading "Full-0005.vol000+001.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "Full-0005.par2".
No new packets found
There are 1 recoverable files and 0 other files.
The block size used was 9663672 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 19327343417 bytes.
Verifying source files:
Opening: "Full-0005"
Target: "Full-0005" - damaged. Found 1999 of 2000 data blocks.
Scanning extra files:
Repair is required.
1 file(s) exist but are damaged.
You have 1999 out of 2000 data blocks available.
You have 1 recovery blocks available.
Repair is possible.
1 recovery blocks will be used to repair.
Computing Reed Solomon matrix.
Constructing: done.
Solving: done.
Wrote 19327343417 bytes to disk
Verifying repaired files:
Opening: "Full-0005"
Target: "Full-0005" - damaged. Found 1999 of 2000 data blocks.
Repair Failed.
I'll try with another program. In the meantime, if you can give me some hint I could try to run the program through the debugger and see if I can catch a bug.
So far I've tried QuickPar, MultiPar and phpar, but none of them succeeded. I'm also under the impression that all those programs are, in one way or another, just forks of par2cmdline, so if this is a bug in the core functions it's present in all of them.
Do you know of a PAR recovery program that definitely has a different core code base?
I believe par2cmdline was originally written by the same author as QuickPar, but he made big improvements to his program. I'm positive MultiPar is different. I don't know about phpar.
I'm the designer of the math for Par2. I don't know much about the code for par2cmdline. I don't know what to say. As a random thought --- a very random thought --- is it possible the file is set read-only or you don't have permissions to write the file?
Beyond that, I'm afraid that I am not much help. You're welcome to download and compile the code and add your own debugging info.
> I'm positive MultiPar is different. I don't know about phpar.
phpar2 is forked from par2cmdline. MultiPar is different, though the author says it was originally a C port of par2cmdline, so it may be inspired by that code base to some degree.
The only other completely different implementation I know of is gopar, but it seems to be more of a proof of concept than a "real world application". You could give it a spin, but I wouldn't expect it to work miracles.
I confirm that it still doesn't work with MultiPar; I've also tried gopar, but I get an error (see here).
Could it be that the big size of the archive is the source of the issue? The main file "Full-0005" is 15G and the block size is >9 M.
The limits in the spec are for 64-bit file lengths. Some clients may not use 64-bit values to store them ... but that would violate the spec. I don't suppose you're working with an older filesystem that limits file lengths to 2 or 4GB?
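As a rough illustration of why a 32-bit length field would matter for this particular file, here is a minimal sketch. The helper names `truncate_to_bits` and `fits` are hypothetical, and the size is the one par2repair reported above; this is not code from any PAR2 client.

```python
FILE_SIZE = 19327343417  # "total size of the data files" from the par2repair log


def truncate_to_bits(value: int, bits: int) -> int:
    """Return value as it would survive storage in an unsigned field of `bits` bits."""
    return value & ((1 << bits) - 1)


def fits(value: int, bits: int) -> bool:
    """True if value can be stored in an unsigned field of `bits` bits without loss."""
    return truncate_to_bits(value, bits) == value


if __name__ == "__main__":
    # 31 bits ~ 2 GB limit, 32 bits ~ 4 GB limit, 64 bits is what the spec requires.
    for bits in (31, 32, 64):
        print(bits, fits(FILE_SIZE, bits), truncate_to_bits(FILE_SIZE, bits))
```

A file of this size only round-trips through the 64-bit field, so any client or filesystem limited to 2 or 4 GB would silently see a much shorter file.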
Part of the spec is that every file contains a packet that says which client created the PAR2 file. On an error, a client is supposed to print out the contents of that packet, so that we can track down a client that makes a bad file. I find it strange that par2cmdline isn't printing it --- we should fix that. For the moment, you could try running "strings" or a hex-editor on the smallest PAR2 file and see if it contains the name of the client. It should be right after the text "PAR 2.0\0Creator\0".
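If "strings" is not handy, the same lookup can be sketched in Python. This assumes the standard 64-byte PAR2 packet header (8-byte magic "PAR2\0PKT", 8-byte little-endian length, 16-byte packet hash, 16-byte recovery set ID, 16-byte type); the function name `find_creator` is hypothetical and the code does not verify the packet hash.

```python
import struct
from typing import Optional

PACKET_MAGIC = b"PAR2\x00PKT"
CREATOR_TYPE = b"PAR 2.0\x00Creator\x00"
HEADER_SIZE = 64   # magic(8) + length(8) + packet MD5(16) + set ID(16) + type(16)
TYPE_OFFSET = 48   # the type field is the last 16 bytes of the header


def find_creator(data: bytes) -> Optional[str]:
    """Return the client string from the first creator packet in data, if any."""
    pos = data.find(CREATOR_TYPE)
    while pos != -1:
        start = pos - TYPE_OFFSET
        if start >= 0 and data[start:start + 8] == PACKET_MAGIC:
            # Packet length covers the whole packet, header included.
            (length,) = struct.unpack_from("<Q", data, start + 8)
            body = data[start + HEADER_SIZE:start + length]
            # The client string is padded with NUL bytes to a multiple of 4.
            return body.rstrip(b"\x00").decode("ascii", errors="replace")
        pos = data.find(CREATOR_TYPE, pos + 1)
    return None
```

For the smallest PAR2 file, reading it whole is fine, for example `find_creator(open("Full-0005.par2", "rb").read())`.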
If you can find out the program that created the file, you could try using that to repair.
The recovery data was created with par2cmdline v0.7.4, here is the output from gopar:
Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes)
Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd
Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0
Loaded creator packet with client ID "Created by par2cmdline version 0.7.4."
Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd)
[1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses)
Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687
I'll try with v0.7.4 and see if that makes a difference.
BTW the repair also fails with gopar...
To add some more info, with gopar the repair fails with the following message:
Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes)
Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd
Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0
Loaded creator packet with client ID "Created by par2cmdline version 0.7.4."
Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd)
[1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses)
Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687
Loaded recovery packet: exponent=3, byte count=9663672
Loaded recovery packet: exponent=4, byte count=9663672
Loaded recovery packet: exponent=5, byte count=9663672
Loaded recovery packet: exponent=6, byte count=9663672
[1] Loaded volume file "Full-0005.vol003+004.par2"
Repair error: hash mismatch in reconstructed data
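The numbers in the two logs are consistent with each other, which can be checked with a little arithmetic. A minimal sketch, assuming blocks (slices) are numbered from 0 and laid out back to back in the file; the helper names are hypothetical:

```python
BLOCK_SIZE = 9663672     # "slice byte count" / "block size used"
FILE_SIZE = 19327343417  # size of Full-0005 in bytes


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to cover the file (ceiling division)."""
    return -(-file_size // block_size)


def block_index(offset: int, block_size: int) -> int:
    """Zero-based index of the block containing the given byte offset."""
    return offset // block_size


def block_range(index: int, block_size: int, file_size: int) -> tuple:
    """Inclusive (first_byte, last_byte) range of a block, clipped to the file size."""
    start = index * block_size
    end = min(start + block_size, file_size) - 1
    return start, end


if __name__ == "__main__":
    idx = block_index(18148376016, BLOCK_SIZE)
    print(block_count(FILE_SIZE, BLOCK_SIZE), idx, block_range(idx, BLOCK_SIZE, FILE_SIZE))
```

The corrupt range reported by gopar is exactly one whole block, and the file needs 2000 blocks in total, matching "Found 1999 of 2000 data blocks" from par2repair.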
@mdnahas just for info, are you also following the bug report I submitted to gopar? The author is very responsive and willing to dig into this issue, and I think the data we're getting can be really useful for understanding what's going on here.
I've probably encountered the same issue while using par2cmdline v0.8.1 on Debian 11.4 (bullseye) with ZFS, for DCPs that include files larger than 20 GB. par2cmdline seems to behave erratically. In all cases there are plenty of repair blocks available and only a few are needed.
Shall I post further details here or open a new issue? par2_problem.txt
Hi @Zrin, in my case I strongly suspect that the culprit was a faulty memory module and that the redundancy data was incorrect right from the start, making recovery impossible. I recommend that you run a memory testing program to see whether your RAM is good or faulty, especially if you're not using ECC RAM.
Hi @apalazzi, I think faulty RAM is very unlikely to be the cause, because it seems that I can reproduce the issue, and even if the error correction blocks were faulty, the tool should not attempt to repair the wrong file(s).
Nevertheless, I'll see what memtester will report.
I'll also run more tests on different machines.
So far it seems that there are issues with the SATA controller on the system where I experienced the problems. Nevertheless, par2cmdline could be more resilient in such situations. I'll run more tests to confirm.
I happened to look at the inversion code. par2cmdline has an assert if a failure occurs, so it will actually crash if the known PAR2 math defect is encountered.
MultiPar, on the other hand, seems to ignore a recovery block and retry when such an issue occurs.
Since par2cmdline didn't crash in these cases, I suspect the most likely cause is bad memory during PAR2 creation, as suggested by apalazzi. MultiPar and ParPar have built-in memory checksumming to try to detect these sorts of issues, though there's only so much software can do against a hardware fault.
Faulty disks (or related, such as bad I/O controllers) can be a mixed bag. If it happens during create, the bad data should be caught by the checksum when verifying/repairing. If the I/O fault occurs during repair instead, you could get odd behaviour - in such a case, I don't think software can do much about it other than report that something isn't right.
All in all, it's best to work on reliable hardware. Unfortunately, you generally need it to be reliable during create, the result of which often goes untested until you need to repair.
MultiPar/ParPar can provide a little extra margin of safety with PAR2 creation, if this is a concern.
> Faulty disks (or related, such as bad I/O controllers) can be a mixed bag.
The issue I've encountered was that the controller (on the mainboard) delivered corrupted data under certain circumstances. Reading from the same (huge) file multiple times gave different data. Replacing the mainboard solved the issue.
To detect that, one can checksum the file(s) multiple times and compare. A very "careful" tool might do that when a problem is detected or on user's request.
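A minimal sketch of that idea, using SHA-256 from Python's standard library. The function names `file_digest` and `reads_are_stable` are hypothetical, and nothing here is part of par2cmdline. Note that repeated reads may be served from the OS page cache or a controller cache, so identical digests do not prove the hardware is healthy; differing digests, however, are a strong sign of an I/O or memory problem.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so that very large files do not need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def reads_are_stable(path: str, passes: int = 3) -> bool:
    """Read the file `passes` times and report whether every pass hashed the same."""
    digests = {file_digest(path) for _ in range(passes)}
    return len(digests) == 1
```

On a multi-gigabyte file each pass takes as long as a full verify, so this is something to run on request rather than by default.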
Thank you all for responding!
> To detect that, one can checksum the file(s) multiple times and compare.
That might work for your specific case, but it isn't guaranteed to work for others. Repeatedly reading the file may not even do much, e.g. if it's cached by the OS or by some RAID controller or the like.
And that ignores the fact that it'd greatly reduce performance, and hence generally be undesirable.
I think trying to find the cause of a fault is out of scope for a PAR2 tool. There are all sorts of things that could go wrong (e.g. a bug in par2cmdline, a bad PAR2 file, OS bugs, hardware faults, etc.) and it'd be extremely difficult, if not impossible, to try to check everything.
To me, it makes the most sense for the tool to detect a fault, report it and leave it up to the user to troubleshoot.
> it'd be extremely difficult/impossible to try to check everything. To me, it makes the most sense for the tool to detect a fault, report it and leave it up to the user to troubleshoot.
Exactly. It should be sufficient to check the file(s) checksum(s) before reporting successful repair and alert the user that there is something unexpected going on.
par2cmdline already does that.
Of course, with unreliable hardware, there's no guarantee.
If the built-in post-repair checksum check doesn't feel good enough to you, you're free to run a subsequent verification pass. Of course, still no guarantee if we're talking unreliable hardware.
Hi,
I'm trying to restore a corrupted file, but par2repair is unable to repair it:
Running with -vvvv does not add any meaningful info, let me know if you want me to do some specific test.
Andrea