animetosho / ParPar

High performance PAR2 create client for NodeJS
190 stars 19 forks source link

Cannot repair with par2 created by parpar #9

Closed Safihre closed 6 years ago

Safihre commented 6 years ago

I received this NZB from a user. It has something I have never seen before: Multipar and par2cmdline repair it, but after repair the files are still damaged. TestParPar.zip Also NZBGet says PAR Failure.

I purposefully modified this NZB to have some missing articles, because on my usenet server the download is complete and verifies correctly as being 100% complete. The user has a number of NZBs where repair files like this, all created using ParPar (could of course still be a coincidence!).

animetosho commented 6 years ago

One way to check the output of ParPar is to grab the whole file and generate PAR2 archives with ParPar, par2cmdline etc and compare the output (PAR2 packets, not the entire file). I have a script which does this somewhat (which I wouldn't be surprised if it's a bit broken).

Slightly concerned that people are actually using it in the wild though, particularly if it's the Git version (which I consider to be rather unstable at this stage). I use v0.2.2 myself, which seems to be largely fine, but I often don't test the Git code at all.
As a testament to that, I just ran some tests and see that the current stuff ParPar outputs is rather broken.

Thanks for bringing up the issue nonetheless - I'll need to see what's broken at the moment.

Edit: the fix below should fix the most egregious bug. I've still got a few failing tests, though they're mostly for cases that aren't actually likely. I haven't downloaded your file, so don't know if this is the cause of it (if there's large swaths of null bytes in the PAR2 files, it's likely).

Safihre commented 6 years ago

Thanks for the quick response :) Would be great if you could let us know if indeed this would fix the problem with my example file, then we can also announce it on GitHub/Twitter so people know why it fails and maybe content posters will update their ParPar 👍 Also on NZBGet forum: https://forum.nzbget.net/viewtopic.php?f=3&p=19802

Safihre commented 6 years ago

This is the same NZB, but before I edited to have missing articles so you can get the original data files: TestParPar-NonBroken.zip

hugbug commented 6 years ago

Thanks for quick fix!


Found interesting part in par2 spec:

If a client is unable to process a recovery set, the contents of the creator packet must be shown to the user. The goal of this is that any client incompatibilities can be found and resolved quickly.

Future versions of nzbget will print that info and error messages will look like this:

screen shot 2017-11-06 at 22 49 59

animetosho commented 6 years ago

To be honest, ParPar is still experimental and shouldn't really be relied upon at this stage. If one must use it, the tagged releases are better than Git code, as they are actually tested to some extent (though the latest tag is a bit old and doesn't support stuff like recovery %, which is quite popular).
Actually, one should probably avoid Git versions of any PAR2 client (though perhaps par2cmdline is stable enough that it's usually fine).

Though I never really expected that this would get used, particularly the Git code, so haven't really focused so much on correct functionality at this point. Perhaps I should start thinking about making this work properly...

I don't have any usenet download set up, so can't test your example, sorry. I assume the current version handles it fine though (I'm quite confident with the validity of multiplication kernels as I test them quite a lot, so the exact content of files don't really matter so much, only really the sizes, settings used etc).

Reporting the creator is a good idea. One thing to note is that PAR2 files themselves can become corrupt, so a PAR2 failure may not be a fault of the creating application (but a common theme can help point to a source).

Safihre commented 6 years ago

@animetosho I am not sure they used the git version, since it simply says ParPar v0.2.2 [https://animetosho.org/app/parpar], isn't that the stable version you mentioned?

Here are all the files, hope they give more information: https://www.dropbox.com/s/jc5wt2wqso27ffx/Store.rar?dl=1

animetosho commented 6 years ago

I am not sure they used the git version, since it simply says ParPar v0.2.2 [https://animetosho.org/app/parpar], isn't that the stable version you mentioned?

No, I don't use special version numbers in Git code, so it'll show the version of whatever the latest tag is. Maybe I should adopt some system (like even/odd versioning for release/dev versions), but, eh, :effort:

Thanks for going to the effort to upload that by the way.

boranblok commented 6 years ago

Well, I advised the users of my tool to switch over to another par2 generation utility for now. Speed is of course nice, but the output has to be reliable as well.

If you have stable versions I can recommend let me know, I will be following this thread. Because the speed gains were indeed quite great.

animetosho commented 6 years ago

Here are all the files, hope they give more information:

Okay, I got around to looking at this. The PAR2 files there contain swaths of null bytes, which is a symptom of the bug fixed above. Current Git code generates identical files compared to v0.2.2 release, and these are functionally identical to those created with par2cmdline using the same settings.

Thanks again for going to the effort of uploading the files!

boranblok commented 6 years ago

Sidenote about versioning. You could increase version as soon as you make a release. This way latest git version wont be identical to latest release version. I myself am also guilty of this tbh.

Even/odd versioning isnt a bad idea either.

GameTimeWarp commented 6 years ago

animetosho, could you add a par verify command (parpar v parsetname.par2 *) similar to that in par2tbb that would verify the parset after creation? I can only assume this would detect bad pars and posters could then code their scripts around this and create a new parset if needed. This may also help track down other bugs in the git version. Several other par programs can do this but sadly, they aren't threaded.

Safihre commented 6 years ago

This would not have solved the problem in this case. If you use that command on the dataset I linked above it will say that the par2 files are undamaged.

GameTimeWarp commented 6 years ago

Thanks Safihre, I feared a verify wouldn't detect these bad parsets in this case. Do you have any other thoughts on other possible solutions to detect this specific issue besides of course the posters manually downloading their posts and removing articles or files from the nzbs to see if the parset actually repairs the rarset and extracts?

animetosho commented 6 years ago

@boranblok Thanks for the suggestion.

other thoughts on other possible solutions to detect this specific issue

Verify doesn't recalculate the PAR2 archive, only just checks specified hashes for damage, so the only 'true' verification would be to recreate the whole archive and compare. Of course, this does require the recreating program not to contain (the same) bugs.

Alternatively, you could just simulate damage and attempt a repair. For example:

  1. create PAR2 archive
  2. copy source files to another location, excluding parts of it (e.g. head -c-1048576 src > dest to omit the last 1MB of the src file)
  3. attempt to repair this temporary copy using the PAR2 archive
  4. compare hashes of source against this copy (or rely on the repairer's output if you trust it)

I imagine this would be rather slow, so don't think people will really do it though.

boranblok commented 6 years ago

This is however how I tested the validation. Downloaded my own uploads but said to sab not to verify/repair/extract, then I threw away the max number of files I could still repair from. In those tests everything went fine.

Of course my test cases are not extremely representative, my biggest testfile was 700MB.

animetosho commented 6 years ago

Bug was introduced in 4f800ecc42883df336467bad9a9411f36328a86b. The bug pretty much destroys all PAR2 recovery packets (unless your block size was under 100KB or thereabouts) so recovery using it wouldn't have worked at all. You likely tested an earlier commit.

Fossil01 commented 6 years ago

Is there a fix already in place?

animetosho commented 6 years ago

Fix for...?
The bug causing the corruption in the first post should be fixed in current Git code. Note that Git code is still considered 'unstable'.

Fossil01 commented 6 years ago

Roger that

Safihre commented 6 years ago

Since it's fixed now, I'll close this :)