animetosho / ParPar

High performance PAR2 create client for NodeJS

Too many input slices #8

Closed ghost closed 6 years ago

ghost commented 6 years ago

Is there a limit? I'm getting this error a lot on large rarsets.

animetosho commented 6 years ago

Yes, you cannot have more than 32768 input slices in a PAR2 set. I've changed the error message to make this more clear. Thanks for raising the issue!

ghost commented 6 years ago

So what's a slice? I tried parring a 50GB bluray which consisted of 280 RAR files. Should be doable, right?

animetosho commented 6 years ago

What command did you use?

"Slice", also known as a "block" (slice is the official term used in the PAR2 specifications), is a unit that all data in a PAR2 set is broken up into.

Number of input slices = input data (in your case, 50GB) divided by the slice size (plus some overhead for each file; splitting files into parts does reduce the efficiency of PAR2). So if your slice size is set to 1MB, 50GB/1MB = 50,000, which exceeds the maximum of 32,768. To resolve this, you'll either have to increase the slice size or perform PAR2 over a smaller set of files.
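The arithmetic above can be sketched in a few lines (a rough check only, ignoring the per-file overhead mentioned above; the function name is mine, not part of ParPar):

```javascript
// Rough slice-count check: total input size divided by slice size,
// rounded up. Ignores per-file overhead.
const MAX_SLICES = 32768; // PAR2 format limit

function roughSliceCount(totalBytes, sliceSize) {
  return Math.ceil(totalBytes / sliceSize);
}

console.log(roughSliceCount(50e9, 1e6)); // 50000 -> exceeds MAX_SLICES
```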

boranblok commented 6 years ago

Just for reference, what is the exact error message that is being given now?

The block size in my application is set based upon the yEnc message size (so par2 files are aligned to message posts).

I built this up in a retry loop, increasing the block size whenever I get a "block size too small" exception.

Right now that logic is this:

protected override void Process_ErrorDataReceived(object sender, string outputLine)
{
    // Watch the child process's stderr for par2's "block size is too small"
    // message and set a flag so the retry loop can increase the block size.
    base.Process_ErrorDataReceived(sender, outputLine);
    if (outputLine != null && outputLine.Contains("Block size is too small."))
        blockSizeTooSmall = true;
}

(from ParWrapper.cs)

This works for par2 and par2 multicore, but the message in the NodeJS ParPar is probably different enough that this logic is not triggered.

On the other hand if 32768 is a limit of the par2 format itself I could precalculate this and forego the trial and error loop.

In any case, the quickest fix on my end is to handle your error message the same way as the par2 and par2 multicore ones. A better solution will have to wait until I have more time for development.

ghost commented 6 years ago

ParPar exits with the error "Too many input slices"

boranblok commented 6 years ago

Well, that change has been committed: https://github.com/boranblok/nntpPoster/commit/167e220a173459efa0233dd2333fa1945ca5d9c1

I just have to find time to get it in a release version.

animetosho commented 6 years ago

I don't consider the error message as something I aim to keep consistent - it's intended for end users, not applications.

On the other hand if 32768 is a limit of the par2 format itself I could precalculate this and forego the trial and error loop.

Yes, the limit comes from the PAR2 format itself, so my recommendation is to just calculate it: round each file up to the nearest multiple of the block size, total this across all files, then divide by the block size, for the total number of blocks.
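That calculation can be sketched as follows (the function name and the example file sizes are mine, purely illustrative):

```javascript
// Precalculate the input slice count: round each file up to a whole
// number of slices, then total across all files.
const MAX_SLICES = 32768; // PAR2 format limit

function inputSliceCount(fileSizes, sliceSize) {
  return fileSizes.reduce(
    (total, size) => total + Math.ceil(size / sliceSize), 0);
}

// Hypothetical example: 47 RAR parts of 500MB each, 768000-byte slices
const parts = Array(47).fill(500 * 1000 * 1000);
console.log(inputSliceCount(parts, 768000)); // 30644 -> within the limit
```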

Also take note that the speed is linearly related to the number of recovery blocks used, so having more blocks will be slower. You may wish to consider limiting the number of total blocks because of this.

boranblok commented 6 years ago

Hmmh, what would be pros/cons of using more or less recovery blocks? I always thought more recovery blocks gave more chance of recovery if the par2 messages themselves got corrupted to a certain percentage.

Is there a "sweet spot" one should aim for? I could parametrize it as well, but end users would have the same question I have. I always thought it was best to get as many recovery blocks as possible.

animetosho commented 6 years ago

You are correct in your understanding - more blocks do improve recovery efficiency. The tradeoff is the increase in computational complexity, which is proportional to the size of the input multiplied by the number of recovery blocks.

PAR2, being an erasure code, repairs blocks that are considered damaged. So, for example, if your block size is twice the article size, a single missing article means that the entire block (which spans 2 articles) is bad, and that is the amount that needs to be recovered. Hence larger blocks make recovery a little less flexible, particularly in the case of completely random errors.
If 100 recovery blocks are generated, you can always recover from 100 missing articles. You may be able to recover from more, if it so happens that multiple missing articles occur within the same PAR2 block. Larger block sizes do slightly increase this likelihood, but this is greatly offset by the fact that you can recover from fewer random errors.

I guess the general hope is that most articles are intact. A badly damaged collection isn't going to be saved by PAR2 anyway. I also find that many errors aren't completely random; rather, they come in streams (but I don't really download from Usenet much, so you may have different experiences).

There isn't really an ideal figure, which is why it's a common tunable. I wouldn't push it too high, because the speed decreases linearly, and error recovery improves sub-linearly, but you obviously need enough to deal with random errors.
My recommendation: for smaller posts, use more recovery, as even a small number of errors could damage a fair amount of the post, but for larger posts, use larger blocks, as you'd generally expect fewer errors, in terms of ratio to the original file. This also kinda works nicely with the 32768 input block limit, although you'll probably hit a point where that imposes a ceiling.
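One way to respect that ceiling while keeping slices aligned to articles (a hypothetical helper of my own, not something ParPar does for you) is to pick the smallest multiple of the article size that keeps the slice count within the limit:

```javascript
// Hypothetical helper: find the smallest slice size, as a multiple of the
// yEnc article size (so slices stay aligned to message posts), that keeps
// the total input slice count within the PAR2 limit of 32768 slices.
const MAX_SLICES = 32768;

function minAlignedSliceSize(fileSizes, articleSize) {
  const count = s => fileSizes.reduce((t, f) => t + Math.ceil(f / s), 0);
  let sliceSize = articleSize;
  while (count(sliceSize) > MAX_SLICES) sliceSize += articleSize;
  return sliceSize;
}

// A single 50GB input with 768000-byte articles needs 2-article slices
console.log(minAlignedSliceSize([50e9], 768000)); // 1536000
```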

ghost commented 6 years ago

Hi @animetosho,

I'm now getting this error (the last lines are the ParPar errors):

2017-10-30 13:23:20,090 INFO 6 nntpPoster.UsenetPoster - Rarred 23613.32 MB of file(s) with a speed of 117.04 MB/sec
2017-10-30 13:23:20,118 DEBUG 6 ExternalProcessWrappers.ExternalProcessWrapperBase - Executing process: [nodejs /opt/ParPar/bin/parpar.js -n -s 768000 -r2% -d pow2  -o "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.par2" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part01.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part02.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part03.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part04.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part05.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part06.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part07.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part08.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part09.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part10.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part11.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part12.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part13.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part14.rar" 
"/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part15.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part16.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part17.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part18.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part19.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part20.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part21.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part22.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part23.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part24.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part25.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part26.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part27.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part28.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part29.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part30.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part31.rar" 
"/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part32.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part33.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part34.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part35.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part36.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part37.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part38.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part39.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part40.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part41.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part42.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part43.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part44.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part45.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part46.rar" "/opt/nntpposter/working/54636d520afb4068954498f97421a690_readyToPost/54636d520afb4068954498f97421a690.part47.rar"]
2017-10-30 13:23:20,914 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase - Method used: Shuffle (128 bit), 8 threads

Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase - Calculating:   0.06%
/opt/ParPar/lib/par2gen.js:470
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -             async.forEachOf(rf.packets, function(pkt, i, cb) {
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -                   ^
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase - TypeError: async.forEachOf is not a function
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at Object.PAR2Gen.writeFile (/opt/ParPar/lib/par2gen.js:470:9)
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at iterate (/usr/lib/nodejs/async.js:149:13)
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at Object.async.eachSeries (/usr/lib/nodejs/async.js:165:9)
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at Object.PAR2Gen.writeFiles (/opt/ParPar/lib/par2gen.js:515:9)
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at /usr/lib/nodejs/async.js:610:21
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at /usr/lib/nodejs/async.js:249:17
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at iterate (/usr/lib/nodejs/async.js:149:13)
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at /usr/lib/nodejs/async.js:160:25
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at /usr/lib/nodejs/async.js:251:21
2017-10-30 13:31:11,178 WARN Threadpool worker ExternalProcessWrappers.ExternalProcessWrapperBase -     at /usr/lib/nodejs/async.js:615:34
animetosho commented 6 years ago

Thanks for the report. How did you install dependencies?
It appears that your version of async may be too new - ParPar requires version 1.*; I suspect you have version 2.

ghost commented 6 years ago

Ehh I just did:

apt-get install nodejs node-gyp node-async
git clone ParPar
node-gyp rebuild

Also:

# npm -v async
3.5.2
animetosho commented 6 years ago

What distro/repositories are you using? Can you check the version via apt-cache showpkg node-async?
Edit: it appears that the latest distros ship a version that's too new :/ Are you able to try an older version?

ghost commented 6 years ago

Oh wait that was the npm version :P async version is 2.5.0

Package: node-async
Versions:
0.8.0-1 (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_universe_binary-amd64_Packages) (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_universe_binary-i386_Packages) (/var/lib/dpkg/status)
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_universe_binary-amd64_Packages
                  MD5: 967c8012cc60c8d3e91c513bbc2e3181
 Description Language: en
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_universe_i18n_Translation-en
                  MD5: 967c8012cc60c8d3e91c513bbc2e3181

Reverse Depends:
  node-uglify,node-async 0.2.6
  node-log4js,node-async 0.1.15
  node-form-data,node-async
Dependencies:
0.8.0-1 - nodejs (0 (null))
Provides:
0.8.0-1 -
Reverse Provides:
animetosho commented 6 years ago

Do you have NPM? Maybe try npm install async@1.5.2 in the same folder as ParPar.

I'll need to look at making it more compatible with a wider range of versions of async...

ghost commented 6 years ago

Looks like it's working now. I haven't tried really large files yet, but a 10GB file worked fine. I will report back.

ghost commented 6 years ago

Looks like it's fixed! :D