brenthuisman / par2deep

Produce, verify and repair par2 files recursively.
GNU Lesser General Public License v3.0
84 stars · 8 forks

Support running more par2 instances at a time #15

Closed · Dri0m closed this 5 months ago

Dri0m commented 3 years ago

Problem

There is a lot of "empty" waiting while creating/verifying small files (in my case, tens of thousands of ~4MB files) and my CPU and disk are not utilized, even when par2 is actively creating/verifying a file.

Solution

Add an option to process more than one file at a time - not within a single par2 process, but by running multiple par2 processes simultaneously. This could significantly speed up your tool when working with small files.

brenthuisman commented 3 years ago

Can you show how you measured this?

Dri0m commented 3 years ago

Can you show how you measured this?

Not in an exact manner; I ran par2deep on a directory containing a restic repo and watched Task Manager (on Win10). HDD utilization is at 10%, with reads reaching 20 MB/s during verification (CPU usage during verification is very low), while I haven't seen par2deep's CPU utilization go higher than ~25% (~4 threads) during par2 creation.

I could run more experiments and gather some actual numbers for you if you want.

brenthuisman commented 3 years ago

I think you're seeing regular par2 behavior.

AFAIK, par2 runs continuously in par2deep, and it may use multiple threads (one of the reasons for having the switch to provide an external par2 binary is to change such behavior depending on your par2 build). I've also noticed that despite getting priority, it never seems to peg any particular resource, but the par2 codebase is too difficult for me to really make sense of. In principle, I don't have the time to really dig into par2 behavior itself, and in case you've never seen that codebase, I promise you there are no simple wins here ;)

In the interim, you can use any external par2 build (there are some focusing on multiprocessing) or if you have time, make a PR ;)

Dri0m commented 3 years ago

Well yes, it is par2 behavior. But that shouldn't stop you from improving your tool by running more than one par2 at once?

So I went and did that. I threw together `multiprocessing.Pool` and `subprocess.Popen` and tested it:

`process = subprocess.Popen(["par2.exe", "c", "-q", "-u", "-n1", "-r10", "-b500", file], ...)`
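Roughly, the harness looked like this (a sketch, not the exact code I ran - the `build_par2_cmd` and `parallel_create` names are made up, and a thread pool stands in for `multiprocessing.Pool`, since the real work happens in the spawned par2 processes anyway):

```python
import subprocess
from multiprocessing.pool import ThreadPool

def build_par2_cmd(path, redundancy=10, blocks=500):
    # Mirrors the command above; assumes a par2 binary on PATH.
    return ["par2", "c", "-q", "-u", "-n1", f"-r{redundancy}", f"-b{blocks}", path]

def run_one(cmd):
    # Each worker just blocks on its own external par2 process.
    return subprocess.run(cmd, capture_output=True).returncode

def parallel_create(paths, workers=8, cmd_builder=build_par2_cmd):
    # Threads are enough here: Python only waits, par2 does the work.
    with ThreadPool(workers) as pool:
        return pool.map(run_one, [cmd_builder(p) for p in paths])
```

The `-p` value in the timings below corresponds to `workers` here.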

Tested on a sample taken from a restic repo: 200 files, ~1 GB, on an NVMe SSD (disk I/O was not a bottleneck). CPU is an 8c/16t Ryzen, 32 GB RAM.

PS> Measure-Command -Expression { python par.py run --create -p 1 ..\test\ }
TotalMilliseconds : 14916,7407

PS> Measure-Command -Expression { python par.py run --create -p 2 ..\test\ }
TotalMilliseconds : 8376,359

PS> Measure-Command -Expression { python par.py run --create -p 4 ..\test\ }
TotalMilliseconds : 5072,0759

PS> Measure-Command -Expression { python par.py run --create -p 8 ..\test\ }
TotalMilliseconds : 3976,2278

^^ maxed out my CPU

PS> Measure-Command -Expression { python par.py run --create -p 16 ..\test\ }
TotalMilliseconds : 4277,3008

Also, reducing the block count is very helpful for smaller files - the speedup is significant, and we probably don't need 2000 blocks for small files. Choosing the block count dynamically based on file size might be worth it.
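For instance, a trivial size-based heuristic (the function name, target block size, and clamp values here are purely illustrative, not taken from par2deep):

```python
def pick_block_count(file_size, block_size=512 * 1024,
                     min_blocks=10, max_blocks=2000):
    # Aim for roughly one block per `block_size` bytes of input,
    # clamped so tiny files don't get 2000 blocks and huge files
    # don't get an absurd count. All thresholds are made up.
    return int(max(min_blocks, min(max_blocks, file_size // block_size)))
```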


I might be comparing apples and oranges here, but doing something like this inside your tool could boost performance considerably.

brenthuisman commented 3 years ago

So, doing threading outside of par2 introduces two problems:

  1. Because an external par2 command can be used, doing threading in par2deep in addition to the par2 executable seems superfluous. Moreover, a number of threads that works fine on an NVMe drive might bog down usage on spinning rust/plastic (where I usually use par2deep), so the number of threads would have to be configurable. So far, I thought that people who care would shell out to par2cmdline or somesuch. Have you tried that?
  2. Due to the Qt GUI, I can't use Python's multiprocessing/threading, but must use Qt's. Right now I use the very simple QThread, but I think with QThreadPool I have to use QRunnables [1], which are no longer more or less the same as a Python thread but involve more ceremony and overhead. I haven't looked into how that might work; I'm not a big Qt user. Also, the commandline part of par2deep does not thread at all right now, and I don't want to introduce a Qt dependency there. To also make that part work in parallel, I'd have to maintain two implementations. Not sure if I want to do that ;)

[1] https://www.learnpyqt.com/courses/concurrent-execution/multithreading-pyqt-applications-qthreadpool/
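For the commandline part specifically, the stdlib alone would do, with no Qt anywhere - a sketch, assuming a hypothetical `run_par2` helper that wraps the external binary:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_par2(cmd):
    # Hypothetical helper: block on one external par2 invocation
    # and report back which file it was for.
    return cmd[-1], subprocess.run(cmd, capture_output=True).returncode

def run_all(cmds, workers=4):
    # ThreadPoolExecutor needs no Qt and no pickling; `workers` is
    # the configurable knob from point 1 above. as_completed lets a
    # CLI print progress as each par2 process finishes.
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(run_par2, c) for c in cmds]
        for fut in as_completed(futures):
            name, code = fut.result()
            results[name] = code
    return results
```

The GUI would still need a Qt-side equivalent, so the two-implementations concern above stands.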

brenthuisman commented 5 months ago

I'll make a release soon-ish using https://github.com/animetosho/par2cmdline-turbo, which should render this issue closed. There is no benefit to operating on multiple files concurrently if a single par2 instance can saturate the CPU, which any solid par2 implementation can do.