ThomasWaldmann opened 6 years ago
For an SSD this can be quite different, so it would be great if one could adjust the number of threads for that case. Currently it takes an hour to scan my whole disk with a transfer size of less than 500MB. So for now I'll use borgbackup as a less frequent addition to Time Machine (which I don't fully trust; I've seen files missing on spot checks).
It would also be great to know what parts take time: whether it's the scanning, matching, crypto, transfer, or whatever.
I made several tests on very different hardware (with the help of a bunch of people), and it mostly did not help to read more than 2 files in parallel on SSD and/or RAID. But in order to do this in a more scientific way (I did not save the program and the data), I'd like to re-run these tests and graph the results.
Would you like to help with that and/or participate?
@fd0 I can certainly provide measurements from my end if I don't have to set up too much. I'm totally fine with installing a development borg version on my laptop, but I would like to avoid doing anything on the storage side.
Yeah, I plan to do that in Go (concurrency is very easy there), so the test binary is just a statically linked binary that you can build locally, copy to the test machine, and run there (cross-compilation is also very easy).
I've built a small program we can use for measurements here: https://github.com/fd0/prb
It traverses the given directory in one thread and reads all files in a specified number of worker threads. For example, my benchmark for a directory on the internal SSD of my laptop gives:
workers files dirs bytes time (seconds) bandwidth (per second)
1 326863 78365 25034388499 152.982354574 163642327
2 326863 78365 25034389119 108.725610135 230252919
3 326863 78365 25034389494 93.519914623 267690465
4 326863 78365 25034389948 89.576514578 279474927
5 326863 78365 25034390236 89.055505093 281109968
6 326863 78365 25034390629 88.652750661 282387071
7 326863 78365 25034390913 88.978444428 281353434
8 326863 78365 25034391508 88.363886038 283310214
9 326863 78365 25034396005 89.240226907 280528152
10 326863 78365 25034396341 88.483356924 282927741
(Still running for the internal hard drive...)
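For anyone curious how the measurement tool is structured, the core pattern is roughly the following (a minimal sketch in Go, not the actual prb source; the --workers flag mirrors prb's, everything else is illustrative):

package main

import (
	"flag"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	workers := flag.Int("workers", 1, "number of reader goroutines")
	flag.Parse()

	paths := make(chan string, 128)
	var bytesRead int64

	// One goroutine traverses the tree and feeds file paths to the workers.
	go func() {
		filepath.Walk(flag.Arg(0), func(p string, fi os.FileInfo, err error) error {
			if err == nil && fi.Mode().IsRegular() {
				paths <- p
			}
			return nil
		})
		close(paths)
	}()

	// N worker goroutines read the files and count the bytes.
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < *workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				f, err := os.Open(p)
				if err != nil {
					continue
				}
				n, _ := io.Copy(io.Discard, f)
				atomic.AddInt64(&bytesRead, n)
				f.Close()
			}
		}()
	}
	wg.Wait()

	elapsed := time.Since(start)
	fmt.Printf("%d bytes in %.2fs (%d bytes/s)\n",
		bytesRead, elapsed.Seconds(), int64(float64(bytesRead)/elapsed.Seconds()))
}

The point of the fixed worker pool is that the number of concurrent reads stays constant no matter how fast the single traversal goroutine produces paths.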
On the internal NVMe (macOS 10.12):
Helper script:
#!/bin/sh
# Flush the page cache before each run so results aren't skewed by caching (macOS).
TARGET=$HOME
for i in 1 2 3 4 5 6 7 8 9 10; do
sync && sudo purge
bin/prb --workers $i --output /tmp/benchmarks.csv "$TARGET"
done
Results:
workers files dirs bytes time (seconds) bandwidth (per second)
1 134597 22829 33534469011 43.31894966 774129319
2 134597 22829 33534390461 26.958458113 1243928355
3 134597 22829 33534441292 22.807006458 1470356986
4 134597 22829 33534441365 20.826685577 1610166977
5 134597 22829 33534295113 20.906010019 1604050465
6 134597 22829 33534344292 21.007518011 1596302060
7 134597 22829 33534391668 20.661027471 1623074734
8 134597 22829 33534399908 20.66181482 1623013283
9 134597 22829 33534453044 21.010812681 1596056923
10 134597 22829 33534521348 20.446736352 1640091639
Next data point: my internal hard disk:
workers files dirs bytes time (seconds) bandwidth (per second)
1 11559 268 55096088012 852.778747346 64607717
2 11559 268 55096088012 974.067868801 56562884
3 11559 268 55096088012 1010.936685754 54500038
4 11559 268 55096088012 1057.294799461 52110431
5 11559 268 55096088012 1075.07961856 51248379
6 11559 268 55096088012 1110.684131519 49605541
7 11559 268 55096088012 1159.329260498 47524107
8 11559 268 55096088012 1180.693423095 46664177
9 11559 268 55096088012 1215.789427597 45317130
10 11559 268 55096088012 1252.950178241 43973087
Another machine, reading data from an SSD via SATA:
workers files dirs bytes time (seconds) bandwidth (per second)
1 88389 25083 14306522715 52.196173041 274091410
2 88389 25083 14306522715 35.386510019 404293124
3 88389 25083 14306522715 31.67159325 451714651
4 88389 25083 14306522715 31.08763338 460199801
5 88389 25083 14306522715 31.059477186 460616984
6 88389 25083 14306522715 31.167345965 459022809
7 88389 25083 14306522715 31.012411926 461316028
8 88389 25083 14306522715 30.927243427 462586416
9 88389 25083 14306522715 30.926412386 462598847
10 88389 25083 14306522715 31.153801543 459222374
Same system, connected via USB3:
5400 rpm HDD
workers files dirs bytes time (seconds) bandwidth (per second)
1 44370 12234 2557745423 64.929517906 39392644
2 44370 12234 2557745423 45.042011148 56785773
3 44370 12234 2557745423 47.941166942 53351755
4 44370 12234 2557745423 51.798936151 49378338
5 44370 12234 2557745423 54.807403461 46667881
6 44370 12234 2557745423 57.106539 44789011
7 44370 12234 2557745423 58.573511115 43667271
8 44370 12234 2557745423 60.245337734 42455491
9 44370 12234 2557745423 62.257068435 41083614
10 44370 12234 2557745423 64.328836775 39760479
SSD
workers files dirs bytes time (seconds) bandwidth (per second)
1 12448 7383 44389369458 111.594261105 397774661
2 12448 7383 44389369458 104.685872615 424024449
3 12448 7383 44389369458 104.894013065 423183060
4 12448 7383 44389369458 104.908220452 423125749
5 12448 7383 44389369458 104.540388212 424614545
6 12448 7383 44389369458 104.80080464 423559433
7 12448 7383 44389369458 105.194333052 421974912
8 12448 7383 44389369458 104.782470494 423633545
9 12448 7383 44389369458 105.063509001 422500351
10 12448 7383 44389369458 105.366123482 421286918
I guess the only things still missing are some RAID systems with many HDDs or SSDs.
NVMe in a late 2016 15" MacBook Pro (4 cores with Hyperthreading):
workers files dirs bytes time (seconds) bandwidth (per second)
1 4243314 756425 301394839765 1018.361214516 295960642
2 4243380 756425 301396556781 684.329069022 440426353
3 4243440 756426 301397998843 558.604872262 539554905
4 4243475 756426 301399037339 523.684396133 575535646
5 4243519 756426 301400593097 509.297344263 591796907
6 4243566 756427 301401846249 522.760056444 576558676
7 4244200 756429 301429018785 530.952168634 567714074
8 4244429 756429 301437311356 529.609069741 569169465
9 4244469 756429 301438821859 509.666049886 591443793
10 4244508 756429 301440390469 510.457233737 590530157
@jkahrs what machine do you have? Your NVMe speed is impressive.
Now this all got me thinking: is borg actually reading all the files for every backup? I thought it was more like rsync, which only reads files if their stats changed.
If it only reads files when the stats change, then directory traversal is the bottleneck. As you can see, just my home directory has more than 4 million files.
If traversal is the bottleneck, does borg already use https://pypi.python.org/pypi/scandir? Making traversal multithreaded is most likely harder, but it could speed things up a lot. What about xattrs and resource forks? Since scandir doesn't include those, fetching them could be multithreaded more easily.
If borg actually reads all files for every backup, then an option to work like rsync would be very useful for me. I have backups for servers where I know that files don't change without stat changes, and they have HDDs where not reading everything would help a lot.
NVMe SSD in my workstation (with a few big VM files)
workers files dirs bytes time (seconds) bandwidth (per second)
1 29 14 60873793654 22.810396927 2668686294 # 2.67GB/s
2 29 14 60873793654 28.385084934 2144569720
3 29 14 60873793654 34.139300982 1783100177
4 29 14 60873793654 33.857192252 1797957527
5 29 14 60873793654 33.319131428 1826992212
6 29 14 60873793654 33.566628508 1813521237
7 29 14 60873793654 33.624335908 1810408800
8 29 14 60873793654 33.641778505 1809470139
9 29 14 60873793654 33.238669888 1831414850
10 29 14 60873793654 33.186091691 1834316442
Looks like reading big files gets worse with more workers.
Same, but with more, smaller/medium-sized files.
workers files dirs bytes time (seconds) bandwidth (per second)
1 268881 35541 59742969794 40.711983296 1467454173 # 1.47 GB/s
2 268881 35541 59742969794 40.119823345 1489113480
3 268881 35541 59742969794 41.636482683 1434870717
4 268881 35541 59742969794 40.749647873 1466097817
5 268881 35541 59742969794 39.5652761 1509984908
6 268881 35541 59742969794 39.127989676 1526860191
7 268881 35541 59742969794 38.880705822 1536571122
8 268881 35541 59742969794 38.749396771 1541778060
9 268881 35541 59742969794 38.660238397 1545333714
10 268881 35541 59742969794 38.665501079 1545123382
@fschulze borg does not open unchanged files. But it fetches stats, xattrs, ACLs, and bsdflags.
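For illustration, that rsync-style decision is just a stat comparison. Here's a hedged Go sketch (field names are Linux-specific, and the fileID type is made up for the example; it is not borg's actual files-cache format):

package main

import (
	"fmt"
	"os"
	"syscall"
)

// fileID holds the stat fields usually compared to decide whether a file
// changed since the last backup (illustrative; not borg's cache format).
type fileID struct {
	size  int64
	mtime syscall.Timespec
	inode uint64
}

func statID(path string) (fileID, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return fileID{}, err
	}
	st := fi.Sys().(*syscall.Stat_t) // Linux-specific field names below
	return fileID{size: st.Size, mtime: st.Mtim, inode: st.Ino}, nil
}

func main() {
	// In a real tool this map would be loaded from the previous backup's cache.
	prev := map[string]fileID{}

	cur, err := statID(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if old, ok := prev[os.Args[1]]; ok && old == cur {
		fmt.Println("unchanged: skip reading contents, reuse stored chunks")
	} else {
		fmt.Println("changed or new: read, chunk, and store")
	}
}

If the stat fields match the cached ones, the file's contents are never opened; only the metadata (stats, xattrs, ACLs, flags) still has to be fetched, which is why traversal and metadata cost dominate backups where little has changed.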
@fschulze this is the late 2016 13" model. I had the feeling that after downgrading back to Sierra with encrypted HFS+ the I/O went way up.
Software RAID5 with 8 HDDs (ext4):
workers files dirs bytes time (seconds) bandwidth (per second)
1 225393 44283 16670948275 207.924925535 80177728
2 225393 44283 16670948275 152.340443245 109432189
3 225393 44283 16670948275 133.244525906 125115445
4 225393 44283 16670948275 122.852574096 135698811
5 225393 44283 16670948275 117.991702629 141289157
6 225393 44283 16670948275 131.906811125 126384287
7 225393 44283 16670948275 111.734195296 149201846
8 225393 44283 16670948275 112.290326643 148462906
9 225393 44283 16670948275 108.54295232 153588491
10 225393 44283 16670948275 106.306041287 156820328
@jkahrs how many disks in total?
@jkahrs I'm still on Sierra. I got the 512GB NVMe; which one do you have? I'm kinda underwhelmed by the performance of mine now.
@ThomasWaldmann updated the comment. @fschulze that's also a 512GB drive. You seem to have a lot more files and folders than me.
Ahh, now that looks different:
workers files dirs bytes time (seconds) bandwidth (per second)
1 7660 2428 11783418573 13.727851779 858358522
2 7660 2428 11783418573 9.892907606 1191097606
3 7660 2428 11783418573 7.561183771 1558409229
4 7660 2428 11783418573 6.532947498 1803690995
5 7660 2428 11783418573 6.051080952 1947324563
6 7660 2428 11783418573 5.801837963 2030980294
7 7660 2428 11783418573 5.720344899 2059914005
8 7660 2428 11783418573 5.488674043 2146860695
9 7660 2428 11783418573 5.602929183 2103081832
10 7660 2428 11783418573 5.485737988 2148009729
Fewer but bigger files now.
Interestingly, for my machine 5 threads is the sweet spot, even though it's a 4-core machine. Probably because traversal isn't multithreaded and fits into the "hyperthreads".
So I think this benchmark shows that we should read files with more threads, but speeding up traversal if possible would be a bigger win for frequent backups.
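To make the traversal idea concrete, here is a hedged Go sketch of a multithreaded walker (illustrative only: the pending counter detects completion, and the large channel buffer papers over the unbounded-queue problem a real implementation would have to solve):

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync/atomic"
)

// walkParallel lists directories with several goroutines: each worker pops
// a directory, scans it, and pushes its subdirectories back onto the queue.
func walkParallel(root string, workers int) int64 {
	dirs := make(chan string, 65536) // big buffer; a real tool needs an unbounded queue
	done := make(chan struct{})
	var files int64
	pending := int64(1) // directories queued but not yet fully scanned

	dirs <- root
	for i := 0; i < workers; i++ {
		go func() {
			for d := range dirs {
				if entries, err := os.ReadDir(d); err == nil {
					for _, e := range entries {
						if e.IsDir() {
							atomic.AddInt64(&pending, 1)
							dirs <- filepath.Join(d, e.Name())
						} else {
							atomic.AddInt64(&files, 1)
						}
					}
				}
				if atomic.AddInt64(&pending, -1) == 0 {
					close(done) // no directories left anywhere
				}
			}
		}()
	}
	<-done
	close(dirs) // lets the idle workers exit
	return atomic.LoadInt64(&files)
}

func main() {
	fmt.Println(walkParallel(os.Args[1], 4), "files")
}

Whether this actually helps depends on the storage: on a single HDD the extra seeks would likely hurt, while the SSD and network numbers in this thread suggest real headroom.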
2 x HDD 7200rpm (RAID1/mirror), images...
workers files dirs bytes time (seconds) bandwidth (per second)
1 10284 133 25842061595 197.603061023 130777638
2 10284 133 25842061595 113.138364694 228411128
3 10284 133 25842061595 129.485594142 199574800
4 10284 133 25842061595 130.49217482 198035335
5 10284 133 25842061595 130.815553928 197545787
6 10284 133 25842061595 132.337407454 195274050
7 10284 133 25842061595 133.104915206 194148063
8 10284 133 25842061595 133.876554421 193029031
9 10284 133 25842061595 133.90702043 192985113
10 10284 133 25842061595 135.983645633 190038011
Verdict: essentially no parallelism; the disk head can only be in one place at a time. Ask it to do more at once, and you only get worse results.
NB: with 2 heads in a mirror, both can be used for reading simultaneously, so the optimal parallelism in this case is 2.
NVMe 256GB, /home, lots of small files
workers files dirs bytes time (seconds) bandwidth (per second)
1 576739 127136 35025161895 105.954562112 330567756
2 576741 127136 35025467336 57.705307912 606971327
3 576741 127136 35025473059 44.181948115 792755289
4 576741 127136 35025484264 38.362061341 913024040
5 576741 127136 35025479521 35.911602085 975324894
6 576741 127136 35025482037 34.545554043 1013892612
7 576741 127136 35025482677 37.935898329 923280697
8 576741 127136 35025485464 33.846264666 1034840500
9 576741 127136 35025488407 33.539483021 1044306150
10 576741 127136 35025521527 33.082622413 1058728691
Verdict: it's a known fact that SSD storage has some internal parallelism, due to the way it's built. The tests reveal that parallelism of ~4-6 works best, and there's nothing to be gained above that (though there's no slowdown either).
Finally, the most interesting case.
MooseFS distributed networked file system, consisting of 6 storage servers (in another country, 13ms away)
workers files dirs bytes time (seconds) bandwidth (per second)
1 27 1 3235793142 262.119989982 12344701
2 27 1 3235793142 87.114840502 37143994
3 27 1 3235793142 86.260050541 37512071
4 27 1 3235793142 62.383132636 51869680
5 27 1 3235793142 54.943678061 58892910
6 27 1 3235793142 55.362500856 58447380
7 27 1 3235793142 48.599625005 66580619
8 27 1 3235793142 66.102590446 48951079
9 27 1 3235793142 45.514665676 71093417
10 27 1 3235793142 42.760927895 75671724
Verdict: of course, once network latencies are involved, parallelism starts to be very helpful. In this particular case the bandwidth is shared with other network users (on both sides), so it's not easy to get stable results. Still, in general, as parallelism goes up, so does the throughput. If network latency were even higher, or the client had more bandwidth available, even higher parallelism (> 10) would be useful.
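A rough back-of-the-envelope check (my arithmetic, not the poster's): the single-worker rate of ~12.3 MB/s over a 13 ms round trip corresponds to roughly 12.3 MB/s * 0.013 s ≈ 160 KB in flight per request cycle. Assuming one outstanding read per worker, each extra worker adds another ~160 KB window, so throughput scales roughly linearly with worker count until the shared link saturates, which is about what the table shows. This is the usual bandwidth-delay-product argument.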
Two NVMe drives in software RAID1, /home, lots of small files.
workers files dirs bytes time (seconds) bandwidth (per second)
1 100955 11789 11300634506 26.483996109 426696728
2 100955 11789 11300634506 16.512433857 684371220
3 100955 11789 11300634506 13.551913292 833877421
4 100955 11789 11300634506 12.153322282 929839120
5 100955 11789 11300634506 12.08701521 934940041
6 100955 11789 11300634506 11.974250177 943744646
7 100955 11789 11300634506 12.331832894 916379146
8 100955 11789 11300636242 11.390475056 992112812
9 100955 11789 11300636242 10.854376981 1041113300
10 100955 11789 11300636242 10.7035321 1055785710
Since chunking, compressing, encrypting, etc. will take time, will having multiple file traversal threads help much in practice? I guess it will help in the common case where very little has changed.
> Since chunking, compressing, encrypting, etc. will take time, will having multiple file traversal threads help much in practice?
I think so, yes, as long as it's not too many threads all at once. Usually you have a pipeline feeding the individual stages (chunking, hashing for dedup, compression, archival), and keeping this pipeline well fed is important. Building a sample pipeline into the test program would be easy; do you think it's relevant to try that as well?
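For what it's worth, a sample pipeline of that shape is just a few channels in Go. A minimal sketch (stage names follow the comment above; everything else is illustrative):

package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
)

type chunk struct {
	id   [32]byte // content hash, used for dedup lookups
	data []byte
}

func main() {
	raw := make(chan []byte, 8)   // filled by the file-reading workers
	hashed := make(chan chunk, 8) // hashed chunks awaiting compression/archival

	// Stage: hash chunks for deduplication.
	go func() {
		for data := range raw {
			hashed <- chunk{id: sha256.Sum256(data), data: data}
		}
		close(hashed)
	}()

	// Stage: compress and "archive" (here we just report the sizes).
	done := make(chan struct{})
	go func() {
		for c := range hashed {
			var buf bytes.Buffer
			zw := gzip.NewWriter(&buf)
			zw.Write(c.data)
			zw.Close()
			fmt.Printf("chunk %x: %d -> %d bytes\n", c.id[:4], len(c.data), buf.Len())
		}
		close(done)
	}()

	// Feed some sample data; in a real tool this would come from the chunker.
	raw <- []byte("hello world")
	raw <- bytes.Repeat([]byte("a"), 1<<16)
	close(raw)
	<-done
}

The buffered channels are what "keeping the pipeline well fed" means in practice: as long as the readers stay ahead, the CPU-bound stages never starve.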
Interesting. I upgraded to High Sierra 10.13.2, and with many small files it is now quite a bit slower with fewer than 4 threads and a good bit faster with 4 or more threads. For fewer, bigger files the difference is within the measurement margin, I'd say.
workers files dirs bytes time (seconds) bandwidth (per second)
1 4154598 752994 296045823508 1073.039318989 275894665
2 4154598 752994 296054272669 844.524821526 350557218
3 4154624 752997 296058951400 702.868742683 421215133
4 4154631 752999 296062264956 461.546142609 641457565
5 4154631 752999 296066096556 416.281698488 711215740
6 4154631 752999 296068853316 407.582615489 726402064
7 4154631 752999 296073022756 398.81490944 742382031
8 4154631 752999 296076032793 406.297594362 728717169
9 4154631 752999 296080745761 403.990764217 732889887
10 4154654 753000 296090027843 407.165517793 727198190
workers files dirs bytes time (seconds) bandwidth (per second)
1 7660 2428 11783418573 13.745932444 857229483
2 7660 2428 11783418573 9.880593585 1192582052
3 7660 2428 11783418573 8.10193976 1454394740
4 7660 2428 11783418573 6.941451101 1697543986
5 7660 2428 11783418573 6.44924223 1827101255
6 7660 2428 11783418573 6.098571202 1932160531
7 7660 2428 11783418573 5.798642183 2032099619
8 7660 2428 11783418573 5.704184401 2065749938
9 7660 2428 11783418573 5.739902085 2052895397
10 7660 2428 11783418573 5.908465865 1994327942
@fschulze that's interesting. Is that encrypted APFS?
@jkahrs both are full-disk encryption; previously it was HFS+, now APFS. I wonder if the first mitigations for Meltdown and Spectre in 10.13.2 are causing some of the slowdowns with few threads, due to the switching between kernel and userland.
@fschulze I'd guess those changes came with https://support.apple.com/de-de/HT208331 which would have included Sierra. Maybe my prior installation was just messed up in some way.
@jkahrs good to know that those fixes seem to be included for El Capitan. I'm waiting for a new Mac mini to replace that box.
Laptop SSD (SATA, 2.5 inch). Filesystem is ext4, LVM on LUKS.
workers files dirs bytes time (seconds) bandwidth (per second)
1 405456 32240 337417596737 738.08331411 457153806
2 405456 32240 337417596737 845.344665685 399147957
3 405456 32240 337417596737 795.896249007 423946710
4 405456 32240 337417596737 760.561481232 443642762
5 405456 32240 337417596737 757.047620175 445701944
6 405456 32240 337417596737 756.25180312 446170964
7 405456 32240 337417596737 751.386773147 449059803
8 405456 32240 337417596737 747.206960948 451571805
9 405456 32240 337417596737 752.828025356 448200100
10 405456 32240 337417596737 751.141265579 449206576
What I find interesting is that for me, the performance is the worst with 2 workers. I did this test twice - the second time just after rebooting (kernel upgrade) and before I launched any program. The results were consistent.
RaidZ2 here with 8x 10 TB HGST HUH721010ALN600
Pass 1:
workers files dirs bytes time (seconds) bandwidth (per second)
1 31254 5680 1653767866 16.95543838 97536131
2 31254 5680 1653767866 1.726086611 958102481
3 31254 5680 1653767866 1.382910139 1195860684
4 31254 5680 1653767866 1.24738435 1325788531
5 31254 5680 1653767866 1.16986355 1413641672
6 31254 5680 1653767866 1.178085722 1403775493
7 31254 5680 1653767866 1.222813327 1352428722
8 31254 5680 1653767866 1.182235734 1398847808
9 31254 5680 1653767866 1.156814964 1429587200
10 31254 5680 1653767866 1.212774252 1363623826
Pass 2:
1 31254 5680 1653767866 2.823326856 585751473
2 31254 5680 1653767866 1.707626776 968459788
3 31254 5680 1653767866 1.352236686 1222986983
4 31254 5680 1653767866 1.290057231 1281933720
5 31254 5680 1653767866 1.191252591 1388259617
6 31254 5680 1653767866 1.238523396 1335273819
7 31254 5680 1653767866 1.138185382 1452986387
8 31254 5680 1653767866 1.160864022 1424600844
9 31254 5680 1653767866 1.175741205 1406574728
Pass 3:
1 31254 5680 1653767866 2.846161947 581051920
2 31254 5680 1653767866 1.725233356 958576334
3 31254 5680 1653767866 1.406585421 1175732267
4 31254 5680 1653767866 1.289976325 1282014122
5 31254 5680 1653767866 1.222473042 1352805181
6 31254 5680 1653767866 1.168530526 1415254312
7 31254 5680 1653767866 1.22741548 1347357836
8 31254 5680 1653767866 1.137919145 1453326339
9 31254 5680 1653767866 1.210682503 1365979818
10 31254 5680 1653767866 1.152583339 1434835824
NanoPi Neo Plus2 (gigabit Ethernet), old SATA HDD on Windows with a Samba share.
workers files dirs bytes time (seconds) bandwidth (per second)
1 15433 50 99500766450 1390.829912695 71540571
2 15433 50 99500766450 1345.830763234 73932599
3 15433 50 99500766450 1288.995289797 77192498
4 15433 50 99500766450 1238.611462013 80332509
I'm much more interested in multiple directory scanning threads, or something else that will speed up incremental backups with no data change. Especially when the data resides on a Samba mount.
Be aware of any test that isn't significantly larger than DRAM: my initial 1GB runs were very well cached (and therefore completely useless). Only the single-worker pass on those runs took a realistic amount of time.
I get pretty bad results with NTFS on a USB3 spinning disk in a fast computer. All commands drop the caches (echo 3 > /proc/sys/vm/drop_caches) before execution:
Benchmark:
$ for i in `seq 1 10`; do sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; ./prb --workers $i --output benchmark.csv /mnt/tmp/restic-tmp/data/43/; done
workers files dirs bytes time (seconds) bandwidth (per second)
1 703 1 3494579168 61.948200115 56411310
2 703 1 3494579168 113.038328638 30914993
3 703 1 3494579168 126.811221339 27557333
4 703 1 3494579168 135.484139419 25793271
5 703 1 3494579168 147.656243787 23666992
6 703 1 3494579168 169.612333934 20603331
7 703 1 3494579168 191.363288231 18261492
8 703 1 3494579168 207.893400994 16809476
9 703 1 3494579168 228.545733447 15290502
10 703 1 3494579168 247.179460685 14137821
$ cat /mnt/tmp/restic-tmp/data/43/*|dd bs=4k | sha256sum
28daaca6a51d6ad65a9fc496f52c993941f5269fa17ff09e615654b4e49a87af -
852817+703 records in
852817+703 records out
3494579168 bytes (3.5 GB, 3.3 GiB) copied, 57.4947 s, 60.8 MB/s
# mount |grep sdc
/dev/sdc1 on /mnt/tmp type fuseblk (rw,noatime,user_id=0,group_id=0,allow_other,blksize=4096)
# dd if=/dev/sdc bs=4k count=500k of=/dev/null
512000+0 records in
512000+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 18.4992 s, 113 MB/s
I see the same performance on Windows (native NTFS), so I don't think it's related much to the FUSE ntfs-3g driver.
The "openssl speed sha256" single-thread test, to show that it is a fast computer:
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
sha256 92166.97k 202457.75k 377680.73k 472391.34k 508515.67k 510503.59k
USB hard disk drives, and also very cheap PCI controllers, do not have NCQ (https://en.wikipedia.org/wiki/Native_Command_Queuing), so that may be crucial for concurrent read performance on spinning disks. I mention it because backing up to a USB spinning disk is probably common. @fd0
Here is one for AWS EFS (Provisioned Throughput mode, 500 MiB/s) using an m5.4xlarge machine with Amazon Linux 2:
Linux <redacted> 4.14.232-177.418.amzn2.x86_64 #1 SMP Tue Jun 15 20:57:50 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
workers files dirs bytes time (seconds) bandwidth (per second)
1 14854 144 10909596899 129.796206979 84051738
2 14854 144 10909596899 71.923690731 151682940
3 14854 144 10909596899 53.633575573 203409837
4 14854 144 10909596899 44.813195277 243446083
5 14854 144 10909596899 39.832552075 273886465
6 14854 144 10909596899 36.292259379 300603960
7 14854 144 10909596899 34.163074778 319338846
8 14854 144 10909596899 32.337106256 337370846
9 14854 144 10909596899 31.127344157 350482740
10 14854 144 10909596899 29.691891475 367426807
Talked with @fd0 at 34C3 about multithreading, and he mentioned that the sweet spot for input file discovery / reading parallelism when doing a backup seems to be 2 (for hard disks and HDD-based RAID).
1 is too little to use all the available resources / capability of the hardware; more than 2 is too much and overloads the hardware (with random HDD seeks in this case).