airdcpp-web / airdcpp-webclient

Communal peer-to-peer file sharing application for file servers/NAS devices
https://airdcpp-web.github.io

Hashing Speed In The Low 20s #407

Closed denppa closed 2 years ago

denppa commented 2 years ago

I have a few network drives mounted.

I tested the read speed from them with `dd if=file of=/dev/null bs=4k`; the result was 90 MiB/s.
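For reference, a variant of that test that bypasses the Linux page cache, so repeated runs aren't inflated by cached data (the path below is a placeholder, and `iflag=direct` may be rejected on some FUSE mounts, in which case drop it):

```sh
# Sequential read test with direct I/O and a block size closer to what a
# hasher would use; count limits the test to 1 GiB.
dd if=/mnt/gdrive/some-large-file of=/dev/null bs=1M count=1024 iflag=direct status=progress
```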

Checking with nethogs shows rclone running at a constant rate of around 150 MB/s.

Yet AirDC++ is hashing at 20 MiB/s, with drops to kB/s every ten seconds or so. Sometimes it bursts up to 60 MiB/s, still below the test speeds, but it never lasts more than a minute before dropping back to an unstable 20. This happens more often if I refresh the drive after a restart.

The directory I am trying to hash is 30 TB.

I have also tried running more directory hashes in parallel, using two threads, but I still only peaked at around 30 MB/s, and the peaks only come here and there; the numbers still drop back down to the 20s and even to kB/s.

I have removed all extensions except advanced sharing. The mounted location is read-only with umask 002 and no permission for modtime changes.

I am using the latest Linux binary available, on Ubuntu 20.04, on a Hetzner cloud server, with Google Drive mounted via rclone.
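For context, the mount looks roughly like this; the remote name and flag values below are illustrative guesses, not my exact command:

```sh
# Hypothetical rclone mount; --buffer-size controls per-file read-ahead
# and --vfs-read-chunk-size the initial HTTP range request size.
rclone mount gdrive: /mnt/gdrive \
  --read-only --umask 002 \
  --buffer-size 64M --vfs-read-chunk-size 128M \
  --daemon
```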

I know the devs are busy; any ideas from anyone are welcome!

maksis commented 2 years ago

Could you test the speed with version 2.10.0? Note that the older version uses a different format for web server settings, so it might be better not to use your regular settings directory with it.

maksis commented 2 years ago

Link to older builds: http://web-builds.airdcpp.net/stable/

denppa commented 2 years ago

> Link to older builds: http://web-builds.airdcpp.net/stable/

Exactly what I needed. So far I have two suspicions:

1. The new 2.11.0 update has a problem.

2. Or large directories in general are handled poorly.

Will give it a test ASAP!

Do you by any chance have a pre-compiled binary available? Building it is a bit of a hassle on Ubuntu, tbh.

maksis commented 2 years ago

> Do you by any chance have a pre-compiled binary available? Building it is a bit of a hassle on Ubuntu, tbh.

Did you try the binaries from the link that I posted?

denppa commented 2 years ago

> > Do you by any chance have a pre-compiled binary available? Building it is a bit of a hassle on Ubuntu, tbh.
>
> Did you try the binaries from the link that I posted?

I was working on it, sorry for not updating. Yes, after tar xf it is indeed a binary.
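For anyone following along, the steps were roughly as follows; the exact archive and binary names under /stable/ are assumptions here:

```sh
# Download and unpack an older portable build, then start the daemon.
wget http://web-builds.airdcpp.net/stable/airdcpp_2.10.0_webui-2.10.0_64-bit_portable.tar.gz
tar xf airdcpp_2.10.0_webui-2.10.0_64-bit_portable.tar.gz
./airdcpp-webclient/airdcppd   # extracted directory and binary name may differ
```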

While hashing the directory with AirDC++ 2.10.0, it still throttles at 20 MiB/s.

Then I tried hashing smaller directories on the same rclone mount, and oh man, it was fast, sitting at around 60 MiB/s, which is quite satisfactory for me.

So I think it is safe to say this is a problem with AirDC++ handling larger directories poorly after a certain point. What do you say?

denppa commented 2 years ago

So sustained hashing in 2.11.1 slows down after the initial bursts; let me try 2.10.0 with the same directory to see if we get similar results.

EDIT 1: So far so good with version 2.10.0: https://i.imgur.com/QHpW4Y8.png

maksis commented 2 years ago

Can you also confirm that version 2.10.1 behaves similarly to version 2.11.1?

maksis commented 2 years ago

And could you also test the slow version with more hashing threads (e.g. 10-20 threads)?

denppa commented 2 years ago

> And could you also test the slow version with more hashing threads (e.g. 10-20 threads)?

Okay, sorry, I have to give it some time to stabilize; I will post results a bit later.

But how do I select more threads? I currently have 3 cores, so I set the maximum to three, but each directory is only given one thread by default and I don't know how to increase the thread count.

maksis commented 2 years ago

> I currently have 3 cores, so I set the maximum to three, but each directory is only given one thread by default and I don't know how to increase the thread count.

You increased the maximum number of per-volume hashers too, right? Note that it won't increase the thread count for the current hashing queue, so you need to stop hashing and refresh the directories again (or just restart the client).

denppa commented 2 years ago

Alright, first update: I am feeling confident about the sustained performance now.

Server: Hetzner, 3 AMD cores, 4 GB RAM, 8 GB swap.

A speedtest yielded 250 MiB/s download, which is what matters here for network-mounted drives.

AirDC++ version 2.10.0, max hashing threads 3, max hashing threads per volume (the setting right under max hashing threads in the sharing settings, for those who don't know) set to 20.

I have been hashing a ~180 GB directory for 10 minutes, with occasional dips to single digits that are within acceptable margins.

Peaking at 70 MiB/s and stable at 50 MiB/s. Right now I'm seeing the 60s.

A 73 MiB/s moment with 3 threads, htop screenshot: https://i.imgur.com/ulzpwaE.png

nethogs shows highs of 71576 KB/s.

By the time I finished typing this, I had hashed ~100 GB with 80 left. Still looking good.

What is bugging me: why is a single thread able to hash as fast as 60 MiB/s before dropping into the low twenties? Cloud VM weirdness? But with all three threads combined, the numbers I am seeing add up to 3 cores working together at 20 x 3 = 60.

maksis commented 2 years ago

You also need to increase the total number of hashing threads from 3 (the total number of hashing threads shouldn't be less than the number of per-volume hashing threads).

denppa commented 2 years ago

I don't see why setting 10 threads max and 10 threads per volume would help when I've only got 3 cores, but it did.

Parallel transfers run up to 5 at a time. With the VPS's internet speed I am now at 200 MiB/s for some reason. Miracle!

Can the author explain why that is? How am I able to utilize more "threads" than the cores shown in htop/Hetzner's VPS plan?

This is truly amazing, some stats:

- AirDC++ version: 2.11.1
- Directory size: ~500 GB
- Speeds: from 100 MiB/s, skyrocketing to a stable 200, right now at 250 MiB/s
- rclone parallel transfers: 5 (I might be able to get it to go higher for even more performance?)
- My box's max speed is like 700 MB/s with speedtest, no joke
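If the "parallel transfers" knob here is rclone's `--transfers` flag (my assumption about what the setting maps to), raising it is a one-flag change:

```sh
# Allow 5 file transfers to run concurrently (rclone's default is 4);
# remote name and mount point are placeholders.
rclone mount gdrive: /mnt/gdrive --transfers 5 --daemon
```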

maksis commented 2 years ago

The number of hashing threads doesn't have much to do with the number of CPU cores; it's about the number of files that are being read in parallel. Hashing typically isn't CPU-bound, unless you have fast storage and a slow CPU (which isn't the case while hashing at 20 MB/s).

Since you have a special storage solution that downloads files over the internet, it obviously benefits from increasing the number of parallel files to read. I'm not that familiar with rclone, but quite likely the previous drops in your hashing speed happened while the files were being downloaded (and the hasher thread was effectively idling during that time as it had no data to read). Having more threads should reduce the amount of idle time as there are more simultaneous downloads going on, feeding the application more evenly with data to read.
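A quick way to see this effect from the shell: compare one sequential reader against several running concurrently on the same mount (file paths are placeholders):

```sh
# One reader: throughput is capped by the latency of a single download stream.
time dd if=/mnt/gdrive/file1 of=/dev/null bs=1M

# Three readers: more downloads in flight, higher aggregate throughput.
time ( for f in /mnt/gdrive/file1 /mnt/gdrive/file2 /mnt/gdrive/file3; do
         dd if="$f" of=/dev/null bs=1M &
       done; wait )
```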

denppa commented 2 years ago

So this issue can now be closed with the following conclusion.

First I must give my thanks to the active dev maksis for responding! Without your knowledge I would be hashing content till the end of the world.

Now, for those who are struggling with slow hashing speeds, know the following.

The best results I have observed when using a cloud mounting service like rclone came from 15 threads: parallel hashing at 15 max and 15 per volume peaked at 300 MiB/s.

It is truly amazing for me to be able to read files at that speed.

Now, I tried setting it to 20 parallel threads, but performance never reached that high anymore; it stabilizes at around 130 MiB/s instead.