Extreme performance issues with bladebit 2

Motophan commented 2 years ago

Re: https://chiaforum.com/t/extreme-performance-issues-with-bladebit-2-very-bad/17422

My specs: 5950x, 128gb ram 3600MHz (infinity fabric 1:1), samsung 980 pro 2TB nvme pcie 4.0

With madmax and a 110G RAM drive for temp2, a k32 plot takes 17-18 minutes on my system.
With bladebit, phase 1 alone took almost an hour.

git clone https://github.com/Chia-Network/bladebit.git
cd bladebit
git checkout 2.0.0-alpha-2
mkdir build
cd build
cmake ..
cmake --build . --target bladebit --config Release
./bladebit --version

My plotting command is ./bladebit -f <farmer_public_key> -c <pool_contract_address> diskplot -a --cache 99G -t1 /mnt/nvme01 /mnt/nvme01/chiadone

Phase 1:
Table 1 -> a few seconds
Table 2 -> a few seconds
Tables 3-7 -> more than 10 minutes per table

CPU usage in htop is nearly 100% the whole time. Only one core, the I/O core, is not maxed out; all other cores sit at 99-100%, so it's doing "something". Core 0 is the non-pinned core.

I did not bother continuing w/ phase 2 as there is clearly a severe issue w/ the plotter and needed to get back to meaningful plotting performance w/ madmax.

harold-b commented 2 years ago

Seems like a few people are having a similar issue when specifying a cache of 99G. My assumption here is that the system has started swapping heavily instead of using the actual pages requested or something similar.

Can you try with something like 90 - 94G of cache?

Motophan commented 2 years ago

My swap is off. I will retry and post results, but I'm on headless Arch Linux with extremely low memory usage. No other programs (other than necessary daemons) are running besides bladebit.

edit: another user said to use -B ? is this also needed?

harold-b commented 2 years ago

-b (make sure it's lower case) sets your desired bucket count between 64 and 512 (inclusive), where 128 is currently disabled due to a Phase 3 serialization bug. Fewer buckets generally means better performance, but this can vary depending on the hardware combination.

Fewer buckets require more RAM for the heap (11.5G at 64, I believe) when a natural block size of 4096 is used. I don't recommend using larger block sizes: they are currently largely untested and would require much more RAM, since each bucket buffer is aligned to the block size.
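
To illustrate why aligning every bucket buffer to the block size inflates memory use, here is a minimal sketch of rounding a size up to a block boundary. The helper name align_up and the 5000-byte buffer are made up for illustration; this is the generic rounding technique, not code taken from bladebit.

```shell
# Round a size in bytes up to the next multiple of the block size.
# Hypothetical helper for illustration only, not taken from bladebit.
align_up() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block * block ))
}

# The same 5000-byte buffer padded to two different block sizes.
align_up 5000 4096
align_up 5000 65536
```

The padding is paid once per bucket buffer, so a larger block size multiplies the wasted space across every buffer the plotter allocates.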

Motophan commented 2 years ago

Hi, I apologize for the slow response. I tried 94G and 106G of cache and the result is the same (Ryzen 5950x).

Table 3
 Sorting      : Completed in 26.14 seconds.
 Distribution : Completed in 8.35 seconds.
 Matching     : Completed in 15.92 seconds.
 Fx           : Completed in 22.00 seconds.
Completed table 3 in 672.53 seconds with 4294786571 entries.
Table 3 I/O wait time: 2.47 seconds.
 Table 3 I/O Metrics:
  Average read throughput 5552.34 MiB ( 5822.05 MB ) or 5.42 GiB ( 5.82 GB ).
  Total size read: 66300.83 MiB ( 69521.46 MB ) or 64.75 GiB ( 69.52 GB ).
  Total read commands: 196608.
  Average write throughput 219.91 MiB ( 230.60 MB ) or 0.21 GiB ( 0.23 GB ).
  Total size written: 145143.72 MiB ( 152194.22 MB ) or 141.74 GiB ( 152.19 GB ).
  Total write commands: 262402.

All tables 3-7 are like this. My NVMe drive is XFS formatted, I ran fstrim before this command, and it's a 2TB Samsung 980 Pro on PCIe 4.0 x4. It's very fast.
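
The Table 3 figures above can be cross-checked with a little arithmetic. The sketch below sums the four reported step times and divides each total size by its average throughput. It assumes the throughput figures are per second and averaged over the time spent on that kind of I/O, which the log does not state.

```shell
# Values copied from the Table 3 log above.
table_seconds=672.53
written_mib=145143.72
write_mib_per_s=219.91   # assumed to be MiB per second
read_mib=66300.83
read_mib_per_s=5552.34   # assumed to be MiB per second

# Sum of the four reported steps: sorting, distribution, matching, Fx.
steps_total=$(awk 'BEGIN { printf "%.2f", 26.14 + 8.35 + 15.92 + 22.00 }')

# Time implied by each throughput figure: total size / average throughput.
write_seconds=$(awk -v s="$written_mib" -v t="$write_mib_per_s" 'BEGIN { printf "%.0f", s / t }')
read_seconds=$(awk -v s="$read_mib" -v t="$read_mib_per_s" 'BEGIN { printf "%.0f", s / t }')

echo "reported steps: ${steps_total} s of ${table_seconds} s"
echo "implied write time: ~${write_seconds} s, implied read time: ~${read_seconds} s"
```

Under that assumption, the four listed steps account for only a small part of the table time, while the implied write time is close to the whole of it.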

sudo hdparm -t /dev/nvme1n1
[sudo] password for plotter: 

/dev/nvme1n1:
 Timing buffered disk reads: 6472 MB in  3.00 seconds = 2156.54 MB/sec

This speed test of the drive was run while bladebit was running.

This is after running BB and fstrim

sudo hdparm -t /dev/nvme1n1

/dev/nvme1n1:
 Timing buffered disk reads: 11662 MB in  3.00 seconds = 3886.81 MB/sec

My PC idles with 534MB ram usage (minimal service arch linux distro)

I tried the precompiled binary as an additional test, in case my own build had been compiled inefficiently on my system. No change.

Motophan commented 2 years ago

Hi, I still want to leave this issue open because I have a development suggestion. My CPU really, really likes -b 64, and this dramatically reduced plotting time. Changing --cache XXG did nothing. If the program had some ability to benchmark different -b values, with a command such as ./bladebit --benchmark, this could reduce user confusion. Or maybe it could set -b 64 whenever it detects a Ryzen platform?

Either way, BB does a wonderful job with this flag: Finished plotting in 1357.56 seconds ( 22.6 minutes ).

Attached is a full copy of the plotting output. As you can see above, plot times have been reduced from 100+ minutes to 22.6 minutes. This is still a bit slower than madmax (17-18 minutes with a 110G RAM disk) on my machine, but I have enough free RAM to use the PC while the plotter is running. https://pastebin.com/7pe0pawp
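
The --benchmark command suggested above is only a proposal in this thread. In the meantime, the same comparison could be scripted by hand, roughly as sketched below. The function names bench_buckets and run_plot are hypothetical, and the bucket counts 64, 256 and 512 are an assumption: the thread only says the range is 64-512 with 128 disabled.

```shell
# Hypothetical helper: run one plotting command per bucket count and
# report the wall-clock seconds each run took.
# $1 is the name of a command or shell function that receives the bucket
# count as its only argument; the remaining arguments are the counts to try.
bench_buckets() {
    plot_cmd=$1
    shift
    for buckets in "$@"; do
        start=$(date +%s)
        "$plot_cmd" "$buckets" >/dev/null 2>&1
        end=$(date +%s)
        printf 'buckets=%s seconds=%s\n' "$buckets" "$((end - start))"
    done
}

# Example usage (not run here): define run_plot as a function that wraps
# your own bladebit diskplot command line and passes "$1" to -b, then call:
#   bench_buckets run_plot 64 256 512
```

Each run creates a full plot, so this takes at least as long as the plots themselves and writes the full temporary data set to the drive every time.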

Hcerit07 commented 2 years ago

Hello, I have a problem.

version: ./bladebit --version 2.0.0-beta1

Increasing the file limit from 1024 to 1048576

Fatal Error: Unexpected argument '-b'.

Can you help with "-b"?

harold-b commented 2 years ago

I assume you are using the -b parameter before the diskplot command. Please open a query in discussions here and paste the command line you used that generated that error.

Hcerit07 commented 2 years ago

./bladebit -t 14 -c < > -f < > diskplot -a --cache 99G -t1 /mnt/temp/ /mnt/pw/ -b 64

Fatal Error: Unexpected argument '-b'.

harold-b commented 2 years ago

You need to place your final output directory as the last argument (/mnt/pw/ should go after -b 64). Please open a new discussion thread as mentioned above if you need further help with this, so that this thread doesn't get derailed any further :) (I won't reply here again in order to not derail this thread any further).
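
The reordering described above can be sketched as a small helper that prints the command line with the output directory as the final argument. The function name build_diskplot_cmd is hypothetical, all values are placeholders, and nothing is executed. The flags and their order are copied from the command earlier in this thread, with only -b 64 and /mnt/pw/ swapped.

```shell
# Print a bladebit diskplot command line with the output directory last.
# Hypothetical helper; it only prints the command and does not run it.
build_diskplot_cmd() {
    farmer_key=$1   # farmer public key (-f)
    contract=$2     # pool contract address (-c)
    buckets=$3      # bucket count (-b)
    temp_dir=$4     # temporary directory (-t1)
    out_dir=$5      # final output directory, placed last
    printf './bladebit -t 14 -c %s -f %s diskplot -a --cache 99G -t1 %s -b %s %s\n' \
        "$contract" "$farmer_key" "$temp_dir" "$buckets" "$out_dir"
}

build_diskplot_cmd '<farmer_public_key>' '<pool_contract_address>' 64 /mnt/temp/ /mnt/pw/
```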

Extreme performance issues with bladebit 2 #202