Closed. XCHSystems closed this issue 1 year ago.
Looking at line 12 of CudaPlotter.h
uint32 deviceIndex = 0; // Which CUDA device to use when plotting
So would it be possible to expose this as a parsable option on the command line, and perhaps also allow multiple devices to be specified rather than just one?
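Something like this hand-rolled argument loop is what I have in mind. This is only a rough sketch; apart from deviceIndex, every name here is hypothetical rather than taken from the bladebit sources:

#include <cstring>
#include <cstdlib>

typedef unsigned int uint32; // bladebit defines its own uint32 alias

struct CudaPlotConfig
{
    uint32 deviceIndex = 0; // Which CUDA device to use when plotting
};

// Scan argv for -d/--device and store the value in the config.
static void ParseDeviceArg( int argc, char* argv[], CudaPlotConfig& cfg )
{
    for( int i = 1; i < argc - 1; i++ )
    {
        if( strcmp( argv[i], "-d" ) == 0 || strcmp( argv[i], "--device" ) == 0 )
            cfg.deviceIndex = (uint32)atoi( argv[i + 1] );
    }
}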
You can already pass the device index as a parameter to the cudaplot command:
bladebit_cuda -f .. -c … cudaplot -d 1 …
Multi-GPU support is already planned, but there are a number of other tasks ahead of it.
Hi Harold,
I have already tried the -d 1 option:
./bladebit/bladebit_cuda -t 1 -n 1 -f 8f6986edcaa42b3f9ab1abd27df7f2224149787414564629f39f8ceada85bf3abd7dd899296e2d0a9a138875191dd5ab -c xch1jlje9r7ndepgt3rrm4w7taayn0d6yh5654wwv4msx2226z7rx8as2puwzq cudaplot -d 1 /Plotdisks/RAID/
Bladebit Chia Plotter
Version : 3.0.0-alpha1
Git Commit : f269db0a7ad307514e993c335897cea7ebf46eda
Compiled With: gcc 9.4.0
[Global Plotting Config]
Will create 1 plots.
Thread count : 1
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : 8f6986edcaa42b3f9ab1abd27df7f2224149787414564629f39f8ceada85bf3abd7dd899296e2d0a9a138875191dd5ab
Pool contract address : xch1jlje9r7ndepgt3rrm4w7taayn0d6yh5654wwv4msx2226z7rx8as2puwzq
Benchmark mode : disabled
[Bladebit CUDA Plotter]
Selected cuda device 0 : NVIDIA RTX A4000
CUDA Compute Capability : 8.6
SM count : 48
Max blocks per SM : 16
Max threads per SM : 1536
Async Engine Count : 2
L2 cache size : 4.00 MB
L2 persist cache max size : 3.00 MB
Stack Size : 1.00 KB
Memory:
Total : 15.73 GB
Free : 9.21 GB
As you can see, it still uses device 0.
And as you can see from the following nvidia-smi output, device 0 is already being used by bladebit_cuda:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A4000 On | 00000000:51:00.0 Off | Off |
|100% 57C P2 132W / 140W | 6516MiB / 16376MiB | 85% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A4000 On | 00000000:8A:00.0 Off | Off |
|100% 26C P8 16W / 140W | 12MiB / 16376MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3628 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 3682 G /usr/bin/gnome-shell 4MiB |
| 0 N/A N/A 5954 C ...ng/bladebit/bladebit_cuda 6016MiB |
| 1 N/A N/A 3628 G /usr/lib/xorg/Xorg 8MiB |
+-----------------------------------------------------------------------------+
Multi-GPU support is already planned, but there are a number of other tasks ahead of it.
Happy to do some testing on that when you are ready; JM knows me :-)
@harold-b
It seems that the -d or --device option is simply ignored by the command. It does not throw an error (as it does if I pass -D instead of -d); -d 1 or --device 1 is just silently ignored.
I looked over the relevant code over the weekend, and it certainly should be using the parameter, unless I missed something (which is likely). Could you please share a full log with the -d parameter used with something other than 0?
@harold-b When you say a full log, what are you asking for? My currently running plot process has -d 1 set, but it is obviously still using GPU 0, so let me know what you need.
@harold-b
Could it be this line in CudaPlotter.h?
uint32 deviceIndex = 0; // Which CUDA device to use when plotting
@harold-b By modifying that value to 1 and re-compiling, I can now run two instances of bladebit_cuda, each on a different GPU. So it could be that the setting there is overriding the -d or --device option.
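Concretely, the workaround is just this one-line change to the default before re-compiling:

uint32 deviceIndex = 1; // Which CUDA device to use when plotting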
That is just the default value. The value gets parsed from the CLI here: https://github.com/Chia-Network/bladebit/blob/cuda-compression/cuda/CudaPlotter.cu#L69
But I did find the issue. Device initialization is done before the config is assigned to the context, so I just need to swap a couple of lines.
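To illustrate, here is a simplified sketch of the ordering problem; the names are illustrative, not the actual bladebit code:

// Before (buggy ordering): the device is selected while the context
// still holds the default config, so deviceIndex is still 0.
cudaSetDevice( (int)context.cfg.deviceIndex );
context.cfg = parsedConfig; // the -d value is assigned too late

// After (fixed ordering): assign the parsed config first,
// then select the device, so the -d value is honored.
context.cfg = parsedConfig;
cudaSetDevice( (int)context.cfg.deviceIndex );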
Fixed in 221fb883990dba6f0d12a9dbdd7de711de41f174
@FlexiMiners If you get a chance to test that commit, please let me know if it worked for you
@harold-b As soon as my current dual plotting process run completes, I will compile and run and let you know
@harold-b Yes, that works, thank you. Now all we need is multi-GPU support so that it's not consuming twice as much RAM 👍
@XCHSystems What do you mean, "it's not consuming twice as much RAM"? We will configure our plotting machines with two GPUs but only 256 GB of RAM; could it run two instances of the plotter? Thanks!
You need 256 GB of RAM per bladebit_cuda instance, so in order to use two GPUs you need to run two bladebit_cuda instances.
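For example, something along these lines, one instance per GPU (the farmer key, contract address, and target paths below are placeholders, not real values):

./bladebit_cuda -f <farmer_public_key> -c <pool_contract_address> cudaplot -d 0 /plots/disk0/
./bladebit_cuda -f <farmer_public_key> -c <pool_contract_address> cudaplot -d 1 /plots/disk1/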
@XCHSystems So, if my machine is limited to a maximum of 256 GB of RAM, will I only be able to use a single GPU? If bladebit_cuda were able to split the computation across two GPUs to plot faster, that would also work well for me.
Correct. I know Harold is working on a multi-GPU bladebit_cuda that will only require a single instance, but I do not expect that any time soon.
We plot with multiple GPUs by running multiple instances. In one system we have 1 TB of RAM, so we utilise four GPUs for plotting (4x A4000), which generates four plots every two minutes.
Will a future version support multiple GPUs? And will there be an option to specify the device, in order to run multiple instances of bladebit_cuda against different GPUs?
I have two A4000 GPUs and bladebit_cuda only uses device 0. Is there a special build or hidden command to utilise more than one GPU?
Simon