Closed kybash closed 1 year ago
Running into same issue on Ubuntu 22.04 and a 3060 Ti. Previous alpha build worked.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05 Driver Version: 525.85.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
Bladebit Chia Plotter
Version : 3.0.0-alpha3
Git Commit : eb6df030b555fb35addc3d6762424d52826a5d82
Compiled With: gcc 9.4.0
[Global Plotting Config]
Will create 10 plots.
Thread count : 48
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : [removed]
Pool contract address : [removed]
Compression Level : 7
Benchmark mode : disabled
[Bladebit CUDA Plotter]
Selected cuda device 0 : NVIDIA GeForce RTX 3060 Ti
CUDA Compute Capability : 8.6
SM count : 38
Max blocks per SM : 16
Max threads per SM : 1536
Async Engine Count : 2
L2 cache size : 3.00 MB
L2 persist cache max size : 2.25 MB
Stack Size : 1.00 KB
Memory:
Total : 7.79 GB
Free : 7.65 GB
Allocating buffers (this may take a few seconds)...
Kernel RAM required : 90240524288 bytes ( 86060.07 MiB or 84.04 GiB )
Intermediate RAM required : 2999001088 bytes ( 2860.07 MiB or 2.79 GiB )
Host RAM required : 141733920768 bytes ( 135168.00 MiB or 132.00 GiB )
Total Host RAM required : 231974445056 bytes ( 221228.07 MiB or 216.04 GiB )
GPU RAM required : 5862256640 bytes ( 5590.68 MiB or 5.46 GiB )
Allocating buffers
CUDA error: 13 (0xd ) cudaErrorInvalidSymbol : invalid device symbol
*** Panic!!! *** Fatal Error:
CUDA error cudaErrorInvalidSymbol : invalid device symbol.
./bladebit_cuda(+0xe175b)[0x55e3d694475b]
./bladebit_cuda(+0xe0f3f)[0x55e3d6943f3f]
./bladebit_cuda(+0x41c7a)[0x55e3d68a4c7a]
./bladebit_cuda(+0x1bd3c)[0x55e3d687ed3c]
./bladebit_cuda(+0x180c7)[0x55e3d687b0c7]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f6995629d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f6995629e40]
./bladebit_cuda(+0x1984e)[0x55e3d687c84e]
cuda version should be the same, gpu driver = system
See if you are able to get better results with this build (artifacts at the bottom of the page): https://github.com/Chia-Network/bladebit/actions/runs/4388769746
Same CUDA error with this build.
CUDA error: 13 (0xd ) cudaErrorInvalidSymbol : invalid device symbol
*** Panic!!! *** Fatal Error:
CUDA error cudaErrorInvalidSymbol : invalid device symbol.
./bladebit_cuda(+0xe175b)[0x55aa45e2e75b]
./bladebit_cuda(+0xe0f3f)[0x55aa45e2df3f]
./bladebit_cuda(+0x41c7a)[0x55aa45d8ec7a]
./bladebit_cuda(+0x1bd3c)[0x55aa45d68d3c]
./bladebit_cuda(+0x180c7)[0x55aa45d650c7]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f42f2a29d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f42f2a29e40]
./bladebit_cuda(+0x1984e)[0x55aa45d6684e]
However, reverting back to the build below with the otherwise same environment produces plots as expected.
Version : 3.0.0-alpha1
Git Commit : f269db0a7ad307514e993c335897cea7ebf46eda
Compiled With: gcc 9.4.0
Seemingly an architecture match issue. It seems some GPUs are not happy with multiple code images stored in the executable. The old build only had an image for 5_2. This one includes it, but the GPU is likely taking the one that matches its model exactly, and for some reason that is not working.
Some people have worked around this by upgrading to the latest driver.
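To make the "multiple code images" idea a bit more concrete, here is a rough Python sketch of how an image might get picked for a given GPU. It is only a simplified model of the usual CUDA compatibility rules (a binary image runs on the same major version with an equal or higher minor version; PTX can be JIT-compiled for an equal or newer capability). The function name `pick_image` and the split into binary and PTX lists are made up for illustration, and whether the 5_2 image in the old build was PTX is an assumption, not something confirmed in this thread.

```python
def pick_image(device_cc, cubins, ptx):
    """Simplified model of choosing a code image for a device.

    device_cc: (major, minor) of the GPU, e.g. (8, 6) for the RTX 3060 Ti above.
    cubins:    list of (major, minor) targets of precompiled binary images.
    ptx:       list of (major, minor) targets of PTX images (JIT-compiled at load).
    Returns ("cubin" | "ptx", (major, minor)) or None if nothing is usable.
    """
    major, minor = device_cc
    # A binary image is usable only within the same major version,
    # and only if it targets an equal or lower minor version.
    usable = [cc for cc in cubins if cc[0] == major and cc[1] <= minor]
    if usable:
        return ("cubin", max(usable))
    # Otherwise fall back to PTX built for an equal or older capability.
    usable = [cc for cc in ptx if cc <= device_cc]
    if usable:
        return ("ptx", max(usable))
    return None


# Hypothetical image lists: an "old style" build with only a 5.2 PTX image,
# and a "new style" build that also carries an exact 8.6 binary.
old_build = pick_image((8, 6), cubins=[], ptx=[(5, 2)])
new_build = pick_image((8, 6), cubins=[(5, 2), (8, 6)], ptx=[(5, 2)])
```

Under this model the old build would go through the 5.2 image and the new one through the exact 8.6 match, which fits the description above, but it does not explain why the exact match fails on some driver versions.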
I will try that and report back.
Resolved by removing all nvidia packages and then installing CUDA 12.1, which is a bump up from the CUDA 12.0 included with the nvidia display drivers.
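For anyone who wants to check their setup before trying a build, a small Python sketch that pulls the driver and CUDA versions out of the `nvidia-smi` header line might look like the following. The 12.1 threshold is an assumption based only on the reports in this thread (12.0 failed, 12.1 worked), not a documented requirement, and the helper names are made up for illustration.

```python
import re

# Assumption from this thread only: CUDA 12.0 failed, CUDA 12.1 worked.
MIN_CUDA = (12, 1)


def parse_smi_header(line):
    """Extract driver and CUDA versions from an nvidia-smi header line."""
    m = re.search(
        r"NVIDIA-SMI\s+(\S+)\s+Driver Version:\s*(\S+)\s+CUDA Version:\s*(\d+)\.(\d+)",
        line,
    )
    if m is None:
        return None
    return {
        "smi": m.group(1),
        "driver": m.group(2),
        "cuda": (int(m.group(3)), int(m.group(4))),
    }


def cuda_new_enough(line, minimum=MIN_CUDA):
    """True if the reported CUDA version is at least `minimum`."""
    info = parse_smi_header(line)
    return info is not None and info["cuda"] >= minimum


# Header lines as posted in this thread.
failing = "| NVIDIA-SMI 525.85.05 Driver Version: 525.85.05 CUDA Version: 12.0 |"
working = "| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |"
print(cuda_new_enough(failing), cuda_new_enough(working))
```

The header line can be pasted in by hand or taken from the output of running `nvidia-smi`; the sketch only parses text and does not talk to the driver.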
Fixed for me also, by removing the distro-standard (RPM Fusion) drivers/CUDA and using NVIDIA's CUDA binaries from https://developer.nvidia.com/cuda-downloads
Ended up with
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
Testing the pre-compiled bladebit-cuda-v3.0.0-alpha2-centos binary throws this error:
OS: Fedora 37 with a 3070 and these drivers/CUDA:
The alpha build is working fine on the same system.