Chia-Network / bladebit

A high-performance k32-only, Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0

Win 10 WSL: bladebit_cuda "cudaErrorMemoryAllocation : out of memory" even though the card has enough memory #444

Open · brause opened this issue 10 months ago

brause commented 10 months ago

Hi there,

I am trying to make some --disk-16 CUDA plots with bladebit_cuda under WSL, but it seems my GPU RAM is failing. Since I have run out of ideas for this error, I am posting it here. The output from bladebit_cuda and nvidia-smi.exe is included below.

What I tried: reinstalling the newest NVIDIA driver, and installing an older version (546.01) of the NVIDIA driver. Same result. I also checked whether the card is a fake (it has no VGA port, and all drivers install fine), but it seems to be a genuine NVIDIA card.

Are there any more checks I can do on the card?

Regards, Karsten

bruch@Himbeer:/mnt/j/bladebit/build-release$ ./bladebit_cuda -f 8025cdb69d131cee2264785bd9e3ff7c5f7eceeb855951bcb2e471776e7fd59a0c4bdc87a659d8fc88bd35a0ee4179b2 -p 98c3089ecadcebec5b6e7ec9e8652f87e923a065856403542afd0902802c5733f0005dba963c09b101aa028ba28d2b89 --compress 5 --benchmark cudaplot --disk-16 -t1 /mnt/h/tmp/ /mnt/j/farm/

Bladebit Chia Plotter
Version      : 3.1.0-dev
Git Commit   : e9836f8bd963321457bc86eb5d61344bfb76dcf0
Compiled With: gcc 11.4.0

[Global Plotting Config]
 Will create 1 plots.
 Thread count          : 16
 Warm start enabled    : false
 NUMA disabled         : false
 CPU affinity disabled : false
 Farmer public key     : 8025cdb69d131cee2264785bd9e3ff7c5f7eceeb855951bcb2e471776e7fd59a0c4bdc87a659d8fc88bd35a0ee4179b2
 Pool public key       : 98c3089ecadcebec5b6e7ec9e8652f87e923a065856403542afd0902802c5733f0005dba963c09b101aa028ba28d2b89
 Compression Level     : 5
 Benchmark mode        : enabled
Warning: 16G mode is experimental and still under development.
         Please use the --check <n> parameter to validate plots when using this mode.
         Direct I/O not supported in 16G mode at the moment. Disabing it.

[Bladebit CUDA Plotter]
 Host RAM            : 19 GiB
 Plot checks         : disabled

Selected cuda device 0 : NVIDIA GeForce GTX 1070
 CUDA Compute Capability   : 6.1
 SM count                  : 15
 Max blocks per SM         : 32
 Max threads per SM        : 2048
 Async Engine Count        : 5
 L2 cache size             : 2.00 MB
 L2 persist cache max size : 0.00 MB
 Stack Size                : 1.00 KB
 Memory:
  Total                    : 8.00 GB
  Free                     : 7.06 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required       : 4828776144   bytes ( 4605.08   MiB or 4.50   GiB )
Intermediate RAM required : 4378927104   bytes ( 4176.07   MiB or 4.08   GiB )
Host RAM required         : 2147483648   bytes ( 2048.00   MiB or 2.00   GiB )
Total Host RAM required   : 6976259792   bytes ( 6653.08   MiB or 6.50   GiB )
GPU RAM required          : 6163050496   bytes ( 5877.54   MiB or 5.74   GiB )
Allocating buffers...
CUDA error: 2 (0x2 ) cudaErrorMemoryAllocation : out of memory

*** Panic!!! *** Fatal Error:
CUDA error cudaErrorMemoryAllocation : out of memory.
./bladebit_cuda(_ZN7SysHost14DumpStackTraceEv+0x53)[0x56131c8fad93]
./bladebit_cuda(_Z9PanicExitv+0xf)[0x56131ca8c27f]
./bladebit_cuda(+0x7dbaf)[0x56131c8a1baf]
./bladebit_cuda(+0x85f70)[0x56131c8a9f70]
./bladebit_cuda(main+0xa61)[0x56131c89f121]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f029c285d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f029c285e40]
./bladebit_cuda(_start+0x25)[0x56131c8a09a5]
bruch@Himbeer:/mnt/j/bladebit/build-release$ nvidia-smi.exe
Fri Dec  1 17:46:35 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 546.01                 Driver Version: 546.01       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070      WDDM  | 00000000:23:00.0  On |                  N/A |
|  0%   47C    P2              28W / 151W |    468MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1584    C+G   C:\Windows\System32\dwm.exe               N/A      |
|    0   N/A  N/A     11568    C+G   ...oogle\Chrome\Application\chrome.exe    N/A      |
|    0   N/A  N/A     11720    C+G   C:\Windows\explorer.exe                   N/A      |
|    0   N/A  N/A     13096    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     14188    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe    N/A      |
|    0   N/A  N/A     17016    C+G   ...GeForce Experience\NVIDIA Share.exe    N/A      |
|    0   N/A  N/A     18572    C+G   ...5n1h2txyewy\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     19920    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     24264    C+G   ...siveControlPanel\SystemSettings.exe    N/A      |
+---------------------------------------------------------------------------------------+
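The figures in the log above can be cross-checked with a few lines of Python. This is a minimal sketch that parses the five "required" lines exactly as printed and compares them with the free VRAM and host RAM that bladebit itself reports. It assumes bladebit's "GB" figures for the card are binary GiB (it reports the 8192 MiB card as "Total : 8.00 GB"). It only restates the log's own arithmetic; it cannot tell which allocation actually failed.

```python
import re

MIB = 1024 ** 2  # bladebit prints binary units (MiB/GiB)
GIB = 1024 ** 3

# The five summary lines from the bladebit_cuda output above, copied verbatim.
LOG = """\
Kernel RAM required       : 4828776144   bytes ( 4605.08   MiB or 4.50   GiB )
Intermediate RAM required : 4378927104   bytes ( 4176.07   MiB or 4.08   GiB )
Host RAM required         : 2147483648   bytes ( 2048.00   MiB or 2.00   GiB )
Total Host RAM required   : 6976259792   bytes ( 6653.08   MiB or 6.50   GiB )
GPU RAM required          : 6163050496   bytes ( 5877.54   MiB or 5.74   GiB )
"""

# "Free : 7.06 GB" from the device summary; assumed to be GiB (see lead-in).
FREE_VRAM_GIB = 7.06
HOST_RAM_GIB = 19  # "Host RAM : 19 GiB"


def parse_requirements(log: str) -> dict[str, int]:
    """Map each '<name> required : <n> bytes' label to its byte count."""
    pattern = re.compile(r"^(.+?)\s+required\s*:\s*(\d+)\s+bytes", re.MULTILINE)
    return {name.strip(): int(n) for name, n in pattern.findall(log)}


req = parse_requirements(LOG)

# The reported host total is exactly kernel RAM plus host RAM.
assert req["Total Host RAM"] == req["Kernel RAM"] + req["Host RAM"]

gpu_gib = req["GPU RAM"] / GIB
host_gib = req["Total Host RAM"] / GIB
print(f"GPU:  need {gpu_gib:.2f} GiB, free {FREE_VRAM_GIB:.2f} GiB, "
      f"headroom {FREE_VRAM_GIB - gpu_gib:.2f} GiB")
print(f"Host: need {host_gib:.2f} GiB of {HOST_RAM_GIB} GiB")
```

On these figures the GPU requirement (about 5.74 GiB) sits roughly 1.3 GiB below the 7.06 GB bladebit reports as free, and the 6.50 GiB host requirement is well under 19 GiB, which is why the out-of-memory error looks surprising.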
teamwest93 commented 10 months ago

I had this problem too when I started, but it resolved itself; it just went away.

brause commented 10 months ago

Did you use WSL Ubuntu as well?

teamwest93 commented 10 months ago

Yes. I tried a few guides, but I don't remember which one helped: NVIDIA for WSL, or Ubuntu for NVIDIA. https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local

teamwest93 commented 10 months ago

So, did it help?

brause commented 10 months ago

Sadly, no. I still get:

Selected cuda device 0 : NVIDIA GeForce GTX 1070
 CUDA Compute Capability   : 6.1
 SM count                  : 15
 Max blocks per SM         : 32
 Max threads per SM        : 2048
 Async Engine Count        : 5
 L2 cache size             : 2.00 MB
 L2 persist cache max size : 0.00 MB
 Stack Size                : 1.00 KB
 Memory:
  Total                    : 8.00 GB
  Free                     : 7.06 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required       : 4828776144   bytes ( 4605.08   MiB or 4.50   GiB )
Intermediate RAM required : 4378927104   bytes ( 4176.07   MiB or 4.08   GiB )
Host RAM required         : 2147483648   bytes ( 2048.00   MiB or 2.00   GiB )
Total Host RAM required   : 6976259792   bytes ( 6653.08   MiB or 6.50   GiB )
GPU RAM required          : 6163050496   bytes ( 5877.54   MiB or 5.74   GiB )
Allocating buffers...
CUDA error: 2 (0x2 ) cudaErrorMemoryAllocation : out of memory

*** Panic!!! *** Fatal Error:
CUDA error cudaErrorMemoryAllocation : out of memory.
./bladebit_cuda(_ZN7SysHost14DumpStackTraceEv+0x53)[0x562451e7fd93]
./bladebit_cuda(_Z9PanicExitv+0xf)[0x56245201127f]
./bladebit_cuda(+0x7dbaf)[0x562451e26baf]
./bladebit_cuda(+0x85f70)[0x562451e2ef70]
./bladebit_cuda(main+0xa61)[0x562451e24121]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fc44c480d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fc44c480e40]
./bladebit_cuda(_start+0x25)[0x562451e259a5]
brause commented 10 months ago

What does nvidia-smi.exe report for you inside WSL? I could try to replicate that setup.

teamwest93 commented 10 months ago

Sun Dec  3 19:12:57 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.04              Driver Version: 546.17       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070 ...    On  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P8               7W /  80W |    231MiB /  8192MiB |     27%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
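For comparison, the "Memory-Usage" cells of the two nvidia-smi tables in this thread can be reduced to unused VRAM with a small sketch like the one below. Treating total minus used as allocatable memory is an assumption: bladebit reported only 7.06 GB free on the GTX 1070, while 8192 - 468 = 7724 MiB (about 7.54 GiB), so the CUDA-visible figure can be lower than what nvidia-smi suggests. The 5877.54 MiB threshold is the "GPU RAM required" value from brause's log.

```python
import re

# "GPU RAM required" from brause's bladebit_cuda log, in MiB.
GPU_RAM_REQUIRED_MIB = 5877.54

# "Memory-Usage" cells copied from the two nvidia-smi tables in this thread.
MEMORY_USAGE = {
    "brause, GTX 1070, Win 10": "468MiB /  8192MiB",
    "teamwest93, RTX 2070, Win 11": "231MiB /  8192MiB",
}


def unused_mib(cell: str) -> int:
    """Return total minus used MiB for a 'usedMiB / totalMiB' cell."""
    used, total = map(int, re.findall(r"(\d+)MiB", cell))
    return total - used


for who, cell in MEMORY_USAGE.items():
    spare = unused_mib(cell)
    verdict = "fits" if spare >= GPU_RAM_REQUIRED_MIB else "does not fit"
    print(f"{who}: {spare} MiB unused, {GPU_RAM_REQUIRED_MIB} MiB {verdict}")
```

Both cards show a similar margin over what bladebit_cuda asked for, so on these snapshots the VRAM already in use does not by itself distinguish the failing Windows 10 setup from the working Windows 11 one.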

brause commented 10 months ago

Are you on Windows 11? It might simply not work on Windows 10. Edit: I tried downgrading the drivers. That did not work either; same problem.

teamwest93 commented 10 months ago

Win 11, yes

brause commented 10 months ago

I guess that could be it. Thank you.