Chia-Network / bladebit

A high-performance k32-only, Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0

cuda plot Error STDERR: CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered. #441

Open Perk-Mew opened 7 months ago

Perk-Mew commented 7 months ago

I need some help. I am trying to plot with the Chia GUI 2.1.1. My system is an AMD Threadripper 3970X with 256 GB of RAM and a ROG RTX 3070 8 GB.

Bladebit Chia Plotter
Version : 3.1.0
Git Commit : e9836f8bd963321457bc86eb5d61344bfb76dcf0
Compiled With: msvc 19.29.30152

[Global Plotting Config]
Will create 1 plots.
Thread count : 32
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : b5a8672980142bb8f3b51293b5252f739de0c124db9c3d4b93384d14775de4f0c80b568105f72d022ee3170ec8a5b41e
Pool contract address : xch1nkntyptsljk8t7n3j2j5fa6hw5p28ht85ckspdf59qhvxqr26mfsndj6yv
Compression Level : 7
Benchmark mode : disabled

[Bladebit CUDA Plotter]
Host RAM : 255 GiB
Plot checks : disabled

Selected cuda device 0 : NVIDIA GeForce RTX 3070
CUDA Compute Capability : 8.6
SM count : 46
Max blocks per SM : 16
Max threads per SM : 1536
Async Engine Count : 1
L2 cache size : 4.00 MB
L2 persist cache max size : 3.00 MB
Stack Size : 1.00 KB
Memory:
  Total : 8.00 GB
  Free : 6.95 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required : 91955994624 bytes ( 87696.07 MiB or 85.64 GiB )
Intermediate RAM required : 4378927104 bytes ( 4176.07 MiB or 4.08 GiB )
Host RAM required : 142270791680 bytes ( 135680.00 MiB or 132.50 GiB )
Total Host RAM required : 234226786304 bytes ( 223376.07 MiB or 218.14 GiB )
GPU RAM required : 6163857408 bytes ( 5878.31 MiB or 5.74 GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: 5f48a25977249f0319bf1ef50234b4178a6530b80deb8df735a842ccbc31bc6b
Plot temporary file: A:\plot-k32-c07-2023-11-19-22-57-5f48a25977249f0319bf1ef50234b4178a6530b80deb8df735a842ccbc31bc6b.plot.tmp

Generating F1
Progress update: 0.01
Finished F1 in 4.37 seconds.
Progress update: 0.1
Table 2 completed in 14.44 seconds with 4294920960 entries.
Progress update: 0.2
Table 3 completed in 25.13 seconds with 4294908477 entries.
Progress update: 0.3
Table 4 completed in 29.95 seconds with 4294902283 entries.
Progress update: 0.4
Table 5 completed in 28.91 seconds with 4294888542 entries.
Progress update: 0.5
Table 6 completed in 24.22 seconds with 4294849124 entries.
Progress update: 0.6
Table 7 completed in 21.40 seconds with 4294750028 entries.
Progress update: 0.7
Finalizing Table 7
STDERR: CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered

STDERR:

STDERR: Panic!!! Fatal Error:

STDERR: CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered.

0x00007FF6E3B793E2 @ ::
0x00007FF6E3C7ED79 @ ::
0x00007FF6E3C9CE0D @ ::
0x00007FF6E3CC23FA @ ::
0x00007FF6E3CC1FD4 @ ::
0x00007FF6E3C9DCE8 @ ::
0x00007FF6E3C9EE51 @ ::
0x00007FF6E3C9FFDF @ ::
0x00007FF6E3CA06BF @ ::
0x00007FF6E3B5F0A8 @ ::
0x00007FF6E3D0AFEC @ ::
0x00007FF9DA98257D @ ::BaseThreadInitThunk()
0x00007FF9DCA8AA58 @ ::RtlUserThreadStart()
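
As a rough cross-check of the memory figures in the log above, the sketch below pulls the "required : N bytes" entries out of the plotter output and converts them to GiB. The helper names (`parse_required_bytes`, `to_gib`) are made up for this illustration and are not part of bladebit; the sample text is copied from the log in this issue. It only lets you compare the reported requirements against the reported totals; it does not explain the illegal memory access itself.

```python
import re

# Matches entries such as "GPU RAM required : 6163857408 bytes".
# Works on both a one-entry-per-line log and a flattened single-line log.
REQUIRED_RE = re.compile(r"([A-Za-z ]+?) required\s*:\s*(\d+) bytes")


def parse_required_bytes(log_text):
    """Return {label: byte_count} for each '<label> required : N bytes' entry."""
    return {m.group(1).strip(): int(m.group(2))
            for m in REQUIRED_RE.finditer(log_text)}


def to_gib(num_bytes):
    """Convert a byte count to binary gigabytes (GiB)."""
    return num_bytes / 2**30


# Sample copied from the log in this issue.
SAMPLE_LOG = """\
Kernel RAM required : 91955994624 bytes ( 87696.07 MiB or 85.64 GiB )
Intermediate RAM required : 4378927104 bytes ( 4176.07 MiB or 4.08 GiB )
Host RAM required : 142270791680 bytes ( 135680.00 MiB or 132.50 GiB )
Total Host RAM required : 234226786304 bytes ( 223376.07 MiB or 218.14 GiB )
GPU RAM required : 6163857408 bytes ( 5878.31 MiB or 5.74 GiB )
"""

required = parse_required_bytes(SAMPLE_LOG)
for label, num_bytes in required.items():
    print(f"{label:<18}: {to_gib(num_bytes):8.2f} GiB")
```

In this log the GPU requirement works out to about 5.74 GiB against 6.95 GB reported free, and the total host requirement to about 218.14 GiB against 255 GiB of host RAM, so the buffers fit at allocation time according to the plotter's own numbers.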

sobertram commented 7 months ago

I'm curious: what is your nvidia-smi.exe output?

Perk-Mew commented 7 months ago

What is the nvidia-smi.exe output? Is it the driver?

sobertram commented 7 months ago

> what is nvidia-smi.exe output? Driver?

It is a program that comes with your driver installation.

e.g.

Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\sober> nvidia-smi.exe
Mon Nov 20 07:42:35 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.13                 Driver Version: 537.13       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P4                     TCC   | 00000000:02:00.0 Off |                    0 |
| N/A   63C    P0              25W /  75W |    546MiB /  7680MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     16724      C   ...unpacked\daemon\start_harvester.exe      538MiB |
+---------------------------------------------------------------------------------------+

I want to see how much memory is being used, and by which programs.

I am assuming you are on Windows; on Linux the command is just nvidia-smi.

Perk-Mew commented 7 months ago

Yes, Windows 11. How can I fix this problem?

sobertram commented 7 months ago

> yes windows 11 . how could i fix this problem.

Can you share the output of nvidia-smi.exe on your system, like I did? It will show which programs are using GPU memory. The error you are getting usually means the GPU has maxed out its memory.

Also are you getting this error on every plot?
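
For anyone who wants the memory figures without reading the full table, here is a small sketch that asks nvidia-smi for CSV output and parses it. It assumes nvidia-smi is on the PATH, and the helper names (`parse_gpu_csv`, `query_gpus`) are hypothetical. The `--query-gpu` and `--format=csv,noheader,nounits` options are regular nvidia-smi options; the sample row reuses the values from the example table above.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in this order.
GPU_FIELDS = "name,driver_version,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse 'name, driver, used, total' rows (MiB, no header, no units)."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue  # skip blank lines
        name, driver, used, total = (field.strip() for field in rec)
        rows.append({
            "name": name,
            "driver": driver,
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={GPU_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


# Demo on a sample row, so this runs even without a GPU present.
SAMPLE_ROW = "Tesla P4, 537.13, 546, 7680\n"
for gpu in parse_gpu_csv(SAMPLE_ROW):
    free_mib = gpu["total_mib"] - gpu["used_mib"]
    print(f"{gpu['name']} (driver {gpu['driver']}): {free_mib} MiB free")
```

On a machine with the driver installed, calling `query_gpus()` instead of parsing the sample row should give the live values. Note that on Windows cards running in WDDM mode, nvidia-smi reports per-process memory usage as N/A, so only the per-GPU totals are available there.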

Perk-Mew commented 7 months ago

Yes. Even when I try to plot using hybrid mode, I still get the error. How do I see the output like you did?

Perk-Mew commented 7 months ago

[Image: Screenshot 2023-11-20 231129]

Perk-Mew commented 7 months ago

It shows N/A.

sobertram commented 7 months ago

> it's show n/a

Right, but we can also see 1096MiB / 8192MiB, so it looks like bladebit is accurately reflecting your free memory. I had some issues with CUDA 12.3 on Linux; you may want to try 12.2 and see if that is more compatible.

So reinstall the NVIDIA driver, but install CUDA 12.2.

I found the stable driver version for your card. Unlike on Unix, you can't select the CUDA version at download, so I'm not sure which version will be installed. Hope this works for you. https://us.download.nvidia.com/Windows/546.01/546.01-desktop-win10-win11-64bit-international-nsd-dch-whql.exe
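
On the question of which CUDA version a driver brings: the "CUDA Version" field in the nvidia-smi banner is the highest CUDA version the installed driver supports, so it can be read after installing to see what was picked up. Below is a minimal sketch of extracting both fields from the banner line, using the banner from the example output earlier in this thread; `parse_smi_banner` is a made-up helper name.

```python
import re

# Matches the banner line printed at the top of plain `nvidia-smi` output.
BANNER_RE = re.compile(
    r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)"
)


def parse_smi_banner(text):
    """Return (driver_version, cuda_version) as strings, or None if not found."""
    match = BANNER_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Banner line copied from the example nvidia-smi output in this thread.
SAMPLE_BANNER = (
    "| NVIDIA-SMI 537.13                 Driver Version: 537.13"
    "       CUDA Version: 12.2     |"
)

print(parse_smi_banner(SAMPLE_BANNER))
```

Pasting your own banner line in place of the sample shows what the newly installed driver reports.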

Perk-Mew commented 7 months ago

Bladebit Chia Plotter
Version : 3.1.0
Git Commit : e9836f8bd963321457bc86eb5d61344bfb76dcf0
Compiled With: msvc 19.29.30152

[Global Plotting Config]
Will create 1 plots.
Thread count : 32
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : b5a8672980142bb8f3b51293b5252f739de0c124db9c3d4b93384d14775de4f0c80b568105f72d022ee3170ec8a5b41e
Pool contract address : xch1nkntyptsljk8t7n3j2j5fa6hw5p28ht85ckspdf59qhvxqr26mfsndj6yv
Compression Level : 7
Benchmark mode : disabled

[Bladebit CUDA Plotter]
Host RAM : 255 GiB
Plot checks : disabled

Selected cuda device 0 : NVIDIA GeForce RTX 3070
CUDA Compute Capability : 8.6
SM count : 46
Max blocks per SM : 16
Max threads per SM : 1536
Async Engine Count : 1
L2 cache size : 4.00 MB
L2 persist cache max size : 3.00 MB
Stack Size : 1.00 KB
Memory:
  Total : 8.00 GB
  Free : 6.95 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required : 91955994624 bytes ( 87696.07 MiB or 85.64 GiB )
Intermediate RAM required : 4378927104 bytes ( 4176.07 MiB or 4.08 GiB )
Host RAM required : 142270791680 bytes ( 135680.00 MiB or 132.50 GiB )
Total Host RAM required : 234226786304 bytes ( 223376.07 MiB or 218.14 GiB )
GPU RAM required : 6163857408 bytes ( 5878.31 MiB or 5.74 GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: a2fd8774ceb12525f7abcf4b701c0857162f2db2d89a554cf87e5e223ab4a014
Plot temporary file: A:\plot-k32-c07-2023-11-21-00-19-a2fd8774ceb12525f7abcf4b701c0857162f2db2d89a554cf87e5e223ab4a014.plot.tmp

Generating F1
Progress update: 0.01
Finished F1 in 4.79 seconds.
Progress update: 0.1
Table 2 completed in 14.98 seconds with 4294960998 entries.
Progress update: 0.2
Table 3 completed in 27.71 seconds with 4294887329 entries.
Progress update: 0.3
Table 4 completed in 33.68 seconds with 4294790221 entries.
Progress update: 0.4
Table 5 completed in 29.64 seconds with 4294597227 entries.
Progress update: 0.5
Table 6 completed in 25.63 seconds with 4294168017 entries.
Progress update: 0.6
Table 7 completed in 19.79 seconds with 4293380871 entries.
Progress update: 0.7
Finalizing Table 7
Finalized Table 7 in 9.58 seconds.
Completed Phase 1 in 166.15 seconds
Progress update: 0.8
Marked Table 6 in 3.56 seconds.
Marked Table 5 in 3.25 seconds.
Marked Table 4 in 3.58 seconds.
Marked Table 3 in 3.48 seconds.
Completed Phase 2 in 13.88 seconds
Progress update: 0.9
Compressing Table 2 and 3...
STDERR: CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered

STDERR:

STDERR: Panic!!! Fatal Error:

STDERR: CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered.

0x00007FF7273A93E2 @ ::
0x00007FF7274AED79 @ ::
0x00007FF7274CCE0D @ ::
0x00007FF7274F22CA @ ::
0x00007FF7274F1F38 @ ::
0x00007FF7274D892B @ ::
0x00007FF7274D9CEB @ ::
0x00007FF7274D0152 @ ::
0x00007FF7274D06BF @ ::
0x00007FF72738F0A8 @ ::
0x00007FF72753AFEC @ ::
0x00007FFF562D257D @ ::BaseThreadInitThunk()
0x00007FFF585CAA58 @ ::RtlUserThreadStart()

Perk-Mew commented 7 months ago

I have installed CUDA 12.2 and reinstalled the driver, but now it stops at 90 percent.

Perk-Mew commented 7 months ago

https://us.download.nvidia.com/Windows/546.01/546.01-desktop-win10-win11-64bit-international-nsd-dch-whql.exe

Perk-Mew commented 7 months ago

@harold-b I got the same problem. Did you solve it?