Chia-Network / bladebit

A high-performance k32-only, Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0

3.1.0-beta1 win x64 128G CUDA: VirtualAlloc failed. #387

Open enk37 opened 1 year ago

enk37 commented 1 year ago
PS C:\mining\bladebit-cuda-v3.1.0-beta1> .\bladebit_cuda.exe -f farmer -c contract -z 7 cudaplot --disk-128 -t1 D:\chia_temp\ R:\chia_final\

Bladebit Chia Plotter
Version      : 3.1.0-beta1
Git Commit   : 076eba490f1c08b3a7bf10ea0a08f80be758c7b9
Compiled With: msvc 19.29.30151

[Global Plotting Config]
 Will create 1 plots.
 Thread count          : 32
 Warm start enabled    : false
 NUMA disabled         : false
 CPU affinity disabled : false
 Farmer public key     : farmer
 Pool contract address : contract
 Compression Level     : 7
 Benchmark mode        : disabled

[Bladebit CUDA Plotter]
 Host RAM        : 127 GiB
 Direct transfers: true

Selected cuda device 0 : NVIDIA GeForce RTX 3090
 CUDA Compute Capability   : 8.6
 SM count                  : 82
 Max blocks per SM         : 16
 Max threads per SM        : 1536
 Async Engine Count        : 2
 L2 cache size             : 6.00 MB
 L2 persist cache max size : 4.50 MB
 Stack Size                : 1.00 KB
 Memory:
  Total                    : 24.00 GB
  Free                     : 22.74 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required       : 92051827920  bytes ( 87787.46  MiB or 85.73  GiB )
Intermediate RAM required : 4362149888   bytes ( 4160.07   MiB or 4.06   GiB )
Host RAM required         : 28319940608  bytes ( 27008.00  MiB or 26.38  GiB )
Total Host RAM required   : 120371768528 bytes ( 114795.46 MiB or 112.10 GiB )
GPU RAM required          : 6140243968   bytes ( 5855.79   MiB or 5.72   GiB )
Allocating buffers...
Table pairs allocated as pinned: false

Fatal Error:
VirtualAlloc failed.

Task Manager shows 113 GB of memory actually free.
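The figures in the allocation log above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the byte counts are copied from the log, while the variable names and the reading of Task Manager's "113 GB" as binary gigabytes (GiB) are assumptions. It shows that the reported total equals the kernel plus host figures, and that the headroom against 113 GiB free is under 1 GiB.

```python
GIB = 1024 ** 3

# Byte counts copied from the "Allocating buffers" section of the log.
required = {
    "kernel": 92_051_827_920,
    "intermediate": 4_362_149_888,
    "host": 28_319_940_608,
    "total_host": 120_371_768_528,
    "gpu": 6_140_243_968,
}


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / GIB


# In this log, the reported total equals kernel + host exactly.
kernel_plus_host = required["kernel"] + required["host"]

# Assumption: Task Manager's "113 GB" is a binary (GiB) figure.
free_gib = 113.0
headroom_gib = free_gib - to_gib(required["total_host"])

for name, n_bytes in required.items():
    print(f"{name:13s}: {to_gib(n_bytes):8.2f} GiB")
print(f"{'headroom':13s}: {headroom_gib:8.2f} GiB")
```

With so little margin, any other allocation on the machine could push a single large request past what is available. That may be why a larger page file let the plotter start, though the thread does not confirm the exact cause.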

majekqwert commented 1 year ago

Increase the amount of virtual memory (the page file).

enk37 commented 1 year ago

Nope, that does not help. Virtual memory (the paging file) is already at 43 GB.

majekqwert commented 1 year ago

Set it to 140 GB. That works for me.

enk37 commented 1 year ago

OK, it starts, but plotting a level 7 compressed plot takes more than 70 minutes, and GPU utilization is almost unnoticeable. Disk access, though, is very high.

majekqwert commented 1 year ago

I tested, and it needs about 80 GB of virtual memory.

You need a very fast SSD and a fast transfer path to the target drive. For me, with an RTX 3060 Ti, the whole process takes 8 minutes (plotting plus data copying), of which 2 minutes is the transfer to the NAS.

It seems to me that C7 is not the best choice. Its power consumption will be too high, especially once the filter is reduced next year. C5 seems to be the best trade-off.

enk37 commented 1 year ago

That is indeed what I expected, hence the bug report. The temp folder is on a fast M.2 SSD. Virtual memory has been increased, but the page file is on an HDD.

enk37 commented 1 year ago
Generating F1
Finished F1 in 6.89 seconds.
Table 2 completed in 23.98 seconds with 4294923661 entries.
Table 3 completed in 58.22 seconds with 4294893948 entries.
Table 4 completed in 304.59 seconds with 4294875644 entries.
Table 5 completed in 576.11 seconds with 4294754004 entries.
Table 6 completed in 34.67 seconds with 4294478182 entries.
Table 7 completed in 19.40 seconds with 4293847663 entries.
Finalizing Table 7
Finalized Table 7 in 32.69 seconds.
Completed Phase 1 in 1057.89 seconds
Marked Table 6 in 27.17 seconds.
Marked Table 5 in 18.80 seconds.
Marked Table 4 in 8.85 seconds.
Marked Table 3 in 8.76 seconds.
Completed Phase 2 in 63.59 seconds
Compressing Table 2 and 3...
 Step 1 completed step in 12.22 seconds.
 Step 2 completed step in 9.94 seconds.
Completed table 2 in 22.16 seconds with 3439807533 / 4294893948 entries ( 80.09% ).
Compressing tables 3 and 4...
 Step 1 completed step in 9.43 seconds.
 Step 2 completed step in 13.23 seconds.
 Waiting for parks buffer to become available.
 Waited 2084.400 seconds for the park buffer to be released.
 Step 3 completed step in 38.68 seconds.
Completed table 3 in 61.34 seconds with 3465961402 / 4294875644 entries ( 80.70% ).
Compressing tables 4 and 5...
 Step 1 completed step in 12.67 seconds.
 Step 2 completed step in 13.06 seconds.
 Waiting for parks buffer to become available.
 Waited 2267.912 seconds for the park buffer to be released.
 Step 3 completed step in 222.53 seconds.
Completed table 4 in 248.26 seconds with 3532701407 / 4294754004 entries ( 82.26% ).
Compressing tables 5 and 6...
 Step 1 completed step in 22.39 seconds.
 Step 2 completed step in 13.37 seconds.
 Waiting for parks buffer to become available.
 Waited 2266.334 seconds for the park buffer to be released.
 Step 3 completed step in 221.50 seconds.
Completed table 5 in 257.26 seconds with 3713137403 / 4294478182 entries ( 86.46% ).
Compressing tables 6 and 7...
 Step 1 completed step in 21.18 seconds.
 Step 2 completed step in 14.00 seconds.
 Waiting for parks buffer to become available.
 Waited 2285.061 seconds for the park buffer to be released.
 Step 3 completed step in 241.96 seconds.
Completed table 6 in 277.15 seconds with 4293847663 / 4293847663 entries ( 100.00% ).
Serializing P7 entries
Completed serializing P7 entries in 5.60 seconds.
Completed Phase 3 in 871.78 seconds
Completed Plot 1 in 1993.27 seconds ( 33.22 minutes )
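To see where the time goes in the log above, the timing lines can be parsed and summed. This is a minimal sketch: the regular expressions and variable names are illustrative, not part of bladebit, and only a subset of the log lines is embedded. It checks that the three phase times add up to the reported plot time and reports how much of Phase 1 was spent on Tables 4 and 5.

```python
import re

# Timing lines copied from the plotting log above (subset).
LOG = """\
Table 2 completed in 23.98 seconds with 4294923661 entries.
Table 3 completed in 58.22 seconds with 4294893948 entries.
Table 4 completed in 304.59 seconds with 4294875644 entries.
Table 5 completed in 576.11 seconds with 4294754004 entries.
Table 6 completed in 34.67 seconds with 4294478182 entries.
Table 7 completed in 19.40 seconds with 4293847663 entries.
Completed Phase 1 in 1057.89 seconds
Completed Phase 2 in 63.59 seconds
Completed Phase 3 in 871.78 seconds
Completed Plot 1 in 1993.27 seconds ( 33.22 minutes )
"""

TABLE_RE = re.compile(r"Table (\d) completed in ([\d.]+) seconds")
PHASE_RE = re.compile(r"Completed Phase (\d) in ([\d.]+) seconds")
PLOT_RE = re.compile(r"Completed Plot \d+ in ([\d.]+) seconds")

# Map table/phase number -> elapsed seconds.
tables = {int(m.group(1)): float(m.group(2)) for m in TABLE_RE.finditer(LOG)}
phases = {int(m.group(1)): float(m.group(2)) for m in PHASE_RE.finditer(LOG)}
plot_total = float(PLOT_RE.search(LOG).group(1))

phase_sum = sum(phases.values())
slowest_table = max(tables, key=tables.get)
share_t4_t5 = (tables[4] + tables[5]) / phases[1]

print(f"Sum of phases : {phase_sum:.2f} s (reported total {plot_total:.2f} s)")
print(f"Slowest table : Table {slowest_table} ({tables[slowest_table]:.2f} s)")
print(f"Tables 4 and 5: {share_t4_t5:.0%} of Phase 1")
```

Tables 4 and 5 together account for most of Phase 1, while Tables 2, 6 and 7 each finish in well under a minute. Note also that each "Waited ... for the park buffer" figure in Phase 3 is larger than the total plot time, so those values should not be read as wall-clock stalls within the phase.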