Chia-Network / bladebit

A high-performance k32-only, Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0
336 stars 109 forks source link

Bladebit Cuda What is "Command Buffer Full"? #288

Open bobbyt2012 opened 1 year ago

bobbyt2012 commented 1 year ago

What happened?

Not sure what causes this error or if it qualifies as a bug, but I don't see any resource constraints, so I'm curious as to what causes the "command buffer" to fill.

Machine Specs:

2x 2690v2 576GB RAM Nvidia A4000 (16GB) Final Directory 6TB - (5x SAS 1.2TB in RAID 0)

Command Syntax:

./bladebit_cuda -c -f -n 6 cudaplot /mnt/sas/complete

Output example:

Step 2 completed step in 7.18 seconds. [PlotWriter] Waited 9.900000 seconds for a Command to be available. [PlotWriter] Command buffer full. Waiting for commands. [PlotWriter] Waited 11.045000 seconds for a Command to be available. [PlotWriter] Command buffer full. Waiting for commands. [PlotWriter] Waited 11.099000 seconds for a Command to be available. Step 3 completed step in 37.70 seconds.

Version

bladebit-cuda-v3.0.0-alpha1-ubuntu-x86-64.tar.gz

What platform are you using?

Linux

What ui mode are you using?

CLI

Relevant log output

Generating plot 6 / 6: 8809c67a5a54b23507e2adcf04e85e1d907125c0aef1082be2679c611be6f9a6
Plot temporary file: /mnt/sas/complete/plot-k32-2023-02-16-10-51-8809c67a5a54b23507e2adcf04e85e1d907125c0aef1082be2679c611be6f9a6.plot.tmp

Generating F1
Finished F1 in 3.17 seconds.
Table 2 completed in 26.50 seconds with 4294967296 entries.
Table 3 completed in 14.20 seconds with 4294965897 entries.
Table 4 completed in 14.25 seconds with 4294962420 entries.
Table 5 completed in 13.85 seconds with 4294880146 entries.
Table 6 completed in 11.26 seconds with 4294750577 entries.
Table 7 completed in 8.48 seconds with 4294535978 entries.
Finalizing Table 7
Finalized Table 7 in 4.99 seconds.
Completed Phase 1 in 96.71 seconds
Marked Table 6 in 2.64 seconds.
Marked Table 5 in 2.30 seconds.
Marked Table 4 in 2.25 seconds.
Marked Table 3 in 2.24 seconds.
Marked Table 2 in 2.24 seconds.
Completed Phase 2 in 11.66 seconds
Compressing Table 1 and 2...
 Step 1 completed step in 5.35 seconds.
 Step 2 completed step in 5.79 seconds.
Completed table 1 in 11.14 seconds with 3429427889 / 4294967296 entries ( 79.85% ).
Compressing tables 2 and 3...
 Step 1 completed step in 5.22 seconds.
 Step 2 completed step in 7.13 seconds.
 Step 3 completed step in 5.73 seconds.
Completed table 2 in 18.09 seconds with 3439923494 / 4294965897 entries ( 80.09% ).
Compressing tables 3 and 4...
 Step 1 completed step in 5.29 seconds.
 Step 2 completed step in 7.24 seconds.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 0.000000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 3.732000 seconds for a Command to be available.
 Step 3 completed step in 9.39 seconds.
Completed table 3 in 21.92 seconds with 3466075768 / 4294962420 entries ( 80.70% ).
Compressing tables 4 and 5...
 Step 1 completed step in 5.23 seconds.
 Step 2 completed step in 7.18 seconds.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 0.000000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 9.900000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 11.045000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 11.099000 seconds for a Command to be available.
 Step 3 completed step in 37.70 seconds.
Completed table 4 in 50.11 seconds with 3532863527 / 4294880146 entries ( 82.26% ).
Compressing tables 5 and 6...
 Step 1 completed step in 5.18 seconds.
 Step 2 completed step in 7.41 seconds.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 0.000000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 8.999000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 11.190000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 11.282000 seconds for a Command to be available.
 Step 3 completed step in 37.39 seconds.
Completed table 5 in 49.98 seconds with 3713432800 / 4294750577 entries ( 86.46% ).
Compressing tables 6 and 7...
 Step 1 completed step in 5.22 seconds.
 Step 2 completed step in 8.19 seconds.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 0.000000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 8.867000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 11.047000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 11.247000 seconds for a Command to be available.
 Step 3 completed step in 37.76 seconds.
Completed table 6 in 51.16 seconds with 4294535978 / 4294535978 entries ( 100.00% ).
Serializing P7 entries
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 10.643000 seconds for a Command to be available.
[PlotWriter] Command buffer full. Waiting for commands.
[PlotWriter] Waited 9.673000 seconds for a Command to be available.
Completed serializing P7 entries in 23.44 seconds.
Completed Phase 3 in 225.85 seconds
Completed Plot 1 in 334.23 seconds ( 5.57 minutes )

/mnt/sas/complete/plot-k32-2023-02-16-10-51-8809c67a5a54b23507e2adcf04e85e1d907125c0aef1082be2679c611be6f9a6.plot.tmp -> /mnt/sas/complete/plot-k32-2023-02-16-10-51-8809c67a5a54b23507e2adcf04e85e1d907125c0aef1082be2679c611be6f9a6.plot
Completed writing plot in 156.39 seconds
Final plot table pointers: 
 Table 1:       1290084092 ( 0x000000004ce51efc )
 Table 2:      16129733504 ( 0x00000003c1683380 )
 Table 3:      30112828079 ( 0x0000000702dd4aaf )
 Table 4:      44202224579 ( 0x0000000a4aa86bc3 )
 Table 5:      58563115979 ( 0x0000000da2a23bcb )
 Table 6:      73658005979 ( 0x00000011265bf9db )
 Table 7:      91115048129 ( 0x0000001536e14cc1 )
 C 1    :             4096 ( 0x0000000000001000 )
 C 2    :          1721916 ( 0x00000000001a463c )
 C 3    :          1722092 ( 0x00000000001a46ec )

Final plot table sizes: 
 Table 1: 14152.19 MiB
 Table 2: 13335.32 MiB
 Table 3: 13436.70 MiB
 Table 4: 13695.61 MiB
 Table 5: 14395.61 MiB
 Table 6: 16648.33 MiB
 Table 7: 16894.31 MiB
 C 1    : 1.64 MiB
 C 2    : 0.00 MiB
 C 3    : 1228.68 MiB
Jsewill commented 1 year ago

@bobbyt2012 As far as I know, the command buffer filling is simply indicative of the destination not keeping up with the plotter. In other words, it is waiting to write to the destination. I believe that the [PlotWriter] Waited x.x seconds for a Command to be available. is an indication of IO wait time.

harold-b commented 1 year ago

Indeed it is as @Jsewill mentioned.

The buffer is currently of fixed size. It will likely be swapped out and be configurable with a dynamic one during beta, however, you will still hit a wall at the end of a plot even with an increased buffer count if the destination is still slow as you will have to wait for the plot to complete writing. It will also use more RAM as you increase the buffer count for the plot writer.

bdelgado1995 commented 1 year ago

Are there any new updates on configuration parameters related to the buffer counts? I'm seeing the 'Command Buffer Full' on some of my plotting even to a local NVME drive. Edit: Looking more into this, one of my NVME drives seems to be under-performing so it looks like that's an issue with that drive. I don't see this error on a different drive.