Chia-Network / bladebit

A high-performance k32-only, Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0

Fatal Error on bladebit_cuda #268

Open ArigornStrider opened 1 year ago

ArigornStrider commented 1 year ago

Ubuntu 22.04 on a Dell R720 with dual E5-2697v2 CPUs, 256GB RAM, and 3x FusionIO 1.6TB sx350 drives in btrfs RAID0. I was running chia_plot_copy from MadMax over a 10Gbps network to the farmer when, on the third plot, bladebit_cuda (binary downloaded from downloads.chia.net) crashed with the following message. Let me know what I left out that would be helpful for debugging. I'm wondering whether the system has insufficient RAM for the plotter and chia_plot_copy to run at the same time.

Panic!!! Fatal Error: Failed to write to plot with error 5:
./bladebit_cuda(+0xcf8cb)[0x55a6a165d8cb]
./bladebit_cuda(+0xcf0af)[0x55a6a165d0af]
./bladebit_cuda(+0xbdb5e)[0x55a6a164bb5e]
./bladebit_cuda(+0xbe510)[0x55a6a164c510]
./bladebit_cuda(+0xd062d)[0x55a6a165e62d]
/lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f42d3445b43]
/lib/x86_64-linux-gnu/libc.so.6(+0x126a00)[0x7f42d34d7a00]

Edit: Subsequent errors after the initial error:

Panic!!! Fatal Error: Failed to write to plot with error 5:
./bladebit_cuda(+0xcf8cb)[0x557d1f28b8cb]
./bladebit_cuda(+0xcf0af)[0x557d1f28b0af]
./bladebit_cuda(+0xbdb5e)[0x557d1f279b5e]
./bladebit_cuda(+0xbe510)[0x557d1f27a510]
./bladebit_cuda(+0xd062d)[0x557d1f28c62d]
/lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7faf38ab4b43]
/lib/x86_64-linux-gnu/libc.so.6(+0x126a00)[0x7faf38b46a00]
CUDA error: 4 (0x4 ) cudaErrorCudartUnloading : driver shutting down

Panic!!! Fatal Error: CUDA error cudaErrorCudartUnloading : driver shutting down.
./bladebit_cuda(+0xcf8cb)[0x557d1f28b8cb]
./bladebit_cuda(+0xcf0af)[0x557d1f28b0af]
./bladebit_cuda(+0x5217a)[0x557d1f20e17a]
./bladebit_cuda(+0x199ff)[0x557d1f1d59ff]
./bladebit_cuda(+0x1cf58)[0x557d1f1d8f58]
./bladebit_cuda(+0x18245)[0x557d1f1d4245]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7faf38a49d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7faf38a49e40]
./bladebit_cuda(+0x1974e)[0x557d1f1d574e]
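
The number in "Failed to write to plot with error 5" looks like an operating-system error number, though the thread does not confirm how bladebit produces it, so treat that as an assumption. On Linux, errno 5 is EIO (a low-level input/output error), which would point at the storage stack more than at RAM. A minimal sketch for decoding such a number on the machine that produced it; the helper names `extract_error_code` and `describe_errno` are hypothetical and not part of bladebit:

```python
import errno
import os
import re


def extract_error_code(line: str):
    """Return the integer from a 'with error N' message, or None if absent."""
    match = re.search(r"with error:? (\d+)", line)
    return int(match.group(1)) if match else None


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an OS error number on the current platform."""
    # errno.errorcode maps numbers to symbolic names for this platform only.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# Message copied from the report above.
line = "Panic!!! Fatal Error: Failed to write to plot with error 5:"
code = extract_error_code(line)
if code is not None:
    print(code, describe_errno(code))
```

The same number can mean different things on different operating systems, so run this on the plotter itself. Checking `dmesg` or the btrfs device stats around the time of the crash would be a reasonable next step if the code does decode to an I/O error.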

caodaye commented 1 year ago

My problem is: Fatal Error: Failed to open plot file with error: 3. I don't know how to solve it.

mmitech commented 1 year ago

Probably the same issue?

Final plot table pointers:
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 1:                0 ( 0x0000000000000000 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 2:                0 ( 0x0000000000000000 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 3:       1290677972 ( 0x000000004cee2ed4 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 4:      12036719472 ( 0x00000002cd71c370 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 5:      26395796022 ( 0x00000006254fe236 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 6:      41487264447 ( 0x00000009a8d56abf )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 7:      58938204372 ( 0x0000000db8fda0d4 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  C 1    :          1048576 ( 0x0000000000100000 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  C 2    :          2765796 ( 0x00000000002a33e4 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  C 3    :          2765972 ( 0x00000000002a3494 )
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]: Final plot table sizes:
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 1: 0.00 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 2: 0.00 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 3: 10248.22 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 4: 13693.88 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 5: 14392.35 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 6: 16642.51 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  Table 7: 16888.40 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  C 1    : 1.64 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  C 2    : 0.00 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]:  C 3    : 1228.25 MiB
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]: Generating plot 8: f26fa6a7e2501d31b83d3ca9a3484177ad94d2613fda4b34daf84835474e3e6b
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]: Plot temporary file: /mnt/data/plot-k32-c09-2023-02-11-21-30-f26fa6a7e2501d31b83d3ca9a3484177ad94d2613fda4b34daf84835474e3e6b.plot.tmp
Feb 11 21:30:48 hp-plotter01 bladebit_cuda[4991]: Generating F1
Feb 11 21:30:51 hp-plotter01 bladebit_cuda[4991]: Finished F1 in 2.83 seconds.
Feb 11 21:30:57 hp-plotter01 bladebit_cuda[4991]: Table 2 completed in 6.16 seconds with 4294930927 entries.
Feb 11 21:31:07 hp-plotter01 bladebit_cuda[4991]: Table 3 completed in 10.74 seconds with 4294843371 entries.
Feb 11 21:31:30 hp-plotter01 bladebit_cuda[4991]: Table 4 completed in 22.12 seconds with 4294684284 entries.
Feb 11 21:31:48 hp-plotter01 bladebit_cuda[4991]: Table 5 completed in 18.56 seconds with 4294396975 entries.
Feb 11 21:32:04 hp-plotter01 bladebit_cuda[4991]: Table 6 completed in 15.51 seconds with 4293816053 entries.
Feb 11 21:32:14 hp-plotter01 bladebit_cuda[4991]: Table 7 completed in 10.64 seconds with 4292570364 entries.
Feb 11 21:32:14 hp-plotter01 bladebit_cuda[4991]: Finalizing Table 7
Feb 11 21:32:20 hp-plotter01 bladebit_cuda[4991]: Finalized Table 7 in 5.68 seconds.
Feb 11 21:32:20 hp-plotter01 bladebit_cuda[4991]: Completed Phase 1 in 92.24 seconds
Feb 11 21:32:23 hp-plotter01 bladebit_cuda[4991]: Marked Table 6 in 2.95 seconds.
Feb 11 21:32:26 hp-plotter01 bladebit_cuda[4991]: Marked Table 5 in 2.58 seconds.
Feb 11 21:32:28 hp-plotter01 bladebit_cuda[4991]: Marked Table 4 in 2.47 seconds.
Feb 11 21:32:28 hp-plotter01 bladebit_cuda[4991]: Completed Phase 2 in 8.00 seconds
Feb 11 21:32:28 hp-plotter01 bladebit_cuda[4991]: Compressing Table 3 and 4...
Feb 11 21:32:34 hp-plotter01 bladebit_cuda[4991]:  Step 1 completed step in 5.71 seconds.
Feb 11 21:32:41 hp-plotter01 bladebit_cuda[4991]:  Step 2 completed step in 7.00 seconds.
Feb 11 21:32:41 hp-plotter01 bladebit_cuda[4991]: Completed table 3 in 12.71 seconds with 3465670903 / 4294684284 entries ( 80.70% ).
Feb 11 21:32:41 hp-plotter01 bladebit_cuda[4991]: Compressing tables 4 and 5...
Feb 11 21:32:47 hp-plotter01 bladebit_cuda[4991]:  Step 1 completed step in 6.02 seconds.
Feb 11 21:32:57 hp-plotter01 bladebit_cuda[4991]:  Step 2 completed step in 9.93 seconds.
Feb 11 21:33:04 hp-plotter01 bladebit_cuda[4991]:  Step 3 completed step in 7.25 seconds.
Feb 11 21:33:04 hp-plotter01 bladebit_cuda[4991]: Completed table 4 in 23.21 seconds with 3532255459 / 4294396975 entries ( 82.25% ).
Feb 11 21:33:04 hp-plotter01 bladebit_cuda[4991]: Compressing tables 5 and 6...
Feb 11 21:33:10 hp-plotter01 bladebit_cuda[4991]:  Step 1 completed step in 6.09 seconds.
Feb 11 21:33:20 hp-plotter01 bladebit_cuda[4991]:  Step 2 completed step in 10.19 seconds.
Feb 11 21:33:28 hp-plotter01 bladebit_cuda[4991]:  Step 3 completed step in 7.54 seconds.
Feb 11 21:33:28 hp-plotter01 bladebit_cuda[4991]: Completed table 5 in 23.82 seconds with 3712380165 / 4293816053 entries ( 86.46% ).
Feb 11 21:33:28 hp-plotter01 bladebit_cuda[4991]: Compressing tables 6 and 7...
Feb 11 21:33:34 hp-plotter01 bladebit_cuda[4991]:  Step 1 completed step in 6.10 seconds.
Feb 11 21:33:45 hp-plotter01 bladebit_cuda[4991]:  Step 2 completed step in 11.10 seconds.
Feb 11 21:33:53 hp-plotter01 bladebit_cuda[4991]: [PlotWriter] Command buffer full. Waiting for commands.
Feb 11 21:33:53 hp-plotter01 bladebit_cuda[4991]: [PlotWriter] Waited 0.000000 seconds for a Command to be available.
Feb 11 21:33:53 hp-plotter01 bladebit_cuda[4991]: [PlotWriter] Command buffer full. Waiting for commands.
Feb 11 21:33:59 hp-plotter01 bladebit_cuda[4991]: [PlotWriter] Waited 6.440000 seconds for a Command to be available.
Feb 11 21:34:00 hp-plotter01 bladebit_cuda[4991]:  Step 3 completed step in 14.78 seconds.
Feb 11 21:34:00 hp-plotter01 bladebit_cuda[4991]: Completed table 6 in 31.98 seconds with 4292570364 / 4292570364 entries ( 100.00% ).
Feb 11 21:34:00 hp-plotter01 bladebit_cuda[4991]: Serializing P7 entries
Feb 11 21:34:01 hp-plotter01 bladebit_cuda[4991]: [PlotWriter] Command buffer full. Waiting for commands.
Feb 11 21:34:10 hp-plotter01 bladebit_cuda[4991]: [PlotWriter] Waited 8.672000 seconds for a Command to be available.
Feb 11 21:34:12 hp-plotter01 bladebit_cuda[4991]: [PlotWriter] Command buffer full. Waiting for commands.
Feb 11 21:34:21 hp-plotter01 bladebit_cuda[4991]: [PlotWriter] Waited 8.496000 seconds for a Command to be available.
Feb 11 21:34:21 hp-plotter01 bladebit_cuda[4991]: Completed serializing P7 entries in 21.42 seconds.
Feb 11 21:34:21 hp-plotter01 bladebit_cuda[4991]: Completed Phase 3 in 113.14 seconds
Feb 11 21:34:21 hp-plotter01 bladebit_cuda[4991]: Completed Plot 1 in 213.38 seconds ( 3.56 minutes )
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: *** Panic!!! *** Fatal Error:
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: Failed to write to plot with error 112:
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: /usr/bin/bladebit_cuda(+0xce8fd)[0x55bc4511e8fd]
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: /usr/bin/bladebit_cuda(+0xce0cf)[0x55bc4511e0cf]
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: /usr/bin/bladebit_cuda(+0xbcb5d)[0x55bc4510cb5d]
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: /usr/bin/bladebit_cuda(+0xbd4c0)[0x55bc4510d4c0]
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: /usr/bin/bladebit_cuda(+0xcf676)[0x55bc4511f676]
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f93a86d7609]
Feb 11 21:35:19 hp-plotter01 bladebit_cuda[4991]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f93a8291133]
Feb 11 21:36:06 hp-plotter01 systemd[1]: bladebit.service: Main process exited, code=exited, status=1/FAILURE
Feb 11 21:36:06 hp-plotter01 systemd[1]: bladebit.service: Failed with result 'exit-code'.
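
The "Final plot table sizes" lines in the log above give enough information to estimate how much space one finished plot needs, which can then be compared with the free space on the destination. Whether lack of space is what caused this particular write failure is not established by the log; this is only a sketch of one thing to rule out. The function names are hypothetical, and the journald prefix has been trimmed from the sample lines:

```python
import re
import shutil

# Size lines copied from the log above (timestamp/host prefix trimmed).
SIZE_LINES = """\
 Table 1: 0.00 MiB
 Table 2: 0.00 MiB
 Table 3: 10248.22 MiB
 Table 4: 13693.88 MiB
 Table 5: 14392.35 MiB
 Table 6: 16642.51 MiB
 Table 7: 16888.40 MiB
 C 1    : 1.64 MiB
 C 2    : 0.00 MiB
 C 3    : 1228.25 MiB
"""

# Matches only lines that end in a MiB figure, so pointer lines are ignored.
SIZE_RE = re.compile(r"(Table \d|C \d)\s*:\s*([\d.]+) MiB")


def total_plot_mib(log_text: str) -> float:
    """Sum every 'Table N' / 'C N' size reported in MiB."""
    return sum(float(m.group(2)) for m in SIZE_RE.finditer(log_text))


def has_room(path: str, needed_mib: float) -> bool:
    """True if the filesystem holding `path` has at least `needed_mib` free."""
    free_mib = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mib >= needed_mib


needed = total_plot_mib(SIZE_LINES)
# "." is a placeholder; the log writes to /mnt/data, so use that on the plotter.
print(f"plot needs about {needed:.2f} MiB; room available: {has_room('.', needed)}")
```

For the sizes in this log the total comes to about 73,095 MiB (roughly 71.4 GiB) per plot, not counting any header or filesystem overhead.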

ShitcoinSolutions commented 1 year ago

I'm seeing the same "Command buffer full" message. I just killed some processes that weren't using many resources, hoping that will let me plot on through. My plot finished, but it took about 10 minutes.
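
In the log posted earlier in the thread, the "[PlotWriter] Command buffer full" lines are each followed by a "Waited N seconds" line, and that plot still completed, so they read more like stall notices than errors. That interpretation is an assumption, not something the bladebit documentation in this thread confirms. A small sketch for adding up how long the plotter spent waiting, which gives a rough sense of how much of a slow plot is down to the output side; `total_wait_seconds` is a hypothetical helper:

```python
import re

# Wait lines copied from the log earlier in the thread (prefix trimmed).
WAIT_LINES = """\
[PlotWriter] Waited 0.000000 seconds for a Command to be available.
[PlotWriter] Waited 6.440000 seconds for a Command to be available.
[PlotWriter] Waited 8.672000 seconds for a Command to be available.
[PlotWriter] Waited 8.496000 seconds for a Command to be available.
"""

WAIT_RE = re.compile(r"\[PlotWriter\] Waited ([\d.]+) seconds")


def total_wait_seconds(log_text: str) -> float:
    """Sum the durations reported by '[PlotWriter] Waited ...' lines."""
    return sum(float(value) for value in WAIT_RE.findall(log_text))


print(f"time spent waiting on the plot writer: {total_wait_seconds(WAIT_LINES):.3f} s")
```

For that log the waits add up to about 23.6 seconds out of a 213.38 second plot. A 10 minute plot with the same message would likely show much larger totals, which would be consistent with a slow or busy destination drive, though other causes are possible.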

ArigornStrider commented 1 year ago

Might be worth checking out Arch Linux or Clear Linux, as those are fairly stripped-down distros. I'm looking at Arch for my farmer, but haven't made the switch yet.

ShitcoinSolutions commented 1 year ago

hey @ArigornStrider it's FlipThisCrypto from Discord. I thought I changed my old name on here lol