Chia-Network / bladebit

A high-performance, k32-only Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0

[BUG] Windows Bladebit v2.0.1 diskplot --cache 220G failing for 50% of the plots at random stages of Phase 3 #246

Status: Open. json-b0urne opened this issue 1 year ago.

json-b0urne commented 1 year ago

System: i9-10980xe, 256GB RAM, Samsung 980 Pro, Windows 10 21H2 build 19044.1889

Bladebit: v2.0.1

Command line:

%LOCALAPPDATA%\Programs\Chia\resources\app.asar.unpacked\daemon\bladebit\bladebit.exe ^
    -t 36 ^
    -f ... ^
    -c ... ^
    -n 1000 ^
    -v ^
    diskplot ^
    --unbounded ^
    --cache 220G ^
    -t1 %temp_dir_path%\ ^
    %final_dir_path%\

The system is stable: MadMax has plotted into a 120GB RAM disk for days without issues. Bladebit, however, fails on roughly one plot in two at a random point in Phase 3, leaving .tmp files of random sizes in the final dir. Even with the 220G cache there's plenty of free RAM left. Any suggestions?

Event Viewer error example:

Faulting application name: bladebit.exe, version: 0.0.0.0, time stamp: 0x636974ca
Faulting module name: bladebit.exe, version: 0.0.0.0, time stamp: 0x636974ca
Exception code: 0xc0000409
Fault offset: 0x00000000001936d9
Faulting process id: 0x4b00
Faulting application start time: 0x01d8f36c520ad42f
Faulting application path: C:\Users\...\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon\bladebit\bladebit.exe
Faulting module path: C:\Users\...\AppData\Local\Programs\Chia\resources\app.asar.unpacked\daemon\bladebit\bladebit.exe
Report Id: 
Faulting package full name: 
Faulting package-relative application ID: 
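For reference, exception code 0xc0000409 is the NTSTATUS value `STATUS_STACK_BUFFER_OVERRUN`. Despite the name, Windows also reports it when a process ends itself through a fail-fast path (`__fastfail`), so on its own it usually indicates that the process terminated itself after detecting a problem, not which bug triggered it. The Python sketch below splits a code into its documented NTSTATUS fields (severity, customer bit, facility, code); the two-entry name table is illustrative, not exhaustive.

```python
# Decode the bit fields of a Windows NTSTATUS value, such as the
# "Exception code" that Event Viewer reports for a crashed process.
# Layout, high bits first: Sev(2) | C(1) | N(1) | Facility(12) | Code(16)

SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Only the codes needed for this example; not an exhaustive table.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS into its severity, customer, facility and code fields."""
    value &= 0xFFFFFFFF                      # treat the input as an unsigned 32-bit value
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],
        "customer": bool((value >> 29) & 0x1),   # set only for non-Microsoft codes
        "facility": (value >> 16) & 0x0FFF,
        "code": value & 0xFFFF,
        "name": KNOWN_CODES.get(value, "unknown"),
    }


if __name__ == "__main__":
    print(decode_ntstatus(0xC0000409))
```

For the value in this report it yields severity `error`, facility 0 and code 0x409, i.e. a Microsoft-defined system status rather than an application-specific one.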

Output before failure (example 1):

Running Phase 3
Compressing tables 1 and 2.
Step 1 Allocated 2969.44 / 4062.89 MiB
Step 2 using 1.68 / 3.97 GiB.

Example 2:

Running Phase 3
Compressing tables 1 and 2.
Step 1 Allocated 2969.44 / 4062.89 MiB
Step 2 using 1.68 / 3.97 GiB.
Table 1 now has 3429416189 / 4294957371 ( 79.85% ) entries.
Table 1 I/O wait time: 67.80 seconds.
Finished compressing tables 1 and 2 in 168.15 seconds.
Compressing tables 2 and 3.
Step 1 Allocated 3455.82 / 4062.89 MiB
Step 2 using 1.67 / 3.97 GiB.
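The Phase 3 figures in Example 2 can be cross-checked with simple arithmetic: 3429416189 / 4294957371 is about 79.85% (the table 1 total is just under 2^32, as expected for k32), the 4062.89 MiB Step 1 budget is the same 3.97 GiB quoted for Step 2, and 67.80 s of I/O wait is about 40% of the 168.15 s spent on tables 1 and 2. A Python sketch that extracts those figures; the regular expressions are written for these exact lines and are an assumption, not a documented bladebit log format.

```python
import re

# Lines copied verbatim from Example 2 of the Phase 3 output.
LOG = """\
Step 1 Allocated 2969.44 / 4062.89 MiB
Step 2 using 1.68 / 3.97 GiB.
Table 1 now has 3429416189 / 4294957371 ( 79.85% ) entries.
Table 1 I/O wait time: 67.80 seconds.
Finished compressing tables 1 and 2 in 168.15 seconds.
"""


def phase3_stats(log: str) -> dict:
    """Recompute derived figures for one table pair (assumes the log shape above)."""
    kept, total = map(int, re.search(r"now has (\d+) / (\d+)", log).groups())
    budget_mib = float(re.search(r"Allocated [\d.]+ / ([\d.]+) MiB", log).group(1))
    io_wait_s = float(re.search(r"I/O wait time: ([\d.]+) seconds", log).group(1))
    elapsed_s = float(re.search(r"tables \d+ and \d+ in ([\d.]+) seconds", log).group(1))
    return {
        "kept_pct": 100.0 * kept / total,               # share of table 1 entries kept
        "budget_gib": budget_mib / 1024.0,              # Step 1 budget converted to GiB
        "io_wait_pct": 100.0 * io_wait_s / elapsed_s,   # I/O wait share of the step
        "total_vs_2_32": total / 2**32,                 # k32 tables hold close to 2^32 entries
    }


if __name__ == "__main__":
    for name, value in phase3_stats(LOG).items():
        print(f"{name}: {value:.4f}")
```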

json-b0urne commented 1 year ago

--unbounded could be the culprit; bladebit seems to run stably without it.

harold-b commented 1 year ago

Thanks for bringing this up. --unbounded should have been disabled for now, and I thought it was. For now, that flag should not be used.
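Until `--unbounded` is actually disabled, a scripted run can simply refuse to pass it. A minimal Python sketch that rebuilds the invocation from the report as an argument list, using only the flags shown there; the function and parameter names are hypothetical, the `-f`/`-c` values are left to the caller (the report elides them), and `--unbounded` is the only guarded flag.

```python
import subprocess
from typing import List, Sequence

# Flags the maintainer asked users not to pass for now (bladebit v2.0.1).
FORBIDDEN_DISKPLOT_FLAGS = {"--unbounded"}


def build_bladebit_args(exe: str, f_value: str, c_value: str,
                        temp_dir: str, final_dir: str, *,
                        threads: int = 36, count: int = 1000,
                        cache: str = "220G",
                        extra_diskplot_flags: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper mirroring the command line from the report."""
    bad = FORBIDDEN_DISKPLOT_FLAGS.intersection(extra_diskplot_flags)
    if bad:
        raise ValueError(f"refusing to pass {sorted(bad)} to diskplot")
    return [
        exe,
        "-t", str(threads),
        "-f", f_value,
        "-c", c_value,
        "-n", str(count),
        "-v",
        "diskplot",
        *extra_diskplot_flags,      # any additional diskplot options go here
        "--cache", cache,
        "-t1", temp_dir,
        final_dir,
    ]


def run_bladebit(args: List[str]) -> int:
    """Run bladebit and return its exit code; non-zero means it failed."""
    return subprocess.run(args).returncode


if __name__ == "__main__":
    # Placeholder values; the report elides the real ones.
    print(build_bladebit_args("bladebit.exe", "<f value>", "<c value>",
                              "<temp dir>", "<final dir>"))
```

Passing a list to `subprocess.run` also sidesteps cmd.exe caret continuations and quoting.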

json-b0urne commented 1 year ago

Most welcome. Thanks for the quick response.