Chia-Network / bladebit

A high-performance k32-only, Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0
337 stars, 106 forks

BladeBit just hangs when running from GUI #201

Open Drhicom opened 2 years ago

Drhicom commented 2 years ago

Good morning. Simple question: does BladeBit work in the GUI of 1.5.0, and is there anything else that needs to be installed besides chia-1.5.0? I have a DELL T7610 with dual E5-2670 CPUs and 512 GB of RAM. When I looked at the log, this is what it puts out:

    System Memory: 497/511 GiB.
    Memory require
    Unclosed client session
    client_session: <aiohttp.client.ClientSession object at 0x0000015E24B2B040>
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0x0000015E24AF5550>
    transport: <_ProactorSocketTransport fd=-1 read=>
    Traceback (most recent call last):
      File "asyncio\sslproto.py", line 690, in _process_write_backlog
      File "asyncio\proactor_events.py", line 359, in write
      File "asyncio\proactor_events.py", line 395, in _loop_writing
    AttributeError: 'NoneType' object has no attribute 'send'
    Exception ignored in: <function _SSLProtocolTransport.__del__ at 0x0000015E23500940>
    Traceback (most recent call last):
      File "asyncio\sslproto.py", line 321, in __del__
      File "asyncio\sslproto.py", line 316, in close
      File "asyncio\sslproto.py", line 599, in _start_shutdown
      File "asyncio\sslproto.py", line 604, in _write_appdata
      File "asyncio\sslproto.py", line 712, in _process_write_backlog
      File "asyncio\sslproto.py", line 726, in _fatal_error
      File "asyncio\proactor_events.py", line 151, in _force_close
      File "asyncio\base_events.py", line 751, in call_soon
      File "asyncio\base_events.py", line 515, in _check_closed
    RuntimeError: Event loop is closed

Seeing that SSL error, I shut down Chia and reran the following to recreate the SSL certs, just in case. No change.

    C:\Users\rb653504\AppData\Local\chia-blockchain\app-1.5.0\resources\app.asar.unpacked\daemon> .\chia.exe init -c c:\files\ca\

harold-b commented 2 years ago

The log shown is all related to chia-blockchain. It would be good to open an issue there and refer to this issue. It sounds like something unrelated in one of the Chia processes might be interfering with your plot session.
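As harold-b says, these messages come from the Python side of chia-blockchain rather than from the plotter. As an illustration only (a standard-library sketch, not chia-blockchain's code), the last frames of that traceback, `call_soon` then `_check_closed` raising `RuntimeError: Event loop is closed`, can be reproduced by scheduling a callback on an asyncio event loop that has already been closed. In the logged traceback the caller is the SSL transport's `__del__` cleanup, which suggests the transport was torn down after its loop had already shut down.

```python
import asyncio


def reproduce_closed_loop_error() -> str:
    """Schedule a callback on an already-closed loop and return the error text."""
    loop = asyncio.new_event_loop()
    loop.close()  # the loop is shut down first...
    try:
        # ...then something tries to use it. In the logged traceback this is
        # the SSL transport's __del__ reaching loop.call_soon(); here it is a
        # plain callback.
        loop.call_soon(print, "never runs")
    except RuntimeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce_closed_loop_error())  # prints: Event loop is closed
```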

Drhicom commented 2 years ago

Thanks for the reply. I'm trying to run this from the GUI under "+ Add A Plot".

Hope this helps https://github.com/Chia-Network/chia-blockchain/issues/12972

Drhicom commented 2 years ago

Harold, since this option is in the GUI, it should work, correct? Can you verify this, please? Can somebody DM me? They could verify my setup if needed.

harold-b commented 2 years ago

I saw your comments on Keybase. I will follow up there and bring the issue to the blockchain team's attention today. Meanwhile, would you mind running it directly from the CLI to see if you have any issues with it in isolation?

Drhicom commented 2 years ago

I also downloaded the beta version 2, bladebit-v2.0.0-alpha2-windows-x86-64.zip, unzipped it into the c:\bladebit folder I made, and ran this:

    PS C:\bladebit> bladebit -f famer-key -c pool-contract-key diskplot -a --cache 99G -t1 T:\nvme\ H:\BB-plot
    [Bladebit Disk Plotter]
    Heap size : 3.37 GiB ( 3447.82 MiB )
    Cache size : 99.00 GiB ( 101376.00 MiB )
    Bucket count : 256
    Alternating I/O: true
    F1 threads : 40
    FP threads : 40
    C threads : 40
    P2 threads : 40
    P3 threads : 40
    I/O threads : 1
    Temp1 block sz : 4096
    Temp2 block sz : 4096
    I/O metrices enabled.
    Allocating memory
    PS C:\bladebit>

What gives?

harold-b commented 2 years ago

What errors are you able to find in the Windows Event Viewer?

Drhicom commented 2 years ago

    Log Name: Application
    Source: Application Error
    Date: 8/15/2022 4:02:21 PM
    Event ID: 1000
    Task Category: (100)
    Level: Error
    Keywords: Classic
    User: N/A
    Computer: Dell-T7610
    Description:
    Faulting application name: bad_module_info, version: 0.0.0.0, time stamp: 0x00000000
    Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
    Exception code: 0xc0000005
    Fault offset: 0x0000001f0000001e
    Faulting process id: 0x373c
    Faulting application start time: 0x01d8b0e1e23e3a5c
    Faulting application path: bad_module_info
    Faulting module path: unknown
    Report Id: 710a4a98-c2e6-4193-92d4-73b7e9a1a0fb
    Faulting package full name:
    Faulting package-relative application ID:
    Event Xml:
    1000 0 2 100 0 0x80000000000000 41040 Application Dell-T7610 bad_module_info 0.0.0.0 00000000 unknown 0.0.0.0 00000000 c0000005 0000001f0000001e 373c 01d8b0e1e23e3a5c bad_module_info unknown 710a4a98-c2e6-4193-92d4-73b7e9a1a0fb

Drhicom commented 2 years ago

Hello Harold, has anybody taken a look at this yet? Are there any updates in the beta releases, like Chia 1.5.1533.0?

harold-b commented 2 years ago

Hi Roy,

As I mentioned on Keybase, I don't believe the bad_module_info error you showed from the Event Viewer is something happening in the app. It could be one of many things, including faulty hardware. It might be good to run a thorough memory check, or to see whether there's a setting configured on your machine that does not allow us to allocate the number of memory pages we need.

Is your machine using a larger page size? Or (I'm not sure whether Windows supports this the way Linux does) are there any reserved pages or something like that?
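For a sense of scale behind "the number of memory pages we need": the in-RAM plotter reports `Memory required: 416 GiB` elsewhere in this thread. The sketch below assumes x86-64's standard 4 KiB page size and its 2 MiB large-page size; which page size bladebit actually requests on Windows is not stated here. It simply shows how many pages that requirement works out to.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def pages_needed(total_bytes: int, page_size: int) -> int:
    """Pages of `page_size` bytes needed to back `total_bytes`, rounded up."""
    return -(-total_bytes // page_size)  # ceiling division on integers


# "Memory required: 416 GiB" from the bladebit v1 logs in this thread
REQUIRED = 416 * GIB

if __name__ == "__main__":
    print(pages_needed(REQUIRED, 4 * KIB))  # 109051904 standard 4 KiB pages
    print(pages_needed(REQUIRED, 2 * MIB))  # 212992 large 2 MiB pages
```

Either way, that is a very large number of pages to commit in one process, which is why a page-size or reservation setting is a plausible suspect.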

Drhicom commented 2 years ago

This machine has run the built-in BIOS memory diagnostics and has also run memtest86 for a couple of days on all 512 GB of memory, with no errors. The Windows page file is set to "let Windows choose what's best for my computer". The machine runs Windows 10 Enterprise. So, who has a Windows machine that can test this and contact me to bounce some ideas around on a setup?

harold-b commented 2 years ago

I don't have any more info offhand. We tested the in-RAM version on Windows Server 2019 and the disk version on Windows 10 Pro and Windows 11 Pro.

I've had issues like this on my desktop PC with other apps, where I actually had to clock the DIMMs down from their rated speed, which fixed them.

Do you happen to have this machine dual-booting into a Linux distro, to see if it runs fine there?

Drhicom commented 2 years ago

The machine is a Dell T7610 with dual E5-2670 v2 CPUs and 512 GB of RAM (16 x 32 GB DIMMs), running Windows 10 Enterprise.

harold-b commented 2 years ago

Perhaps try running the memtest (with one of the bladebit disk binaries), which will allocate twice the specified size to do a memcpy test.

You can run this (again in a bladebit disk binary) via the CLI with: bladebit -t <thread_count> memtest -s <size>[K|M|G]

Then see whether it crashes just from doing that, with increasing memory sizes.
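Going by harold-b's description (memtest allocates twice the `-s` size), a quick way to see which test sizes can even fit is to double the requested size and compare it with the 511 GiB that the logs in this thread report. The helper below is hypothetical, written for illustration, and not part of bladebit. Its suffix parsing follows the `<size>[K|M|G]` form above and assumes binary (1024-based) units, which matches the memtest output later in the thread, where `-s 200g` is reported as `Size : 204800.00 MiB`.

```python
GIB = 1024 ** 3
# Binary (1024-based) units, assumed from the memtest output ("-s 200g" -> 204800.00 MiB)
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
USABLE = 511 * GIB  # "System Memory: .../511 GiB" as reported in the logs


def parse_size(text: str) -> int:
    """Parse '<size>[K|M|G]' (case-insensitive) into bytes.

    A bare number is treated as bytes; that is an assumption of this sketch.
    """
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def memtest_footprint(size: str) -> int:
    """Bytes memtest would allocate, per the 'twice the specified size' description."""
    return 2 * parse_size(size)


if __name__ == "__main__":
    for s in ["200g", "250g", "255g", "260g", "270g", "280g"]:
        need = memtest_footprint(s)
        print(f"-s {s}: {parse_size(s) // 1024 ** 2} MiB requested, "
              f"{need // GIB} GiB allocated, within 511 GiB: {need <= USABLE}")
```

By this arithmetic, `-s 260g` and larger would need more than the 511 GiB Windows reports, and `-s 255g` (510 GiB) leaves essentially no headroom. Whether that accounts for the silent exits in the results is not established in the thread.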

Drhicom commented 2 years ago

Good morning Harold, I ran some tests like you wanted on v2 of BladeBit:

    C:\bladebit>dir
     Volume in drive C has no label.
     Volume Serial Number is D1FA-6E00

     Directory of C:\bladebit

    08/15/2022  07:42 PM    <DIR>          .
    08/15/2022  07:42 PM    <DIR>          ..
    11/11/2021  05:54 PM         1,361,408 bladebit1.exe
    11/11/2021  05:54 PM         1,361,408 bladebitv1 - Copy.exe
    08/09/2022  07:59 PM         4,039,168 bladebitv2 - Copy.exe
    08/09/2022  07:59 PM         4,039,168 bladebitv2.exe
    08/15/2022  07:42 PM               225 bladebitv2.txt
                   7 File(s)     14,841,593 bytes
                   2 Dir(s)  93,481,148,416 bytes free

    C:\bladebit>bladebitv2 -t 20 memtest -s 200g
    Size : 204800.00 MiB
    Threads: 20
    Passes : 1
    Allocating buffer...
    Starting Test

    Copied 204800.00 MiB in 17.52 seconds @ 11688.16 MiB/s (11.41 GiB/s) or 12256 MB/s (12.26 GB/s).

    C:\bladebit>bladebitv2 -t 20 memtest -s 200g
    Size : 204800.00 MiB
    Threads: 20
    Passes : 1
    Allocating buffer...
    Starting Test

    Copied 204800.00 MiB in 18.27 seconds @ 11207.18 MiB/s (10.94 GiB/s) or 11752 MB/s (11.75 GB/s).

    C:\bladebit>bladebitv2 -t 20 memtest -s 250g
    Size : 256000.00 MiB
    Threads: 20
    Passes : 1
    Allocating buffer...
    Starting Test

    Copied 256000.00 MiB in 41.62 seconds @ 6151.18 MiB/s (6.01 GiB/s) or 6450 MB/s (6.45 GB/s).

    C:\bladebit>bladebitv2 -t 20 memtest -s 280g
    Size : 286720.00 MiB
    Threads: 20
    Passes : 1
    Allocating buffer...

    Fatal Error: VirtualAlloc failed.

    C:\bladebit>bladebitv2 -t 20 memtest -s 280g
    Size : 286720.00 MiB
    Threads: 20
    Passes : 1
    Allocating buffer...

    Fatal Error: VirtualAlloc failed.

    C:\bladebit>bladebitv2 -t 20 memtest -s 270g
    Size : 276480.00 MiB
    Threads: 20
    Passes : 1
    Allocating buffer...

    C:\bladebit>bladebitv2 -t 20 memtest -s 260g
    Si
    C:\bladebit>bladebitv2 -t 20 memtest -s 255g
    Size : 261120.00 MiB
    Threads: 20
    Passes : 1
    Allocating buffer...

    C:\bladebit>bladebitv2 -t 20 memtest -s 250g
    Size : 256000.00 MiB
    Threads: 20
    Passes : 1
    Allocating buffer...
    Starting Test

    Copied 256000.00 MiB in 218.49 seconds @ 1171.66 MiB/s (1.14 GiB/s) or 1229 MB/s (1.23 GB/s).

    C:\bladebit>
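Each memtest summary line quotes the same rate in three units. As a sketch for reading them, assuming MiB = 2^20 bytes and GiB = 2^30 bytes while MB and GB are decimal, the conversions below reproduce the reported GiB/s, MB/s and GB/s figures from the MiB/s value. Laid side by side, the last 250g run comes out at roughly a tenth of the first 200g run's bandwidth.

```python
from typing import Tuple

MIB = 2 ** 20


def rate_units(mib_per_s: float) -> Tuple[float, float, float]:
    """Convert a MiB/s rate into (GiB/s, MB/s, GB/s)."""
    bytes_per_s = mib_per_s * MIB
    return (mib_per_s / 1024, bytes_per_s / 1e6, bytes_per_s / 1e9)


# MiB/s figures copied from the memtest runs above
RUNS = {"200g #1": 11688.16, "200g #2": 11207.18, "250g #1": 6151.18, "250g #2": 1171.66}

if __name__ == "__main__":
    for name, mib_s in RUNS.items():
        gib_s, mb_s, gb_s = rate_units(mib_s)
        print(f"{name}: {gib_s:.2f} GiB/s, {mb_s:.0f} MB/s, {gb_s:.2f} GB/s")
```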

Running the GUI to make BladeBit plots, here is the log file from the GUI:

    Creating 1 plots:
     Output path           : H:\BB-plot
     Thread count          : 40
     Warm start enabled    : false
     Farmer public key     : farm-key
     Pool contract address : pool-key
    System Memory: 501/511 GiB.
    Memory required: 416 GiB.
    Allocating buffers.
    Unclosed client session
    client_session: <aiohttp.client.ClientSession object at 0x000001E43AC25D00>
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0x000001E43AC851F0>
    transport: <_ProactorSocketTransport fd=-1 read=<_OverlappedFuture cancelled>>
    Traceback (most recent call last):
      File "asyncio\sslproto.py", line 690, in _process_write_backlog
      File "asyncio\proactor_events.py", line 359, in write
      File "asyncio\proactor_events.py", line 395, in _loop_writing
    AttributeError: 'NoneType' object has no attribute 'send'
    Exception ignored in: <function _SSLProtocolTransport.__del__ at 0x000001E439621940>
    Traceback (most recent call last):
      File "asyncio\sslproto.py", line 321, in __del__
      File "asyncio\sslproto.py", line 316, in close
      File "asyncio\sslproto.py", line 599, in _start_shutdown
      File "asyncio\sslproto.py", line 604, in _write_appdata
      File "asyncio\sslproto.py", line 712, in _process_write_backlog
      File "asyncio\sslproto.py", line 726, in _fatal_error
      File "asyncio\proactor_events.py", line 151, in _force_close
      File "asyncio\base_events.py", line 751, in call_soon
      File "asyncio\base_events.py", line 515, in _check_closed
    RuntimeError: Event loop is closed

When I run bladebit v1 (1.2.4) from the CLI:

    C:\bladebit>
    C:\bladebit>bladebit1 -f farm-key -c contract-key H:\BB-plot
    Creating 1 plots:
     Output path           : H:\BB-plot
     Thread count          : 40
     Warm start enabled    : false
     Farmer public key     : farm-key
     Pool contract address : pool-key

    System Memory: 500/511 GiB.
    Memory required: 416 GiB.
    Allocating buffers.
    Generating plot 1 / 1: dbec1888eb3b16ad707119304a9fd337216c2a1077cbd8d3aceeae7147569611

    Running Phase 1
    Generating F1...
    Finished F1 generation in 11.93 seconds.
    Sorting F1...
    Finished F1 sort in 39.71 seconds.
    Forward propagating to table 2...
    Pairing L/R groups...
    Finished pairing L/R groups in 23.2530 seconds. Created 4294967296 pairs.
    Average of 236.1406 pairs per group.
    Computing Fx...
    Finished computing Fx in 25.7360 seconds.
    Sorting entries...
    Finished sorting in 100.92 seconds.
    Finished forward propagating table 2 in 150.53 seconds.
    Forward propagating to table 3...
    Pairing L/R groups...
    Finished pairing L/R groups in 17.8260 seconds. Created 4294851916 pairs.
    Average of 236.1343 pairs per group.
    Computing Fx...
    Finished computing Fx in 22.2280 seconds.
    Sorting entries...
    Finished sorting in 97.38 seconds.
    Finished forward propagating table 3 in 138.43 seconds.
    Forward propagating to table 4...
    Pairing L/R groups...
    Finished pairing L/R groups in 17.6960 seconds. Created 4294760027 pairs.
    Average of 236.1292 pairs per group.
    Computing Fx...
    Finished computing Fx in 40.9500 seconds.
    Sorting entries...
    Finished sorting in 89.12 seconds.
    Finished forward propagating table 4 in 148.38 seconds.
    Forward propagating to table 5...
    Pairing L/R groups...
    Finished pairing L/R groups in 17.5480 seconds. Created 4294491606 pairs.
    Average of 236.1145 pairs per group.
    Computing Fx...
    Finished computing Fx in 40.8820 seconds.
    Sorting entries...
    Finished sorting in 90.03 seconds.
    Finished forward propagating table 5 in 149.02 seconds.
    Forward propagating to table 6...
    Pairing L/R groups...
    Finished pairing L/R groups in 17.6460 seconds. Created 4294051974 pairs.
    Average of 236.0903 pairs per group.
    Computing Fx...
    Finished computing Fx in 22.9430 seconds.
    Sorting entries...
    Finished sorting in 83.12 seconds.
    Finished forward propagating table 6 in 124.41 seconds.
    Forward propagating to table 7...
    Pairing L/R groups...
    Finished pairing L/R groups in 18.3730 seconds. Created 4293143215 pairs.
    Average of 236.0403 pairs per group.
    Computing Fx...
    Finished computing Fx in 21.5050 seconds.
    Finished forward propagating table 7 in 41.14 seconds.
    Finished Phase 1 in 803.61 seconds.
    Running Phase 2
    Prunning table 6...
    Finished prunning table 6 in 0.50 seconds.
    Prunning table 5...
    Finished prunning table 5 in 32.47 seconds.
    Prunning table 4...
    Finished prunning table 4 in 30.83 seconds.
    Prunning table 3...
    Finished prunning table 3 in 31.98 seconds.
    Prunning table 2...
    Finished prunning table 2 in 31.79 seconds.
    Finished Phase 2 in 128.07 seconds.
    Running Phase 3
    Compressing tables 1 and 2...
    Finished compressing tables 1 and 2 in 80.86 seconds
    Table 1 now has 3429328635 / 4294967296 entries ( 79.85% ).
    Compressing tables 2 and 3...
    Finished compressing tables 2 and 3 in 80.23 seconds
    Table 2 now has 3439715208 / 4294851916 entries ( 80.09% ).
    Compressing tables 3 and 4...
    Finished compressing tables 3 and 4 in 79.77 seconds
    Table 3 now has 3465822831 / 4294760027 entries ( 80.70% ).
    Compressing tables 4 and 5...
    Finished compressing tables 4 and 5 in 80.92 seconds
    Table 4 now has 3532405476 / 4294491606 entries ( 82.25% ).
    Compressing tables 5 and 6...
    Finished compressing tables 5 and 6 in 84.99 seconds
    Table 5 now has 3712646333 / 4294051974 entries ( 86.46% ).
    Compressing tables 6 and 7...
    Finished compressing tables 6 and 7 in 104.81 seconds
    Table 6 now has 4293143211 / 4293143215 entries ( 100.00% ).
    Finished Phase 3 in 511.64 seconds.
    Running Phase 4
    Writing P7.
    Finished writing P7 in 0.94 seconds.
    Writing C1 table.
    Finished writing C1 table in 0.00 seconds.
    Writing C2 table.
    Finished writing C2 table in 0.00 seconds.
    Writing C3 table.
    Finished writing C3 table in 0.93 seconds.
    Finished Phase 4 in 1.90 seconds.
    Writing final plot tables to disk

    Plot H:\BB-plot/plot-k32-2022-08-19-14-37-dbec1888eb3b16ad707119304a9fd337216c2a1077cbd8d3aceeae7147569611.plot finished writing to disk:
     Table 1 pointer  :       4096 ( 0x0000000000001000 )
     Table 2 pointer  : 1954320384 ( 0x00000000747c9000 )
     Table 3 pointer  : 3051667456 ( 0x00000000b5e4c000 )
     Table 4 pointer  : 4255141888 ( 0x00000000fda05000 )
     Table 5 pointer  : 1434300416 ( 0x00000000557db000 )
     Table 6 pointer  : 3641094144 ( 0x00000000d906b000 )
     Table 7 pointer  : 3912609792 ( 0x00000000e935b000 )
     C1 table pointer :  146997248 ( 0x0000000008c30000 )
     C2 table pointer :  148717568 ( 0x0000000008dd4000 )
     C3 table pointer :  148721664 ( 0x0000000008dd5000 )

    Finished writing tables to disk in 153.17 seconds.
    Finished plotting in 1598.40 seconds (26.64 minutes).

So some link in the GUI is broken.

harold-b commented 2 years ago

So it is running fine from the cli then?

Drhicom commented 2 years ago

I have version 1.2.4 running from the CLI, but the whole issue is running it through the GUI, so what link is broken in the GUI? Which version is being placed in the GUI of the beta releases of Chia?

So I take it nobody has looked at the GUI? Running this from the CLI defeats the purpose of having it in the GUI, correct? When is the next beta release going to happen?