Chia-Network / chia-blockchain

Chia blockchain python implementation (full node, farmer, harvester, timelord, and wallet)
Apache License 2.0
10.82k stars 2.03k forks source link

[BUG] Plotting process reaches 100% but doesn't complete #8533

Closed boner-cmd closed 2 years ago

boner-cmd commented 3 years ago

Describe the bug While plotting a k34, the plot finished but its temp suffix was never removed and it never started farming. Closing the GUI and all processes allowed write access to the plot. Removing the temp suffix and restarting the GUI had the plot begin farming as normal. However, user intervention is not required under normal operation.

End of log to prove plot was completed successfully:

Finished writing C1 and C3 tables
Writing C2 table
Finished writing C2 table
Final table pointers:
P1: 0xfc
P2: 0xe9ee29e92
P3: 0x1c723436b8
P4: 0x2a60889c20
P5: 0x3893bb8f22
P6: 0x4780ecaabe
P7: 0x58c45a0940
C1: 0x6a44c8ea40
C2: 0x6a454bfff2
C3: 0x6a454c0353

Time for phase 4 = 44482.741 seconds. CPU (13.110%) Sun Sep 19 08:33:48 2021
Approximate working space used (without final file): 1176.467 GiB
Final File size: 429.883 GiB

Plot was already moved to the finished directory, but still had the "temp" part of the extension appended, and the plotter UI was stuck at 100%, rather than completed and automatically started farming.

To Reproduce

Steps to reproduce the behavior:

  1. Plot a new k34 using one directory as temp and a different directory as finished
  2. Allow plotting to reach 100%
  3. Check plotting log to verify plot is complete
  4. Plot stuck at 100% and plotting process completed, but temp suffix never removed and plot never starts farming

Expected behavior Plot completes, moves to finished directory, removes .2.temp suffix, and automatically starts farming

Screenshots

Desktop

Additional context Wallet and node also stopped syncing at or around this time. Unsure if related. Logs from the relevant time period don't show anything unusual:

2021-09-18T14:32:28.347 full_node full_node_server : WARNING Banning 220.141.196.206 for 10 seconds 2021-09-18T18:54:19.917 daemon websockets.protocol : WARNING Unexpected exception in keepalive ping task Traceback (most recent call last): File "websockets\protocol.py", line 1130, in keepalive_ping File "websockets\protocol.py", line 751, in ping KeyError: b'\xfd\xe1\xa0\r' 2021-09-18T20:21:21.772 full_node chia.full_node.full_node: ERROR got weight proof request for unknown peak 73ec76b9ee3662817709d80bdaf96841a9eb7aa111b9ba268a0efa6e1c314ae2 2021-09-19T01:47:22.143 full_node full_node_server : ERROR Exception: Error short batch syncing, could not fetch block at height 879802, {'host': '220.141.196.206', 'port': 16664}. Traceback (most recent call last): File "chia\server\server.py", line 553, in wrapped_coroutine File "chia\full_node\full_node_api.py", line 106, in new_peak File "chia\full_node\full_node.py", line 411, in new_peak File "chia\full_node\full_node.py", line 247, in short_sync_batch ValueError: Error short batch syncing, could not fetch block at height 879802

2021-09-19T01:47:22.150 full_node full_node_server : ERROR Exception: Error short batch syncing, could not fetch block at height 879802 <class 'ValueError'>, closing connection {'host': '220.141.196.206', 'port': 16664}. Traceback (most recent call last): File "chia\server\server.py", line 563, in api_call File "asyncio\tasks.py", line 442, in wait_for File "chia\server\server.py", line 560, in wrapped_coroutine File "chia\server\server.py", line 553, in wrapped_coroutine File "chia\full_node\full_node_api.py", line 106, in new_peak File "chia\full_node\full_node.py", line 411, in new_peak File "chia\full_node\full_node.py", line 247, in short_sync_batch ValueError: Error short batch syncing, could not fetch block at height 879802

2021-09-19T01:47:22.152 full_node full_node_server : WARNING Banning 220.141.196.206 for 10 seconds 2021-09-19T02:26:11.681 harvester chia.harvester.harvester: WARNING Looking up qualities on G:\finished\plot-k34-2021-09-05-15-11-7e9716691434b7f04804d0fe3748e191bf7a307a7146c171767b6d8f624bf8ee.plot took: 6.756035327911377. This should be below 5 seconds to minimize risk of losing rewards. 2021-09-19T07:25:47.071 full_node full_node_server : WARNING Banning 221.230.53.25 for 10 seconds 2021-09-19T07:25:47.192 full_node full_node_server : WARNING Invalid handshake with peer. Maybe the peer is running old software. 2021-09-19T09:37:31.317 harvester chia.harvester.harvester: WARNING Looking up qualities on G:\finished\plot-k34-2021-09-05-15-11-7e9716691434b7f04804d0fe3748e191bf7a307a7146c171767b6d8f624bf8ee.plot took: 6.842954397201538. This should be below 5 seconds to minimize risk of losing rewards. 2021-09-19T09:53:16.399 harvester chia.harvester.harvester: WARNING Looking up qualities on G:\finished\plot-k34-2021-09-05-15-11-7e9716691434b7f04804d0fe3748e191bf7a307a7146c171767b6d8f624bf8ee.plot took: 14.142029762268066. This should be below 5 seconds to minimize risk of losing rewards.

loppefaaret commented 3 years ago

if you check the disk activity as the log is at that point, you should see it showing the file in the process of being copied into -d the plotting process is not done until you get:

Total time = 23217.990 seconds. CPU (138.290%) Wed Sep 15 13:59:41 2021
Renamed final file from "/path/to/tmp/plot-k32-2021-09-09-11-11-9vswj56w938nb9wb6w3.plot.2.tmp" to "/path/to/plots/plot-k32-2021-09-09-11-11-9vswj56w938nb9wb6w3.plot"

in my case, im not copying, but rename in place, so mine is almost isntant - whereas if you have it set to copy from tmp working directory, to another final directory, it would spend 40-80 minutes copying, after it has reached 100% the plotting process itself is done, but it is still working on moving the plot to the final location.

the time to copy into place, might be greatly exaggerated if you are using 2 different folders, on the same drive! it could easily be several hours if on the same HDD

noting your CPU (13.110%) reported, i think this plot might have been done on a regular mechanical HDD - was this the same drive that also holds the Chia install, and the blockchain and wallet databases ? you might be saturating the drive, so that they experienced timeouts during their need to verify data

boner-cmd commented 3 years ago

This is a really interesting idea, but I don't think it quite fits. In my case, the entire plot file is already in the destination but simply hasn't been renamed. Also, if it helps, both directories are in the same place (G:\temp and G:\finished) so the move itself is pretty trivial.

You may be on to something with the verification component, but I would think if the move operation completed successfully (and did, since I am harvesting the manually-renamed plot just fine) then the copy delay would not be the problem.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: loppefaaret @.> Sent: Monday, September 20, 2021 10:54:55 AM To: Chia-Network/chia-blockchain @.> Cc: Dimelo Waterson @.>; Author @.> Subject: Re: [Chia-Network/chia-blockchain] [BUG] Plotting process reaches 100% but doesn't complete (#8533)

if you check the disk activity as the log is at that point, you should see it showing the file in the process of being copied into -d the plotting process is not done into you get:

Total time = 23217.990 seconds. CPU (138.290%) Wed Sep 15 13:59:41 2021 Renamed final file from "/path/to/tmp/plot-k32-2021-09-09-11-11-9vswj56w938nb9wb6w3.plot.2.tmp" to "/path/to/plots/plot-k32-2021-09-09-11-11-9vswj56w938nb9wb6w3.plot"

in my case, im not copying, but rename in place, so mine is almost isntant - whereas if you have it set to copy from tmp working directory, to another final directory, it would spend 10-20 minutes copying, after it has reached 100% the plotting process itself is done, but it is still working on moving the plot to the final location.

noting your CPU (13.110%) reported, i think this plot might have been done on a regular mechanical HDD - was this the same drive that also holds the Chia install, and the blockchain and wallet databases ? you might be saturating the drive, so that they experienced timeouts during their need to verify data

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/Chia-Network/chia-blockchain/issues/8533#issuecomment-923002847, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AM4UEJBVEVFFSIIP5SN3AWDUC5DL7ANCNFSM5EKQHSQQ.

boner-cmd commented 3 years ago

Also sorry I forgot to mention but you are correct, it's a spanned drive comprised of two mechanical HDDs and while the install is not on that drive, the databases are.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Andrew Buonora @.> Sent: Monday, September 20, 2021 1:23:26 PM To: Chia-Network/chia-blockchain @.> Subject: Re: [Chia-Network/chia-blockchain] [BUG] Plotting process reaches 100% but doesn't complete (#8533)

This is a really interesting idea, but I don't think it quite fits. In my case, the entire plot file is already in the destination but simply hasn't been renamed. Also, if it helps, both directories are in the same place (G:\temp and G:\finished) so the move itself is pretty trivial.

You may be on to something with the verification component, but I would think if the move operation completed successfully (and did, since I am harvesting the manually-renamed plot just fine) then the copy delay would not be the problem.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: loppefaaret @.> Sent: Monday, September 20, 2021 10:54:55 AM To: Chia-Network/chia-blockchain @.> Cc: Dimelo Waterson @.>; Author @.> Subject: Re: [Chia-Network/chia-blockchain] [BUG] Plotting process reaches 100% but doesn't complete (#8533)

if you check the disk activity as the log is at that point, you should see it showing the file in the process of being copied into -d the plotting process is not done into you get:

Total time = 23217.990 seconds. CPU (138.290%) Wed Sep 15 13:59:41 2021 Renamed final file from "/path/to/tmp/plot-k32-2021-09-09-11-11-9vswj56w938nb9wb6w3.plot.2.tmp" to "/path/to/plots/plot-k32-2021-09-09-11-11-9vswj56w938nb9wb6w3.plot"

in my case, im not copying, but rename in place, so mine is almost isntant - whereas if you have it set to copy from tmp working directory, to another final directory, it would spend 10-20 minutes copying, after it has reached 100% the plotting process itself is done, but it is still working on moving the plot to the final location.

noting your CPU (13.110%) reported, i think this plot might have been done on a regular mechanical HDD - was this the same drive that also holds the Chia install, and the blockchain and wallet databases ? you might be saturating the drive, so that they experienced timeouts during their need to verify data

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/Chia-Network/chia-blockchain/issues/8533#issuecomment-923002847, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AM4UEJBVEVFFSIIP5SN3AWDUC5DL7ANCNFSM5EKQHSQQ.

loppefaaret commented 3 years ago

nope, different directories equals to copying, even if on the same drive - i'd suggest you set the plotter to use the same folder, and do the move manually

boner-cmd commented 3 years ago

I can do that as a test, but again remember, the move itself is completed. The plot was found in the finished folder, fully moved, just not renamed. So hang occurs before logs would post "total time spent" but after the actual move itself completes, in this case by a couple hours. Last phase time stamp was 08:33, didn't identify the hang and check logs until nearly 14:00 iirc.

boner-cmd commented 3 years ago

nope, different directories equals to copying, even if on the same drive

I was under the impression that moving on the same drive would just change the index of the object. Is this not the case?

loppefaaret commented 3 years ago

I can do that as a test, but again remember, the move itself is completed. The plot was found in the finished folder, fully moved, just not renamed. So hang occurs before logs would post "total time spent" but after the actual move itself completes, in this case by a couple hours. Last phase time stamp was 08:33, didn't identify the hang and check logs until nearly 14:00 iirc.

i would follow that up with a chia plots check -g D:\path\to\plot\plot-k34-2021-09-09-11-11-98vw35bnbwn98.plot or whatever the filename is... it is possible that you interupted the copy, and even though the size is reported (it would allocate the space prior) it might not have the actual data

I was under the impression that moving on the same drive would just change the index of the object. Is this not the case?

it is under normally OS operation, but chia is dumb, and have to rely on not knowing what folder resides on what drive, so if the folders are different, it will do a copy, rather than a rename

boner-cmd commented 3 years ago

Good context, thank you. I will check the plot and report back to confirm.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: loppefaaret @.> Sent: Tuesday, September 21, 2021 5:49:24 PM To: Chia-Network/chia-blockchain @.> Cc: Dimelo Waterson @.>; Author @.> Subject: Re: [Chia-Network/chia-blockchain] [BUG] Plotting process reaches 100% but doesn't complete (#8533)

I can do that as a test, but again remember, the move itself is completed. The plot was found in the finished folder, fully moved, just not renamed. So hang occurs before logs would post "total time spent" but after the actual move itself completes, in this case by a couple hours. Last phase time stamp was 08:33, didn't identify the hang and check logs until nearly 14:00 iirc.

i would follow that up with a chia plots check -g D:\path\to\plot\plot-k34-2021-09-09-11-11-98vw35bnbwn98.plot or whatever the filename is... it is possible that you interupted the copy, and even though the size is reported (it would allocate the space prior) it might not have the actual data

I was under the impression that moving on the same drive would just change the index of the object. Is this not the case?

it is under normally OS operation, but chia is dumb, and have to rely on not knowing what folder resides on what drive, so if the folders are different, it will do a copy, rather than a rename

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/Chia-Network/chia-blockchain/issues/8533#issuecomment-924411788, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AM4UEJFZ74KYNFAT37K77LTUDD4WJANCNFSM5EKQHSQQ.

boner-cmd commented 3 years ago

Log confirms plot produces errors while trying to farm. Good call.

spamkutu commented 3 years ago

yea do well use that tric nice.

github-actions[bot] commented 3 years ago

This issue has been flagged as stale, as there has been no activity on it in the last 14 days. If this issue is still affecting you and in need of further review, please comment on it with an update to keep it open.

github-actions[bot] commented 2 years ago

This issue was automatically closed because it has been flagged as stale, and subsequently passed 7 days with no further activity from the submitter or watchers.