Chia-Network / chia-blockchain

Chia blockchain python implementation (full node, farmer, harvester, timelord, and wallet)
Apache License 2.0

[Bug] Setting GPU index or enforce_gpu_index causes unexpected behavior #18646

Closed LeroyINC closed 3 weeks ago

LeroyINC commented 1 month ago

What happened?

I have a system with three GPU cards in it, all different models. When I set a few options in the config file, I get some unexpected behavior.

If I set enforce_gpu_index: true, things start to act strangely: the wallet goes out of sync every 10 seconds and resyncs, the Chia GUI keeps going blank and reconnecting, the number of plots does not display properly, and so on. There are also a bunch of disconnect entries in the debug.log file for the harvester dropping its connection to the farmer and such. A lot seems to break.

Also, if I set gpu_index: 1 (or anything other than 0), things seem to work fine, but on the default GPU there is an artifact process that is loaded but never seems to do anything (see screenshot).

I am running on Ubuntu 24.04 with all roles on the same machine.

Screenshot 2024-09-26 132013

Version

2.4.3

What platform are you using?

Linux

What ui mode are you using?

GUI

Relevant log output

No response

BrandtH22 commented 1 month ago

Hey @LeroyINC, this is a known issue where the index used by bladebit (greenreaper) differs from the index listed in the nvidia-smi report.

Unfortunately, since the majority of our lead bladebit developers' time is focused on the new plot format, there are no current plans to update the codebase. To determine which GPU index needs to be used in the chia software, one must use trial and error: set an index, see which GPU runs the process, and note that on your side. Repeat for all GPU indexes to ensure the correct ones are being used.

Also, for the ghost process issue, I recommend rebooting the machine to clear it.
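The trial-and-error mapping described above can be tracked with a short script. This is a minimal sketch, not part of chia or bladebit: the helper names (`parse_gpu_list`, `query_gpus`, `record_trial`, `unmapped`) are hypothetical, and the GPU names and UUIDs in the sample are made up. It relies only on the CSV query output of nvidia-smi.

```python
import subprocess


def parse_gpu_list(csv_text):
    """Parse `index, name, uuid` rows (nvidia-smi CSV, no header) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split off the index on the left and the UUID on the right so that
        # a comma inside the GPU name would not break the parse.
        index, rest = line.split(",", 1)
        name, uuid = rest.rsplit(",", 1)
        gpus.append(
            {"smi_index": int(index.strip()), "name": name.strip(), "uuid": uuid.strip()}
        )
    return gpus


def query_gpus():
    """List GPUs as nvidia-smi reports them (requires nvidia-smi on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,uuid", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_list(result.stdout)


def record_trial(mapping, configured_index, observed_smi_index):
    """Note that a configured gpu_index ended up running on a given nvidia-smi GPU."""
    mapping[configured_index] = observed_smi_index
    return mapping


def unmapped(gpus, mapping):
    """Return the nvidia-smi GPUs that no configured index has landed on yet."""
    seen = set(mapping.values())
    return [gpu for gpu in gpus if gpu["smi_index"] not in seen]


# Made-up sample output standing in for a three-GPU machine.
SAMPLE = """\
0, NVIDIA GeForce RTX 3060, GPU-aaaa
1, NVIDIA GeForce GTX 1070, GPU-bbbb
2, Tesla P4, GPU-cccc
"""

gpus = parse_gpu_list(SAMPLE)
mapping = record_trial({}, configured_index=0, observed_smi_index=2)
for gpu in unmapped(gpus, mapping):
    print("not yet matched:", gpu["smi_index"], gpu["name"])
```

On a real machine, `query_gpus()` would replace the sample, and each trial would be recorded after setting an index, restarting chia, and checking nvidia-smi by hand.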

LeroyINC commented 1 month ago

I know about the Bladebit plotting issue and that is not a big deal; it still works.

But my original post is not about a plotting issue; it is about a farming issue with the Chia software. The strange behavior happens when farming. Setting enforce_gpu_index: true makes the machine unable to farm at all.

BrandtH22 commented 1 month ago

Hey @LeroyINC, the GPU index mismatch affects both plotting and farming, so if the index is not set properly, the enforce option will not work properly either.

Can you try cycling through the GPU indexes, fully stopping chia and verifying that all processes are stopped before starting it again for each index you test?

LeroyINC commented 1 month ago

OK, did some testing.

When I stop the harvester using "chia stop harvester", the process stops and gets cleared off all GPUs.

When I start the harvester using any available GPU index other than 0, a phantom process always starts on GPU 0 (see the screenshot above).

When I stop the harvester, all processes go away, including the phantom one on GPU 0.
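A check like the one above can be scripted against nvidia-smi's per-process query. The following is only a sketch under assumptions: it supposes the phantom entry belongs to the same harvester process (so one PID would appear on two GPUs), which this thread does not confirm. The helper names and the sample rows, including the process names, are hypothetical.

```python
import subprocess
from collections import defaultdict


def parse_compute_apps(csv_text):
    """Parse `gpu_uuid, pid, process_name, used_memory` rows (CSV, no header)."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        # Skip blank or malformed rows, and rows where the PID is not numeric.
        if len(parts) < 4 or not parts[1].isdigit():
            continue
        rows.append(
            {
                "gpu_uuid": parts[0],
                "pid": int(parts[1]),
                # Rejoin the middle in case the process name contains a comma.
                "process_name": ", ".join(parts[2:-1]),
                "used_memory": parts[-1],
            }
        )
    return rows


def query_compute_apps():
    """List compute processes per GPU (requires nvidia-smi on PATH)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=gpu_uuid,pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)


def pids_on_multiple_gpus(rows):
    """Return {pid: [rows]} for every PID that shows up on more than one GPU."""
    by_pid = defaultdict(list)
    for row in rows:
        by_pid[row["pid"]].append(row)
    return {pid: entries for pid, entries in by_pid.items() if len(entries) > 1}


# Made-up sample: one PID holding memory on two different GPUs.
SAMPLE = """\
GPU-aaaa, 4242, chia_harvester, 110 MiB
GPU-bbbb, 4242, chia_harvester, 1530 MiB
GPU-cccc, 777, python3, 300 MiB
"""

for pid, entries in pids_on_multiple_gpus(parse_compute_apps(SAMPLE)).items():
    for entry in entries:
        print(pid, entry["gpu_uuid"], entry["used_memory"])
```

Running `query_compute_apps()` once after "chia stop harvester" and once after starting it again would show whether the extra entry on GPU 0 appears and disappears together with the harvester.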

BrandtH22 commented 1 month ago

Hey @LeroyINC, thank you for the troubleshooting and information.

Would you be able to provide some more verbose logging for the issues on the harvester?

Note: if you are setting logs to debug mode with chia configure --log-level DEBUG, we recommend first removing your passphrase, as the passphrase is printed in plain text in one of the debug logs.

Thank you for the information and additional logs!

github-actions[bot] commented 1 month ago

This issue has not been updated in 14 days and is now flagged as stale. If this issue is still affecting you and in need of further review, please comment on it with an update to keep it from auto closing in 7 days.

github-actions[bot] commented 3 weeks ago

This issue was automatically closed because it has been flagged as stale, and subsequently passed 7 days with no further activity from the submitter or watchers.