crystian / ComfyUI-Crystools

A powerful set of tools for ComfyUI
MIT License
735 stars 39 forks

eGPU RTX2080TI freezes screen and crashes PC (sometimes) #30

Closed tobiaswuerth closed 8 months ago

tobiaswuerth commented 8 months ago

Describe the bug
My screen freezes at random (I can no longer move the mouse), stays frozen for 1-3 s, and then the PC abruptly powers off and restarts.

To Reproduce
Don't know.

Expected behavior
Don't crash.

Screenshots

Error in console:

[Crystools ERROR] Could not get GPU utilization.GPU is lost
[Crystools ERROR] Monitor of GPU is turning off (not on UI!)
Exception in thread Thread-5 (startMonitorLoop):
Traceback (most recent call last):
  File "threading.py", line 1045, in _bootstrap_inner
  File "threading.py", line 982, in run
  File "D:\_projects\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Crystools\general\monitor.py", line 31, in startMonitorLoop
    asyncio.run(self.MonitorLoop())
  File "asyncio\runners.py", line 190, in run
  File "asyncio\runners.py", line 118, in run
  File "asyncio\base_events.py", line 653, in run_until_complete
  File "D:\_projects\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Crystools\general\monitor.py", line 35, in MonitorLoop
    data = self.hardwareInfo.getStatus()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\_projects\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Crystools\general\hardware.py", line 78, in getStatus
    getStatus = self.GPUInfo.getStatus()
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\_projects\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Crystools\general\gpu.py", line 123, in getStatus
    memory = pynvml.nvmlDeviceGetMemoryInfo(deviceHandle)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\_projects\ComfyUI_windows_portable\python_embeded\Lib\site-packages\pynvml\nvml.py", line 2440, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "D:\_projects\ComfyUI_windows_portable\python_embeded\Lib\site-packages\pynvml\nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError: Unknown Error

Versions:

Python version: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
Total VRAM 11264 MB, total RAM 32602 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2080 Ti : cudaMallocAsync
[Crystools INFO] Crystools version: 1.11.0
[Crystools INFO] CPU: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz - Arch: AMD64 - OS: Windows 10
[Crystools INFO] GPU/s:
[Crystools INFO] 0) NVIDIA GeForce RTX 2080 Ti
[Crystools INFO] NVIDIA Driver: 546.33
### Loading: ComfyUI-Manager (V2.3.1)
### ComfyUI Revision: 1930 [d1533d9c] | Released on '2024-01-24'

Additional context
I noticed that the issue is caused by the GPU: when the screen freezes, I can unplug the Thunderbolt cable to the eGPU during the 1-3 s before the crash, and that stops the PC from crashing. Also, the PC never freezes or crashes when your extension is not installed.

Not sure if you can do anything about this; it's not a lot to go on. Figured I might as well let you know. Btw, love the extension, really useful.

crystian commented 8 months ago

Thanks for the report!

I can't do anything with this message:

pynvml.nvml.NVMLError: Unknown Error

:(

And yes, it comes from the driver/GPU; for some reason it fails to return the information, so I disabled monitoring of this resource.
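For readers following along, here is a minimal sketch of the "disable monitoring after a failed query" idea. It is not the Crystools code: `GuardedMonitor` and `make_nvml_monitor` are hypothetical names made up for illustration, and the only real library calls assumed are `pynvml.nvmlInit`, `pynvml.nvmlDeviceGetHandleByIndex`, `pynvml.nvmlDeviceGetMemoryInfo`, and the `pynvml.NVMLError` exception seen in the traceback. It only keeps a polling thread from dying on an unhandled exception; it would not be expected to prevent a driver- or hardware-level crash.

```python
from typing import Any, Callable, Optional, Tuple, Type


class GuardedMonitor:
    """Hypothetical helper: polls a hardware query and switches itself off
    after the first failure, so later polls never touch the device again."""

    def __init__(self, query: Callable[[], Any],
                 errors: Tuple[Type[BaseException], ...] = (Exception,)) -> None:
        self._query = query
        self._errors = errors
        self.enabled = True

    def get_status(self) -> Optional[Any]:
        # Once disabled, return immediately without calling the query.
        if not self.enabled:
            return None
        try:
            return self._query()
        except self._errors as exc:
            # Report once and stop polling instead of letting the
            # exception propagate out of the monitor loop.
            self.enabled = False
            print(f"[monitor] GPU query failed, monitoring disabled: {exc}")
            return None


def make_nvml_monitor(index: int = 0) -> GuardedMonitor:
    """Assumed wiring to NVML; requires the pynvml package and an NVIDIA GPU."""
    import pynvml  # imported lazily so this module loads without NVML

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    return GuardedMonitor(
        lambda: pynvml.nvmlDeviceGetMemoryInfo(handle),
        errors=(pynvml.NVMLError,),
    )
```

A monitor loop would then call `get_status()` on each tick and skip the GPU section of the UI whenever it returns `None`. Catching `pynvml.NVMLError` broadly matters here because, as in the traceback, the failure can surface as a generic "Unknown Error" rather than a specific "GPU is lost" code.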

It's strange, but something similar happened on a laptop when it was running on battery. Are you always connected to power?

@tobiaswuerth

tobiaswuerth commented 8 months ago

I know.. It's not a lot to work with..

And yes, I'm always connected to power.