Meituan-AutoML / MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices
Apache License 2.0

My Linux workstation dies when I run inference.py #3

Closed — ssifood closed this issue 5 months ago

ssifood commented 8 months ago

My workstation: 4090 Ti * 2, Ubuntu 22.04 LTS, CUDA 12.3 (Build cuda_12.3.r12.3/compiler.33492891_0)

When I run inference.py, my monitor goes black and my remote terminal is disconnected.

So I ran inference.py directly on my local machine instead, and the screen went black there too.

Do you know how to solve this problem, or why it happens?

huyiming2018 commented 8 months ago

Is there any other error information that can be provided?

It seems like there is insufficient CPU or memory when loading the model.

sriramsowmithri9807 commented 8 months ago


A possible solution for this problem:

It sounds like running inference.py might be causing your system to hang or encounter some issues, possibly related to the GPU processing or CUDA utilization. A few potential reasons and solutions could be:

  1. GPU Overload: Running inference using two 4090 Ti GPUs might overload the system, causing it to hang or crash. Try limiting the number of GPUs utilized for inference or optimizing the code to distribute the load better across GPUs.

  2. Memory Issues: The GPUs might be running out of memory during the inference process. Check if the model or the input data size exceeds the available GPU memory. Consider reducing batch sizes or optimizing memory usage.

  3. Driver or CUDA Compatibility: Ensure that the CUDA version you're using is fully compatible with your GPU drivers. Mismatched versions can sometimes cause issues. Double-check the compatibility matrix for CUDA 12.3 and your specific GPU model.

  4. Software Bugs: There might be bugs in the inference.py script or the libraries it's using. Look for any error messages or logs that might give clues about where the issue is occurring.

  5. Power Supply: With powerful GPUs like the 4090 Ti, ensure that your power supply is adequate to handle the power demands of these cards when under load.
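For point 1, a minimal way to rule out multi-GPU issues is to pin the process to a single GPU by restricting CUDA device visibility before any CUDA-backed library (e.g. torch) is imported. This is only a sketch; the device index `0` is an assumption for a two-GPU box.

```python
import os

# Hedged sketch: make only GPU 0 visible to this process. This must run
# before torch (or any other CUDA-backed library) is imported, otherwise
# the CUDA runtime has already enumerated all devices and ignores it.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def visible_gpus() -> list[str]:
    """Parse the GPU indices this process is allowed to see."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [idx for idx in raw.split(",") if idx]

print(visible_gpus())  # -> ['0']
```

Exporting the variable in the shell (`CUDA_VISIBLE_DEVICES=0 python inference.py`) has the same effect and avoids editing the script.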

To start troubleshooting, check the system logs (e.g. `dmesg` or `journalctl -b -1`) after a crash. If the issue persists, sharing more details about the inference.py invocation or any error messages/logs would help identify the specific cause of the problem.
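For point 2, a rough back-of-the-envelope check helps decide whether the weights even fit in VRAM: parameters × bytes per parameter. The 3B parameter count below is an assumption (MobileVLM ships ~1.7B and ~3B variants); activations, the vision encoder, and CUDA context add overhead on top.

```python
def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """GiB needed just to hold the weights (fp16/bf16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / 2**30

# Assumed 3B-parameter model in fp16 -- weights alone, before activations:
print(round(weights_gib(3e9), 1))  # ~5.6 GiB
```

Since a single 4090 has 24 GiB of VRAM, a model of this size fits comfortably; a hard system freeze with a black screen therefore points more toward a driver, PSU, or hardware issue than toward GPU memory exhaustion.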

For more information: sowmithrisriram7@gmail.com


ssifood commented 7 months ago

> Is there any other error information that can be provided?
>
> It seems like there is insufficient CPU or memory when loading the model.

I can't see any error, just a black screen.

er-muyue commented 5 months ago

Hi, we are closing this issue due to inactivity. We hope your question has been resolved. If you have any further concerns, please feel free to re-open it or open a new issue. Thanks!