Closed: artistlu closed this issue 3 months ago.
Claude Sonnet 3.5 response:
The issue is indeed that the code is trying to use NVIDIA-specific libraries on a non-NVIDIA system. The code needs modification to handle ARM64 platforms with Mali GPUs correctly. Here's how we can address this:
First, we need to modify the `linux_device_capabilities()` function in `exo/topology/device_capabilities.py` to detect and handle Mali GPUs. We'll need to add a check for Mali GPUs before trying to use NVIDIA libraries.
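The ordering described here (Mali first, then NVIDIA, then a generic fallback) is a probe chain: each detector either returns a result or signals that its hardware is absent. A minimal, self-contained sketch of that pattern follows. `first_available` and the demo probes are hypothetical names and are not part of exo.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_available(
    probes: Sequence[Callable[[], Optional[T]]],
    fallback: Callable[[], T],
) -> T:
    """Run each probe in order and return the first non-None result.

    A probe returns None when its hardware is absent. An ImportError
    (missing library) or OSError (missing device node) raised by a probe
    is treated the same way. If every probe fails, the fallback is used.
    """
    for probe in probes:
        try:
            result = probe()
        except (ImportError, OSError):
            continue  # this backend is unavailable on this machine
        if result is not None:
            return result
    return fallback()


if __name__ == "__main__":
    # Hypothetical probes: no Mali GPU found, and the NVIDIA library is missing.
    probes = [lambda: None, lambda: __import__("pynvml_missing_for_demo")]
    print(first_available(probes, lambda: "generic Linux device"))
```

Because each probe is self-contained, a missing library or device on one platform (such as NVML on a Mali board) skips only that probe instead of aborting detection.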
We'll use the `pyudev` library to detect the GPU. If you don't have it installed, you can install it with `pip install pyudev`.
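If you would rather avoid the extra dependency, the same check can be sketched with the standard library by listing sysfs directly. This is an alternative to pyudev, not what the snippet below uses. It assumes the proprietary Arm kbase driver registers its device as a misc device named `mali0`, `mali1`, and so on, so it appears under `/sys/class/misc`. Boards that use the open-source Panfrost/Panthor DRM drivers do not expose such a node. `find_mali_nodes` is a hypothetical helper.

```python
import os
from typing import List


def find_mali_nodes(class_dir: str = "/sys/class/misc") -> List[str]:
    """Return sorted names of entries under class_dir that look like Mali devices.

    Assumes the kbase driver shows up as /sys/class/misc/mali0 (and mali1, ...).
    Roughly equivalent pyudev query:
        [d.sys_name for d in pyudev.Context().list_devices(subsystem="misc")
         if d.sys_name.startswith("mali")]
    Returns an empty list if the directory cannot be read.
    """
    try:
        entries = os.listdir(class_dir)
    except OSError:
        return []  # not Linux, or sysfs not mounted
    return sorted(name for name in entries if name.lower().startswith("mali"))


if __name__ == "__main__":
    print(find_mali_nodes())
```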
Here's an updated version of the `linux_device_capabilities()` function:
```python
import psutil
from tinygrad import Device
import pyudev

# DEBUG, DeviceCapabilities, DeviceFlops and CHIP_FLOPS are already defined
# or imported elsewhere in exo/topology/device_capabilities.py.

def linux_device_capabilities() -> DeviceCapabilities:
  if DEBUG >= 2: print(f"tinygrad {Device.DEFAULT=}")

  # Check for a Mali GPU. The Arm kbase driver exposes /dev/mali0 as a misc
  # device (there is no "mali0" subsystem), so search the "misc" subsystem.
  context = pyudev.Context()
  mali_gpu = next((device for device in context.list_devices(subsystem='misc') if device.sys_name.lower().startswith('mali')), None)
  if mali_gpu:
    gpu_name = mali_gpu.get('DEVNAME', 'Unknown Mali GPU')
    return DeviceCapabilities(
      model="Linux Box (ARM Mali)",
      chip=gpu_name,
      memory=psutil.virtual_memory().total // 2**20,  # Mali shares system RAM; reported in MiB
      flops=DeviceFlops(fp32=100, fp16=200, int8=400)  # Placeholder values, adjust as needed
    )

  # Check for an NVIDIA GPU
  try:
    import pynvml
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    gpu_name = pynvml.nvmlDeviceGetName(handle)
    gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    if DEBUG >= 2: print(f"NVIDIA device {gpu_name=} {gpu_memory_info=}")
    return DeviceCapabilities(
      model=f"Linux Box ({gpu_name})",
      chip=gpu_name,
      memory=gpu_memory_info.total // 2**20,
      flops=CHIP_FLOPS.get(gpu_name, DeviceFlops(fp32=0, fp16=0, int8=0))
    )
  except ImportError:
    pass  # NVIDIA libraries not available
  except pynvml.NVMLError:
    pass  # NVIDIA GPU not found or error accessing it

  # Fallback for other devices
  return DeviceCapabilities(
    model=f"Linux Box (Device: {Device.DEFAULT})",
    chip=f"Unknown Chip (Device: {Device.DEFAULT})",
    memory=psutil.virtual_memory().total // 2**20,
    flops=DeviceFlops(fp32=0, fp16=0, int8=0)
  )
```
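This updated function will:
1. Look for a Mali GPU through `pyudev` and, if one is found, report it with placeholder FLOPS values.
2. Otherwise, query the first NVIDIA GPU through `pynvml`, skipping this step if the library is missing or no NVIDIA GPU is accessible.
3. If neither is found, fall back to a generic description based on tinygrad's default device.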
If it is not already there, add `import pyudev` at the top of the file.
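Because `pyudev` is Linux-only and an extra dependency, a plain module-level import would make the whole file fail to load wherever it is missing. Below is a minimal sketch of a guarded import. It assumes the Mali branch is changed to check `pyudev is not None` before using it. `optional_import` is a hypothetical helper name.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None instead of raising if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# pyudev needs libudev (Linux only); elsewhere this is None and the
# Mali check would simply be skipped.
pyudev = optional_import("pyudev")
```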
Also, note that the FLOPS values for the Mali GPU are placeholders. You might want to adjust these based on the specific Mali GPU model you're using, or implement a more sophisticated detection method for Mali GPU capabilities.
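One way to refine the placeholders is to key them on the GPU model. The sketch below assumes the proprietary kbase driver exposes a `gpuinfo` attribute at `/sys/class/misc/mali0/device/gpuinfo` whose text begins with the model name (for example `Mali-G610 4 cores ...`). Verify the path and format on your board. The `MALI_FLOPS` table is a hypothetical, empty placeholder, and the defaults are the same placeholder numbers used above, not measured values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DeviceFlops:
    # Minimal stand-in for exo's DeviceFlops so this sketch runs on its own;
    # drop it when pasting into device_capabilities.py.
    fp32: float
    fp16: float
    int8: float


# Same placeholder numbers as the snippet above, not measured values.
DEFAULT_MALI_FLOPS = DeviceFlops(fp32=100, fp16=200, int8=400)

# Hypothetical per-model table keyed by the "Mali-..." name; fill in from vendor specs.
MALI_FLOPS: Dict[str, DeviceFlops] = {}

# Assumed location of the kbase driver's gpuinfo attribute.
GPUINFO_PATH = "/sys/class/misc/mali0/device/gpuinfo"


def read_gpuinfo(path: str = GPUINFO_PATH) -> str:
    """Return the gpuinfo text, or an empty string if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def parse_mali_model(gpuinfo: str) -> Optional[str]:
    """Return the leading 'Mali-...' token (e.g. 'Mali-G610'), or None."""
    tokens = gpuinfo.split()
    if tokens and tokens[0].startswith("Mali-"):
        return tokens[0]
    return None


def mali_flops(gpuinfo: str, table: Dict[str, DeviceFlops] = MALI_FLOPS) -> DeviceFlops:
    """Look up FLOPS for the parsed model, falling back to the placeholders."""
    model = parse_mali_model(gpuinfo)
    if model is None:
        return DEFAULT_MALI_FLOPS
    return table.get(model, DEFAULT_MALI_FLOPS)
```

Inside the Mali branch this would replace the hard-coded values with something like `flops=mali_flops(read_gpuinfo())`.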
This change should allow the Exo project to run on your ARM64 platform with a Mali GPU without throwing the NVIDIA-related error. It will detect the Mali GPU and provide some basic information about it.
Remember to install pyudev:
```
pip install pyudev
```
After making these changes, the project should be able to start without the NVIDIA error on your ARM64 platform with a Mali GPU.
The issue has been resolved. Could you please merge the changes? Thank you. @stephanj
When trying to run the Exo project on an ARM64 platform with a Mali GPU, I am encountering the following error:
The error message indicates that the Exo project is attempting to use the NVIDIA Management Library (NVML), which is not available on ARM64 platforms with Mali GPUs.
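Since the failure comes from calling into NVML on a machine without an NVIDIA driver, a cheap availability check can gate any NVML use. Below is a minimal sketch. It assumes the `pynvml` module (shipped by the `nvidia-ml-py` package). `nvml_available` is a hypothetical helper name.

```python
def nvml_available() -> bool:
    """Return True only if pynvml imports and NVML initialises successfully."""
    try:
        import pynvml
    except ImportError:
        return False  # pynvml is not installed
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return False  # e.g. the NVML shared library is missing, as on Mali boards
    pynvml.nvmlShutdown()  # release NVML again; this is only a probe
    return True


if __name__ == "__main__":
    print("NVML available:", nvml_available())
```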