Windows reports GPU Inference, but uses CPU

containers / podman-desktop-extension-ai-lab

Work with LLMs on a local environment using containers

https://podman-desktop.io/extensions/ai-lab

Apache License 2.0


Windows reports GPU Inference, but uses CPU #1588

Closed evanshortiss closed 2 weeks ago

evanshortiss commented 3 weeks ago

Bug description

When running a Service/Playground on Windows 11, the UI reports GPU inference, but inference actually runs on the CPU.

I am running Podman Desktop 1.12 with Podman 5.2.0 and the latest AI Lab extension (1.2.3), and I have enabled GPU inference for the extension. It works fine on macOS.

[Three screenshots attached: Screenshot 2024-08-20 105134, Screenshot 2024-08-20 105600, Screenshot 2024-08-20 105703]

Operating system

Windows 11

Installation Method

Installer from website/GitHub releases

Version

1.12.0

Steps to reproduce

1. Start with a Windows 11 environment that has no existing Podman Desktop installation.
2. Install Podman Desktop 1.12.x.
3. Follow the instructions to configure WSL, etc.
4. Follow the documentation and verification steps for Windows GPU access and the NVIDIA CTK (https://podman-desktop.io/docs/podman/gpu).
5. Create a Service in the AI Lab extension UI.

At this point you can interact with the model. The Podman Desktop UI reports GPU inference, but the CPU is doing the work while the GPU sits idle.
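For anyone reproducing this, the GPU access verification in step 4 can be scripted. The sketch below is not part of the original report. It assumes the check is the usual one of running `nvidia-smi -L` in a throwaway container with the CDI device `nvidia.com/gpu=all`; the device name, the `ubuntu` image, and the helper names are all assumptions for illustration. Note that a passing check only shows the container runtime can see the GPU. It does not prove that the inference server inside the AI Lab image actually uses it.

```python
import shlex
import subprocess

# Assumed CDI device name, as typically generated by the NVIDIA Container Toolkit.
GPU_DEVICE = "nvidia.com/gpu=all"


def build_gpu_check_command(image: str = "ubuntu") -> list[str]:
    """Return a podman command that lists the GPUs visible inside a throwaway container."""
    return [
        "podman", "run", "--rm",
        "--device", GPU_DEVICE,
        "--security-opt=label=disable",
        image,
        "nvidia-smi", "-L",
    ]


def gpu_visible_in_container(image: str = "ubuntu") -> bool:
    """Run the check; return True only if nvidia-smi exits cleanly and lists a GPU."""
    cmd = build_gpu_check_command(image)
    print("Running:", shlex.join(cmd))
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # podman is not installed, or the container did not finish in time.
        return False
    print(result.stdout or result.stderr)
    # nvidia-smi -L prints one line per device, each starting with "GPU <index>:".
    return result.returncode == 0 and "GPU " in result.stdout
```

Calling `gpu_visible_in_container()` on the affected Windows machine would confirm whether the WSL/CDI setup itself is working before looking at the AI Lab image.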

Relevant log output

No response

Additional context

No response

axel7083 commented 2 weeks ago

Hi @evanshortiss, thanks for the report. Once the inference server has started, could you open the container logs?

You can access the corresponding container details page by clicking on the status icon in AI Lab.

Ideally, could you also try running nvidia-smi from inside the container and provide the output?

Finally, could you provide the content of the Inspect tab of the corresponding container?

Thanks!
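The three items requested above (container logs, `nvidia-smi` output, and the Inspect content) can also be gathered from a terminal instead of the UI. The following is a minimal sketch, assuming the `podman` CLI is on the PATH and that the container name is known; `collect_diagnostics` and the container name passed to it are hypothetical and not part of AI Lab.

```python
import subprocess


def diagnostic_commands(container: str) -> dict[str, list[str]]:
    """Map each requested diagnostic to the podman command that produces it."""
    return {
        "logs": ["podman", "logs", container],
        "nvidia-smi": ["podman", "exec", container, "nvidia-smi"],
        "inspect": ["podman", "inspect", container],
    }


def collect_diagnostics(container: str) -> dict[str, str]:
    """Run each diagnostic command and capture its combined stdout and stderr."""
    report: dict[str, str] = {}
    for name, cmd in diagnostic_commands(container).items():
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
            report[name] = result.stdout + result.stderr
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of aborting, so the other outputs are still collected.
            report[name] = f"failed to run {' '.join(cmd)}: {exc}"
    return report
```

The resulting dictionary can be pasted into the issue section by section; the container name is the one shown on the container details page.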

axel7083 commented 2 weeks ago

Okay, I tried the latest image on Windows and I am not able to use the GPU either. The image in use, ai-lab-playground-chat-cuda, was introduced by https://github.com/containers/podman-desktop-extension-ai-lab/pull/1558.

The previous image was llamacpp_python_cuda, which was quite old (last published 3 months ago). That is because it is deprecated and has been replaced by llamacpp-python-cuda (with - instead of _), which was updated 17 days ago.

Here are the compiled test results:

Image	GPU enabled
(current) ghcr.io/containers/podman-desktop-extension-ai-lab-playground-images/ai-lab-playground-chat-cuda@sha256:023e07b729aef9d91e75f8bd57f92b3291670bc362e3aed79bff0cd050074eef	🔴
ghcr.io/containers/llamacpp-python-cuda@sha256:c81947504e7e5dcaa106844dd6672c9106b18c619fc4bb727211eb7fb1fe36d7	🟢

evanshortiss commented 3 days ago

Thanks for looking into this @axel7083 and apologies for the delay on my end. I was on vacation!

axel7083 commented 3 days ago

> Thanks for looking into this @axel7083 and apologies for the delay on my end. I was on vacation!

Np, thanks for reporting!