containers / podman-desktop-extension-ai-lab

Work with LLMs on a local environment using containers
https://podman-desktop.io/extensions/ai-lab
Apache License 2.0

Estimating model offloading capabilities #1442

Open axel7083 opened 1 month ago

axel7083 commented 1 month ago

Is your feature request related to a problem? Please describe

We removed the misleading CPU indicator from the model tables, but it would be interesting for the user to have some indication of whether the model could run on the GPU or not.

Describe the solution you'd like

We would probably need to check for a few elements:

Describe alternatives you've considered

No response

Additional context

No response
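As a purely illustrative sketch of the kind of check this feature request describes (the issue does not actually list the elements to check), one rough signal would be comparing the model's size on disk against the VRAM reported by the systeminformation npm package, which the extension already uses (see the discussion further down). The headroom factor and the three result states below are assumptions, not anything specified in this issue:

```typescript
import { promises as fs } from 'node:fs';
import * as si from 'systeminformation';

type OffloadEstimate = 'likely-fits' | 'partial-offload' | 'no-gpu';

// Very rough estimate: a real check would also need to account for
// quantization, context size, KV-cache overhead, etc.
async function estimateOffload(modelPath: string): Promise<OffloadEstimate> {
  const { size } = await fs.stat(modelPath);    // model size on disk, in bytes
  const { controllers } = await si.graphics();  // all GPUs systeminformation can see
  const vramMb = controllers[0]?.vram ?? 0;     // systeminformation reports VRAM in MB
  if (vramMb <= 0) return 'no-gpu';
  // Keep some headroom for the KV cache and runtime buffers (arbitrary 20%).
  return size < vramMb * 1024 * 1024 * 0.8 ? 'likely-fits' : 'partial-offload';
}
```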

benoitf commented 1 month ago

Can we use the GPU? (libkrun, WSL NVIDIA) There should be a different indicator for "I have GPUs on my computer that could run it but they are not available in my podman machine" vs "no GPU is available" vs "a GPU is available within the podman machine".
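The three states benoitf distinguishes could be modeled with a small discriminated union; this type is purely illustrative and not part of the extension:

```typescript
// One possible way to represent the indicator states described above.
type GpuIndicator =
  | { kind: 'none' }                       // no GPU is available at all
  | { kind: 'host-only'; gpus: string[] }  // GPUs on the computer, but not exposed to the podman machine
  | { kind: 'machine'; gpus: string[] };   // GPU usable from within the podman machine
```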

axel7083 commented 1 month ago

Can we use the GPU? (libkrun, WSL NVIDIA) There should be a different indicator for "I have GPUs on my computer that could run it but they are not available in my podman machine" vs "no GPU is available" vs "a GPU is available within the podman machine".

  • also if multiple GPUs, should it say on which one it's possible?

We have zero support for this behaviour, so I would not expose this, as we have no way to make it work.

benoitf commented 1 month ago

as we have no way to make it work

Could you explain why we have no way?

Listing GPUs from the local machine and from the podman machine is possible using some commands AFAIK
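A minimal sketch of what such listing might look like, in the extension's own language: the host side reuses systeminformation, and the machine side shells out over `podman machine ssh`. Whether `nvidia-smi` is actually reachable inside the machine depends on the setup (e.g. WSL with NVIDIA passthrough) and is an assumption here, not something confirmed in this thread:

```typescript
import * as si from 'systeminformation';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const exec = promisify(execFile);

// GPUs visible on the local machine.
async function listHostGpus(): Promise<string[]> {
  const { controllers } = await si.graphics();
  return controllers.map(c => `${c.vendor} ${c.model}`);
}

// GPUs visible from inside the podman machine.
// `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: NVIDIA ... (UUID: ...)".
async function listMachineGpus(machineName: string): Promise<string[]> {
  const { stdout } = await exec('podman', ['machine', 'ssh', machineName, 'nvidia-smi', '-L']);
  return stdout.trim().split('\n').filter(line => line.length > 0);
}
```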

axel7083 commented 1 month ago

could you explain why we have no way ?

llamacpp does support it^1, but we were never able to try it out, so we have no idea how we would have to mount the devices or specify which one to use.

Listing GPUs from the local machine and from the podman machine is possible using some commands AFAIK

Yes, we are using the systeminformation npm package to get information about the available GPUs; it returns an array, however we only use the first one.

https://github.com/containers/podman-desktop-extension-ai-lab/blob/9ded94378f9cefe00130aa52212b0136f5e06180/packages/backend/src/workers/provider/LlamaCppPython.ts#L178-L184

This is simply because we can't test it.
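For illustration only, since the thread is explicit that none of this has been tested by the extension: pinning the inference container to one specific GPU could look roughly like the sketch below, using an NVIDIA CDI device name (which assumes the NVIDIA container toolkit has generated a CDI spec on the machine) plus `CUDA_VISIBLE_DEVICES`. The function name and overall approach are assumptions, not the extension's actual mechanism:

```typescript
// Build `podman run` arguments that expose a single GPU to the container.
function gpuRunArgs(gpuIndex: number): string[] {
  return [
    // Expose only the selected GPU via CDI (requires a generated CDI spec).
    '--device', `nvidia.com/gpu=${gpuIndex}`,
    // Inside the container the exposed GPU is re-numbered as device 0.
    '-e', 'CUDA_VISIBLE_DEVICES=0',
  ];
}

// Example: podman run ${gpuRunArgs(1).join(' ')} ... <llamacpp image>
```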