evilsocket / cake

Distributed LLM and StableDiffusion inference for mobile, desktop and server.

Standardized Android client #14

Open TriDefender opened 1 month ago

TriDefender commented 1 month ago

Would it be possible to build an Android client where you can enter the IP of the master node and start inference directly? Also, newer Qualcomm SoCs have built-in NPUs; would it be possible to utilise this dedicated hardware?

PylotLight commented 1 month ago

> Would it be possible to build an Android client where you can enter the IP of the master node and start inference directly? Also, newer Qualcomm SoCs have built-in NPUs; would it be possible to utilise this dedicated hardware?

I'd also wonder about the Google Tensor chips, and Intel/AMD NPUs, for hardware acceleration.

evilsocket commented 1 month ago

Android: yes, entirely possible, especially if somebody helps with it :)

Qualcomm: theoretically possible, but would require a dedicated inference backend, which would be very complex to write.

TriDefender commented 1 month ago

> Would it be possible to build an Android client where you can enter the IP of the master node and start inference directly? Also, newer Qualcomm SoCs have built-in NPUs; would it be possible to utilise this dedicated hardware?

> I'd also wonder about the Google Tensor chips, and Intel/AMD NPUs, for hardware acceleration.

For Intel, there is something called IPEX-LLM, which lets you use the CPU/iGPU/dGPU/NPU, as long as they are all Intel devices. Basically you send the model to the "xpu" device and it gets processed by the designated hardware. I just don't know whether it can process sliced networks.
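
For reference, this is roughly what that IPEX-LLM flow looks like. A minimal sketch based on their public GPU examples; the model name and prompt are placeholders, and the exact imports and install steps can vary by IPEX-LLM version and Intel driver setup:

```python
# Hedged sketch: load a model with IPEX-LLM low-bit quantization, then move
# it to the PyTorch "xpu" device so the Intel runtime runs it on whichever
# Intel accelerator is available. Model id and prompt are placeholders.
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model

# Load with 4-bit quantization (the typical IPEX-LLM setup), then send the
# whole model to "xpu".
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Whether the same `.to("xpu")` path would work for a model that cake has already sliced across nodes is exactly the open question above.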