getumbrel / llama-gpt

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
https://apps.umbrel.com/app/llama-gpt
MIT License

Question about hardware requirements #102

Open cdmoss opened 1 year ago

cdmoss commented 1 year ago

Hello - excellent project, I'm super excited about any option for decentralized AI.

I'm extremely green to this domain, and I was hoping someone could help me understand the GPU requirements for deploying this. I see many others in the issues boasting 4090s, dual 3080s, etc. I'm wondering whether it's viable for me to try the 70B with my paltry 6700 XT and Ryzen 3700X (or, failing that, what the largest model is that's usable with no GPU at all, and what the state of AMD support is). I recognize all this info is out there; I'd greatly appreciate links to resources for my own research.

stratus-ss commented 1 year ago

I can't speak to the state of AMD support, but currently CUDA is required for GPU usage. From what I've seen, the number of CUDA cores has a significant impact on performance, as does the amount of VRAM on each card.
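For context, LlamaGPT runs inference through llama-cpp-python under the hood, where GPU offload is controlled by an `n_gpu_layers` setting. A minimal sketch of what that looks like (assuming a CUDA-enabled build of llama-cpp-python; the model path is hypothetical):

```python
# Minimal sketch: GPU offload with llama-cpp-python.
# Assumes a CUDA-enabled build; the model path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b-chat.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=35,  # layers offloaded to VRAM; 0 = CPU only, -1 = all
    n_ctx=2048,       # context window size
)

out = llm("Q: Name two GPU vendors. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

With `n_gpu_layers=0` everything stays on the CPU, which is why the CPU-only numbers below are workable but slow.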

As for the 70B, I'm running it as a test on an i5-9400 with 64 GB of RAM and no GPU. It's fairly slow, especially compared to running the 13B model. I'm running a 13B model with decent performance in a VM on top of a Ryzen 5 3600X, so from that perspective you should be good.
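For anyone trying to gauge what fits in RAM, a back-of-the-envelope estimate helps. This is a sketch, not an official requirement: it assumes a Q4-style quantization at roughly 4.5 effective bits per parameter plus a couple of GB of runtime overhead, and real usage varies with quantization and context length:

```python
# Back-of-the-envelope RAM estimate for 4-bit quantized models.
# Assumptions: ~4.5 effective bits/parameter for Q4-style quants,
# plus ~2 GB for the KV cache and runtime overhead.
def est_ram_gb(params_billion: float, bits_per_param: float = 4.5,
               overhead_gb: float = 2.0) -> float:
    weights_gb = params_billion * bits_per_param / 8  # 1e9 params * bits -> GB
    return weights_gb + overhead_gb

for size in (7, 13, 34, 70):
    print(f"{size:>3}B @ 4-bit ~ {est_ram_gb(size):.0f} GB RAM")
```

The roughly 41 GB this gives for the 70B lines up with it fitting in 64 GB of system RAM.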

hoelee commented 1 year ago

I'm running the 34B model on a 13900K, CPU only. With all 32 threads at 100%, it still takes minutes to get a full reply. I'm not sure if that speed is expected.
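To put a number on "slow", it helps to measure tokens per second rather than wall-clock feel. A rough sketch against LlamaGPT's OpenAI-compatible completions endpoint; the URL and port are assumptions based on a default deployment, so adjust for your setup:

```python
# Rough tokens-per-second measurement against an OpenAI-compatible
# completions endpoint. The URL below assumes a default LlamaGPT
# deployment; adjust host/port for your setup.
import time
import requests

URL = "http://localhost:3001/v1/completions"  # assumed default endpoint

payload = {"prompt": "Briefly explain what a CPU thread is.",
           "max_tokens": 128}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600).json()
elapsed = time.time() - start

tokens = resp["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f} s -> {tokens / elapsed:.2f} tok/s")
```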

istler commented 1 year ago

Feels like 34B is out of reach for mere mortals.