b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
MIT License
1.4k stars 94 forks

converter.py OOM while converting llama-2-7b weights on my Raspberry Pi 5 #4

Closed segabor closed 8 months ago

segabor commented 8 months ago

What's the memory requirement of the weight converter? Apparently it doesn't fit into 8 GB RAM (swapfile not enabled).

(attached screenshots: converter process killed; htop on the RPi 5)
b4rtaz commented 8 months ago

Hello @segabor!

I would recommend converting the weights on a more powerful machine and then copying the converted file to the Raspberry Pi.

I suspect that the minimum is 13 GB of RAM for 7B.
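For context, the 13 GB figure is consistent with holding all 7B fp16 weights in memory at once. A quick back-of-the-envelope check (this assumes the converter loads the full tensor set into RAM simultaneously, which the thread doesn't confirm):

```python
# Back-of-the-envelope RAM estimate for converting a 7B model.
# Assumption (not stated in the thread): the converter keeps the
# entire fp16 weight set resident in memory during conversion.
n_params = 7_000_000_000       # Llama 2 7B parameter count (approx.)
bytes_per_param = 2            # fp16 = 2 bytes per weight
gib = n_params * bytes_per_param / 2**30
print(f"~{gib:.1f} GiB")       # roughly in line with the 13 GB estimate
```

Either way, an 8 GB Pi without swap would be well short of that, which matches the OOM kill observed above.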

segabor commented 8 months ago

Thanks @b4rtaz , I'll retry on my MBP.