b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License

convert into .bin #78

Closed fabgat closed 1 month ago

fabgat commented 1 month ago

Hi, looking at your great project, I see that the model must be in .bin format, but following your instructions, convert-llama.py creates a ".m" file, not a ".bin" file.

Am I missing a step?

Cheers

b4rtaz commented 1 month ago

Hello @fabtag, for the last few versions Distributed Llama has converted models to the .m format, but only the extension has changed. The binary format is the same. You can still use .bin models.
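
Since only the extension differs, a plain rename works if you need a .bin file. A minimal sketch, assuming the converter produced a file named `dllama_model_q40.m` (the name here is just an example):

```python
from pathlib import Path

# Hypothetical output name -- substitute whatever convert-llama.py produced.
model = Path("dllama_model_q40.m")

# Only the extension differs; the binary layout is identical,
# so renaming .m to .bin is safe.
model.rename(model.with_suffix(".bin"))
```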

fabgat commented 1 month ago

Thanks.