cocktailpeanut / dalai

The simplest way to run LLaMA on your local machine
https://cocktailpeanut.github.io/dalai

llama does work while alpaca does not (bad magic error) #434

Open suoko opened 1 year ago

suoko commented 1 year ago

If I try to install alpaca instead of llama, I get a bad magic error when running it. Using llama.cpp from this repo https://github.com/ggerganov/llama.cpp/releases, the same alpaca model file works with no issue. I'm now trying to make llama.cpp work from the browser.
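For anyone debugging this: the "bad magic" message just means the loader did not recognize the first bytes of the model file. The GGML-era formats each start with a different magic value (ggml, ggmf, ggjt), and an older loader that predates a format typically reports exactly this error when handed a newer file. A minimal sketch of checking the header yourself (the `inspect_header` helper below is hypothetical, not part of dalai or llama.cpp):

```python
# Peek at the header of a GGML-era model file to see why a loader
# might report "bad magic". Hypothetical helper, for diagnosis only.
import struct
import sys

KNOWN_MAGICS = {
    0x67676D6C: "ggml (unversioned, oldest format)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-able)",
}

def inspect_header(path: str) -> None:
    with open(path, "rb") as f:
        magic, = struct.unpack("<I", f.read(4))  # first 4 bytes, little-endian
        name = KNOWN_MAGICS.get(magic)
        if name is None:
            print(f"unknown magic 0x{magic:08x} -- a loader would call this 'bad magic'")
            return
        print(f"magic 0x{magic:08x}: {name}")
        if magic != 0x67676D6C:  # the versioned formats carry a format-version field next
            version, = struct.unpack("<I", f.read(4))
            print(f"format version {version}")

if __name__ == "__main__":
    inspect_header(sys.argv[1])
```

If the magic (or version) is newer than what dalai's bundled llama.cpp understands, that would explain why the same file loads fine in a current llama.cpp build but fails here.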

theproj3ct commented 1 year ago

Same for me.

zephyrprime commented 1 year ago

Apparently llama.cpp now does the following: "Avoid unnecessary bit shuffling by packing the quants in a better way. Requires model re-quantization." So that's the problem: the quantized file format changed, and old files no longer load.

thestumonkey commented 1 year ago

How do we re-quantize it?

zephyrprime commented 1 year ago

Beats me. I don't think it's really desirable either, because you would basically have to downgrade the encoding of every future llama or alpaca model to keep it working with this codebase. I switched to a different llama codebase on GitHub that is kept up to date and got it working.
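For reference, "re-quantizing" here generally means going back to the original fp16 (or .pth) weights and running the conversion and quantization tools from a current llama.cpp checkout; there is no in-place upgrade for already-quantized files after that change. A rough sketch of that flow, assuming a built llama.cpp checkout with the classic `convert.py` / `quantize` tools and a `models/7B` directory inside it (script names and arguments varied between releases, so treat this as illustrative):

```python
# Illustrative only: re-create a quantized model with up-to-date llama.cpp tools.
# Assumes ./convert.py and ./quantize exist in the checkout; adjust names/paths
# to match your version of llama.cpp.
import subprocess

LLAMA_CPP_DIR = "./llama.cpp"   # assumption: local checkout, already built
MODEL_DIR = "models/7B"         # assumption: original fp16/pth weights live here

# 1) Convert the original weights to a GGML f16 file.
subprocess.run(["python3", "convert.py", MODEL_DIR], cwd=LLAMA_CPP_DIR, check=True)

# 2) Quantize the f16 file to q4_0 with the current (post-change) packing.
subprocess.run(
    ["./quantize", f"{MODEL_DIR}/ggml-model-f16.bin",
     f"{MODEL_DIR}/ggml-model-q4_0.bin", "q4_0"],
    cwd=LLAMA_CPP_DIR, check=True,
)
```

The resulting file is in the new packing, so it loads in current llama.cpp builds but not in older ones like the version dalai ships.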


thorsteinssonh commented 1 year ago

It seems like it's going to take considerable ongoing effort for this API project to chase and maintain compatibility with this academic-style code. The upstream authors probably don't have portability and format stability in mind when doing research.

mirek190 commented 1 year ago

Go to llama.cpp or koboldcpp... this project is too obsolete...

junxian428 commented 1 year ago

I also encountered the bad magic error, but after changing the model to LLaMA it went away, so I guess it's a model problem... (screenshot attached)

mirek190 commented 1 year ago

1200 ms per token for a 7B model? Wow... I get 400 ms per token with 70B / 65B models, but using llama.cpp.

For 7B models I get 14 ms per token.