ggerganov / llama.cpp

LLM inference in C/C++
MIT License

WHY WHY WHY ????? #1572

Closed raymerjacque closed 1 year ago

raymerjacque commented 1 year ago

WHY did you guys end support for older Llama models? Why is backwards compatibility not added when you change formats? This is what pisses me off about open source, it's absolute fraken chaos: things get added and there is no standard that is kept. You guys do realise this project filters down to additional projects, so it's not as simple as rolling back or using an earlier commit; the other projects now also have this b@llsh!t added to their githubs. What a fraken mess you guys have created with these latest updates.

You've killed support for half the models at Hugging Face... I am so angry right now.

Blucknote commented 1 year ago

[image]

SlyEcho commented 1 year ago

There is no ISO standard for the model file formats; there is no standard at all. There is no ML framework that produces ggml models directly, they are only converted from other formats, which means they can always be converted again. Adding support for all models is a lot of effort to maintain.

Green-Sky commented 1 year ago

@raymerjacque If you don't like what is being built here, then don't use it. :)

LostRuins commented 1 year ago

@raymerjacque if it's any consolation, I am (for now) still maintaining backwards compatibility with all prior ggml models in my KoboldCpp fork, though you will not have CUDA support.

Josh-XT commented 1 year ago

> WHY did you guys end support for older Llama models? Why is backwards compatibility not added when you change formats? This is what pisses me off about open source, it's absolute fraken chaos: things get added and there is no standard that is kept. You guys do realise this project filters down to additional projects, so it's not as simple as rolling back or using an earlier commit; the other projects now also have this b@llsh!t added to their githubs. What a fraken mess you guys have created with these latest updates.
>
> You've killed support for half the models at Hugging Face... I am so angry right now.

You can always like.. Use a previous version...

myeolinmalchi commented 1 year ago

"I want quality to improve, but I don't want updates to the model, but I want all the latest features."

Recognize that you're being contradictory and selfish.

Green-Sky commented 1 year ago

"And everything for free"

1980Dragon commented 1 year ago

Just get KoboldCPP. It is free, simple and powerful. It supports all of the past models too.

BarfingLemurs commented 1 year ago

https://github.com/ggerganov/llama.cpp/discussions/205

Hey, I know this feeling, but we don't have to be lazy llama pirates: we can convert HF to ggml and self-quantize, we just need to raise the pagefile/swapfile.
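
For anyone who hasn't done this before, here is a rough sketch of that flow, assuming the convert.py and quantize tools that ship with llama.cpp; exact script names, flags and quantization types can differ between versions, and the paths are placeholders:

```sh
# build llama.cpp (the quantize tool is built alongside main)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# convert the original Hugging Face / PyTorch weights to an f16 ggml file
python3 convert.py /path/to/hf-model --outtype f16 --outfile models/model-f16.bin

# re-quantize the f16 file to a format the current code reads, e.g. q5_1
./quantize models/model-f16.bin models/model-q5_1.bin q5_1
```

The conversion step needs enough RAM (or swap, hence the pagefile/swapfile remark) to hold the f16 weights.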

raymerjacque commented 1 year ago

Actually I haven't used llama in a while, else I would have known about this issue sooner. It was my backup model and now it doesn't work. This is why I HATE open source... give me a paid version of anything and I'd be much happier knowing it won't be screwed with in unexpected ways... I think I'll skip this project. I'll just use Bard and OpenAI; at least there I don't have to worry about things suddenly being broken without warning or explanation.

JohnnyOpcode commented 1 year ago

One word..

VELOCITY

JWNoctis commented 1 year ago

FWIW the following is generated with llama.cpp/Vicuna 13B V1.0 quantized to Q5_1, with parameters --temp 0.8 --top_k 30 --repeat_penalty 1.17647 --repeat_last_n 256 --top_p 0.8

### Human: Please rewrite the following Github issue report in a more respectable and professional tone, while still reflecting the same issue at hand.
---
Title: WHY WHY WHY ?????
Content: <OP>
### Assistant: I would like to bring to your attention an issue regarding the discontinuation of support for older Llama models and the lack of backwards compatibility when changes are made to file formats. This has caused significant confusion and frustration within our community, particularly given the impact it has on downstream projects that rely on these tools.

As an open-source contributor, I understand the challenges involved in maintaining consistency and standards across different projects. However, I believe that it is essential to ensure that any changes made to existing codebases do not cause unnecessary difficulties for users who may still be relying on previous versions.

I would appreciate your thoughts on how we can address this issue and provide better support for older Llama models going forward. Thank you for considering my feedback.

LostRuins commented 1 year ago

If we're shitposting now, here's WizardLM-30B-Uncensored.ggmlv3.q5_1.bin

[image]

maddes8cht commented 1 year ago

> Actually I haven't used llama in a while, else I would have known about this issue sooner. It was my backup model and now it doesn't work. This is why I HATE open source... give me a paid version of anything and I'd be much happier knowing it won't be screwed with in unexpected ways... I think I'll skip this project. I'll just use Bard and OpenAI; at least there I don't have to worry about things suddenly being broken without warning or explanation.

Well, that's easy: you give me some money, I'll give you an earlier version of llama.cpp that is capable of running your requested model.

I will guarantee that I will never do any updates, and that with this version your model will always run.

P.S. I'll give you a discount of 20% if you decide to buy within 48 hours.

SlyEcho commented 1 year ago

> I HATE open source... give me a paid version of anything

You can pay if you want: https://github.com/sponsors/ggerganov

BarfingLemurs commented 1 year ago

> Actually I haven't used llama in a while, else I would have known about this issue sooner. It was my backup model and now it doesn't work. This is why I HATE open source... give me a paid version of anything and I'd be much happier knowing it won't be screwed with in unexpected ways... I think I'll skip this project. I'll just use Bard and OpenAI; at least there I don't have to worry about things suddenly being broken without warning or explanation.

But those online products can change the seed values of your results and you can't roll back. Here, you can just type (for example) git checkout 9560655409dc80771a9b19e838ff47c5c1df6483

See the versions and dates here: https://github.com/ggerganov/llama.cpp/commits/master

Once you know how to jump to a previous version, you can do it for all the GitHub projects you come across.
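
In full, a minimal sketch of pinning to an older build looks like this (the hash is just the example above; pick whichever commit predates the format change you need):

```sh
# browse versions and dates
git log --date=short --pretty=format:"%h %ad %s"

# pin to an old commit and rebuild
git checkout 9560655409dc80771a9b19e838ff47c5c1df6483
make clean && make

# return to the latest code later
git checkout master && git pull
```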

Those products don't allow you to use finetuned models or replicate the seed. Non-open-source products like Windows also release updates that change many things and break your programs in unexpected ways.

LostRuins commented 1 year ago

Actually, Windows is a bit of a counterexample. It has an absolutely arcane, convoluted codebase, parts of which exist solely to ensure backwards compatibility with some decision made two decades ago. Which is why you have beautiful stuff like WoW64 allowing you to run decade-old 32-bit programs on your current system.

I do wish they had kept NTVDM. How else are you going to run stuff like Chip's Challenge? Sigh.

howard0su commented 1 year ago

Check PR #1504, which lets you easily convert your existing model to the latest format.

wtarreau commented 1 year ago

@JWNoctis that's very interesting, but could you please share your complete command line? I'm still struggling to get Vicuna to work as anything but a chat bot, and it seems you managed to pass it instructions. But whatever I try constantly ends up with an empty response. I've pasted your instructions into a file loaded by -f, or used --prompt "$(cat thisfile.txt)", tried with/without --instruct, with/without --interactive or --interactive-first, and I never get any usable result. I know I'm missing something, but previously I was convinced it was related to the models. Now that I see you're using the exact same one as I am, I'm convinced the problem is somewhere else.

cmp-nct commented 1 year ago

I mean, he's right in a way... we've had a ton of trouble from changes and wanted to keep backward compatibility (mmap etc.). But currently llama.cpp is static: it has one type of loader and one type of evaluation loop, so it cannot adapt easily to older model files as the code advances. Sometimes a minor change improves performance by 5-10%; keeping backward compatibility would mean keeping a secondary or tertiary set of functions/code around for those older files.

I believe one day this project, or a close fork of it, will do that: a plugin-style set of loaders and evaluation code that adapts to the model type and version number.

But as long as we do not have an adaptive backend, the code should be kept tight and focused, so backward compatibility is going to break over and over again. You can keep or compile the old llama.cpp binaries; they will work with the old weights until you are ready to upgrade.

That's just my personal view on this. My personal solution is to keep the f16 files (and original weights) on a fast disk; re-processing and re-quantizing them only takes a few minutes.
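
For example, something along these lines (paths and quantization type are placeholders; the quantize tool is the one built with llama.cpp):

```sh
# after pulling a llama.cpp update that changed the file format:
git pull && make clean && make

# re-quantize from the f16 file kept on disk instead of re-downloading anything
./quantize models/llama-13b/ggml-model-f16.bin models/llama-13b/ggml-model-q5_1.bin q5_1
```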

wtarreau commented 1 year ago

I understand why changes are needed to make the project progress faster and to get rid of possible initial design mistakes. However, that's exactly what tags and branches are made for. It would tremendously help users if, just before changing the format, at least one tag were set to mark the latest release supporting this or that format. For example, I had to regenerate vicuna-13b-q5 and didn't have enough filesystem space anymore, so I faced a dilemma: delete the old one and possibly have to download it again if the new version didn't work, or stay on the old version. I finally found a pre-generated version of it on hf.co, so I didn't have to delete the old one yet. But these are concrete issues caused by the lack of a tag.
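
As an illustration, such a tag would cost the maintainers one command and users one checkout; the tag name below is made up for the example:

```sh
# maintainer side: mark the last commit that still reads the old format
git tag -a ggml-v2-last -m "last commit supporting the previous ggml file format"
git push origin ggml-v2-last

# user side: build against that tag whenever older model files are needed
git checkout ggml-v2-last && make clean && make
```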

JWNoctis commented 1 year ago

> @JWNoctis that's very interesting, but could you please share your complete command line? I'm still struggling to get Vicuna to work as anything but a chat bot, and it seems you managed to pass it instructions. But whatever I try constantly ends up with an empty response. I've pasted your instructions into a file loaded by -f, or used --prompt "$(cat thisfile.txt)", tried with/without --instruct, with/without --interactive or --interactive-first, and I never get any usable result. I know I'm missing something, but previously I was convinced it was related to the models. Now that I see you're using the exact same one as I am, I'm convinced the problem is somewhere else.

@wtarreau My full command was ./main -i --multiline-input -t <core_count> -m <model_path> --temp 0.8 --top_k 30 --repeat_penalty 1.17647 --repeat_last_n 256 --top_p 0.8 -n -1 -c 2048 --color -p '### Human:' -r '### Human:' --ignore-eos --no-penalize-nl

You can add --in-suffix '### Assistant:', otherwise it might add to your prompt as well, which can be interesting to look at but is (sometimes) undesirable. Add --in-prefix ' ' to avoid having to put a space before every prompt yourself. Reverse prompting with a trailing space is less reliable in my experience. Do note that both --in-prefix and --in-suffix make --multiline-input much less useful for creative purposes, as they'd be included before and after every input.

I've found Vicuna 1.0 to be much more amenable to simple zero-shot prompts like this. Vicuna 13B 1.1 changed the separator from ### to </s>, which together with other changes made it much trickier to prompt, and gave it a tendency to start throwing HTML tags and never stop, at least in llama.cpp. I do not know if they had </s> as a single added special token, which might be what is happening here, as it is separated into several tokens in llama.cpp.
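
Putting those flags together with the command above, a complete invocation might look like this (thread count and model path are placeholders; the sampling values are the ones quoted earlier):

```sh
./main -i --multiline-input -t 8 -m ./models/ggml-vic13b-q5_1.bin \
  --temp 0.8 --top_k 30 --top_p 0.8 --repeat_penalty 1.17647 --repeat_last_n 256 \
  -n -1 -c 2048 --color --ignore-eos --no-penalize-nl \
  -p '### Human:' -r '### Human:' --in-prefix ' ' --in-suffix '### Assistant:'
```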

wtarreau commented 1 year ago

Thanks! So you mean that with 1.1 I should use -p '</s> Human:' and -r '</s> Human:' because that's hard-coded in the model? Is it exposed thanks to --ignore-eos maybe? I'm asking because I've never observed them in any single output.

JWNoctis commented 1 year ago

> Thanks! So you mean that with 1.1 I should use -p '</s> Human:' and -r '</s> Human:' because that's hard-coded in the model? Is it exposed thanks to --ignore-eos maybe? I'm asking because I've never observed them in any single output.

For 1.1 I used -f <prompt_file> -r '</s>' --in-prefix 'USER: ' --in-suffix ' ASSISTANT: '. The prompt is:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hello. ASSISTANT:
Hello. How can I help you today?</s>

Most of these came from referencing how FastChat, which is from the same team that finetuned Vicuna, handled the prompts. I haven't found a way to make it work nearly as reliably as 1.0 did, unfortunately.
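
For reference, a sketch of the full 1.1 invocation with those options; the model file name is a placeholder, and the other parameters are simply carried over from the 1.0 command above rather than anything confirmed to work better:

```sh
./main -i --multiline-input -t 8 -m ./models/ggml-vic13b-1.1-q5_1.bin \
  --temp 0.8 --top_k 30 --top_p 0.8 --repeat_penalty 1.17647 --repeat_last_n 256 \
  -n -1 -c 2048 --color --no-penalize-nl \
  -f vicuna11-prompt.txt -r '</s>' --in-prefix 'USER: ' --in-suffix ' ASSISTANT: '
```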

wtarreau commented 1 year ago

Interesting, thanks! I'm personally proceeding a different way by naming the participants, like this, which is probably why I never had to know the prompt names:

Text transcript of a never ending dialog, where Usr interacts with an AI assistant named Bot. Bot is helpful, kind, honest, friendly, good at writing and never fails to answer Usr's requests immediately and with details and precision.

In such a case I'm using -r 'Usr:' and for me it gives amazingly good results. Example:

./main --ctx_size 2048 --temp 0.36 --top_k 12 --top_p 1 --repeat_last_n 256 \
  --batch_size 16384 --repeat_penalty 1.05 --model ../models/ggml-vic13b-q5_1.bin \
  --threads $(nproc) --n_predict -1 --color --interactive --reverse-prompt Usr: \
  --prompt 'Text transcript of a never ending dialog, where Usr interacts with an AI assistant named Bot. Bot is helpful, kind, honest, friendly, good at writing and never fails to answer Usr'"'"'s requests immediately and with details and precision.'
...
Usr:which of the following items is an exception to the rest of the list ? 
metal 
wood 
glass 
water 

Bot:Water is an exception to the rest of the list in that it is a substance that can exist in three states of matter (solid, liquid, gas) while the other items on the list are typically only found in one state of matter.
Usr:

However I can't get anything usable using '</s>' or '###' or anything like this (with or without --ignore-eos). That's why I end up using it exclusively in chat mode with "Usr" and "Bot" discussing. I had hoped to be able to directly ask it to complete text without faking a conversation. But that's not that big of a problem :-)

raymerjacque commented 1 year ago

My problem is not that you guys broke backward compatibility... but that you gave NO WARNING... if something like that happens you should slap a BIG FAT warning on there before users update. That's the reason I was angry...

Anyway, thanks to the screw-up with breaking models, it taught me not to be so dependent on this project, so I have shifted my focus onto making my own scripts. However, I will be keeping an eye on this project.

you can see my work in detail here ( linux version ) : https://youtu.be/71DSwIk3u6Y

and the Windows version here : https://youtu.be/JTz9Hh_0PTE

Android version here : https://www.youtube.com/shorts/UqE-0IHjQI4

wtarreau commented 1 year ago

So basically you're shouting at those who give you all this work for free, when all you're left to do is a bunch of scripts. I think an alternative approach, asking whether it would be possible in the future to add a tag before each breaking change, would have been much better received. The amount of work we all have to do to integrate llama into our respective projects is nothing compared to all that was done, so I think a little humility and respect would be deserved. I too was a little annoyed by the format change, though I saw it mentioned in the various PRs because I'm following the project; Google gave me a URL with an updated model, and 10 minutes later it was downloaded, up and running. I accepted that it was for a good cause. If you don't want to risk this, just refrain from updating and keep it frozen on your side, but please, do not complain.

raymerjacque commented 1 year ago

> So basically you're shouting at those who give you all this work for free, when all you're left to do is a bunch of scripts. I think an alternative approach, asking whether it would be possible in the future to add a tag before each breaking change, would have been much better received. The amount of work we all have to do to integrate llama into our respective projects is nothing compared to all that was done, so I think a little humility and respect would be deserved. I too was a little annoyed by the format change, though I saw it mentioned in the various PRs because I'm following the project; Google gave me a URL with an updated model, and 10 minutes later it was downloaded, up and running. I accepted that it was for a good cause. If you don't want to risk this, just refrain from updating and keep it frozen on your side, but please, do not complain.

Don't give me that crap, I give to the community just as much as everyone else... in a few weeks I'll release a voice-capable AI Linux distro, FREE for everyone to use, thousands, millions... I've been contributing to the open source community for many, many years. Right now you can download my Android AI app from the Google Play Store and use it FREE: no ads, no registration, no fees... and all at my own expense.

We all use whatever is available and we all pay our dues by giving back. Everything works like a chain... you break one link and it messes up someone else's day. I guess I may have gone a little far with my rant in the first post, but that update really, really messed up my week... and it could easily have been avoided had there been a clear notice about such a big change.

wtarreau commented 1 year ago

OK, then you understand very well how they came to that decision. For now they're not trying to maintain that much backwards compatibility, but to be compatible with various models and quantization levels. This does require breaking changes. The project doesn't advertise any form of stability at the moment; there's not even a single non-dev branch. As with any such project, you know you're walking on eggshells when using it, and that you have to keep your own working tags/branches/binaries before pulling. It's not in a form that's suitable for being used as-is beyond testing, and that's fine. The feedback, ideas and optimizations from the community are more valuable at this point than stability. Were it my project, I would do my best to maintain compatibility and would create branches when it breaks, but I perfectly understand the current choices, and your attitude is not welcome, really. Besides, what prevents you from uploading your own fork at the last working commit and referencing it? Hmmm?

raymerjacque commented 1 year ago

You can still put a CLEAR notice on a commit when you know it's going to break virtually half the models on Hugging Face. This was no small change... no matter how you look at it, it was poorly implemented. It literally took people coming in here to complain, and the devs going back and tracing commits, to even realize what they had done...

Josh-XT commented 1 year ago

> You can still put a CLEAR notice on a commit when you know it's going to break virtually half the models on Hugging Face. This was no small change... no matter how you look at it, it was poorly implemented. It literally took people coming in here to complain, and the devs going back and tracing commits, to even realize what they had done...

I'd recommend lowering your expectations for open source support/documentation. $Free comes with no promise of any of that.

wtarreau commented 1 year ago

@raymerjacque ah OK you just mean like this: https://github.com/ggerganov/llama.cpp/commit/56551bc11f46b2716fdf61bb48ac28414889dc0a

That's where I saw it: in the commit log, and it appeared in the project's readme. Sure, it didn't last very long, but that was probably deemed sufficient for all those following the project closely. I'm sorry about what you say, "people to come in here to complain", but for now you are the only one. Maybe you were very frustrated, or it wasted some of your time at a critical moment, but again, you are the one who knows what you're doing with the code and how you should keep track of working versions. Even if the format hadn't changed, the code could have introduced a big bug making it not work anymore for you. They're not going to notify you every time a risky commit gets merged. So please stop this; you're just fooling yourself, and your arguments don't stand given the project's situation.

jeffwadsworth commented 1 year ago

> So basically you're shouting at those who give you all this work for free, when all you're left to do is a bunch of scripts. I think an alternative approach, asking whether it would be possible in the future to add a tag before each breaking change, would have been much better received. The amount of work we all have to do to integrate llama into our respective projects is nothing compared to all that was done, so I think a little humility and respect would be deserved. I too was a little annoyed by the format change, though I saw it mentioned in the various PRs because I'm following the project; Google gave me a URL with an updated model, and 10 minutes later it was downloaded, up and running. I accepted that it was for a good cause. If you don't want to risk this, just refrain from updating and keep it frozen on your side, but please, do not complain.
>
> Don't give me that crap, I give to the community just as much as everyone else... in a few weeks I'll release a voice-capable AI Linux distro, FREE for everyone to use, thousands, millions... I've been contributing to the open source community for many, many years. Right now you can download my Android AI app from the Google Play Store and use it FREE: no ads, no registration, no fees... and all at my own expense.
>
> We all use whatever is available and we all pay our dues by giving back. Everything works like a chain... you break one link and it messes up someone else's day. I guess I may have gone a little far with my rant in the first post, but that update really, really messed up my week... and it could easily have been avoided had there been a clear notice about such a big change.

Wait a minute... why are you just releasing the app for the Android environment? Why aren't you spending the time and resources to support iOS as well? Unbelievable.

LostRuins commented 1 year ago

@cmp-nct I am literally already doing it.

raymerjacque commented 1 year ago

> So basically you're shouting at those who give you all this work for free, when all you're left to do is a bunch of scripts. I think an alternative approach, asking whether it would be possible in the future to add a tag before each breaking change, would have been much better received. The amount of work we all have to do to integrate llama into our respective projects is nothing compared to all that was done, so I think a little humility and respect would be deserved. I too was a little annoyed by the format change, though I saw it mentioned in the various PRs because I'm following the project; Google gave me a URL with an updated model, and 10 minutes later it was downloaded, up and running. I accepted that it was for a good cause. If you don't want to risk this, just refrain from updating and keep it frozen on your side, but please, do not complain.
>
> Don't give me that crap, I give to the community just as much as everyone else... in a few weeks I'll release a voice-capable AI Linux distro, FREE for everyone to use, thousands, millions... I've been contributing to the open source community for many, many years. Right now you can download my Android AI app from the Google Play Store and use it FREE: no ads, no registration, no fees... and all at my own expense. We all use whatever is available and we all pay our dues by giving back. Everything works like a chain... you break one link and it messes up someone else's day. I guess I may have gone a little far with my rant in the first post, but that update really, really messed up my week... and it could easily have been avoided had there been a clear notice about such a big change.
>
> Wait a minute... why are you just releasing the app for the Android environment? Why aren't you spending the time and resources to support iOS as well? Unbelievable.

Because they charge 100 USD a year to add something to the Apple store... f@ck Apple, they have always been greedy... if you can afford their overbloated prices and products then you clearly don't have a brain and don't need free stuff; you can go pay for ChatGPT Plus...

Blucknote commented 1 year ago

> give me a paid version of anything and I'd be much happier
>
> because they charge 100 USD... they have always been greedy

🤔

raymerjacque commented 1 year ago

> give me a paid version of anything and I'd be much happier
>
> because they charge 100 USD... they have always been greedy
>
> 🤔

There's a difference between a paid version and 100 USD a year to upload a free app, don't you think? Or are you one of those Apple users with no brain?

wtarreau commented 1 year ago

I think you're too much in the "I know everything and everyone else is both wrong and stupid" mood, and this is not contributing any value here. Better to stop here or have an admin close this nonsensical issue.

raymerjacque commented 1 year ago

> I think you're too much in the "I know everything and everyone else is both wrong and stupid" mood, and this is not contributing any value here. Better to stop here or have an admin close this nonsensical issue.

Can I remind you, you guys keep throwing insults my way. I'm just returning the favor...