nuance1979 opened this issue 1 year ago
Hi @nuance1979, you are welcome, and thanks for reporting the bug.
Could you please let me know which model you are using, so I can debug with the same model?
@nuance1979 Oh, I just noticed! It was just the print statement being out of scope :sweat_smile:
Please give it a try now and let me know if the problem persists.
@abdeladim-s Yes, the format is fixed. But I was mainly talking about the content, which is not right:
you see, `./examples/chat.sh` from llama.cpp gives me a sensible answer about who Barack Obama is, but your code snippet gives me a nonsensical answer. Something is wrong.
@nuance1979, I think you just need to specify the exact parameters used by that `chat.sh` example.
I have updated the example in the README to match it.
These are my results:
Please update from source and give it a try.
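For reference, the flags `examples/chat.sh` passed to llama.cpp's `./main` at around that time were, as I recall, `-c 512 -b 1024 -n 256 --keep 48 --repeat_penalty 1.0 -r "User:"` (check your own checkout of `chat.sh` to be sure). A sketch of the equivalent settings as keyword values; the key names below are assumptions about how pyllamacpp exposes them, not its confirmed API:

```python
# Values taken from llama.cpp's examples/chat.sh (from memory; verify against
# your checkout). The dictionary key names are hypothetical pyllamacpp kwargs.
chat_sh_params = {
    "n_ctx": 512,           # -c: context window size
    "n_batch": 1024,        # -b: batch size for prompt processing
    "n_predict": 256,       # -n: max tokens to generate per turn
    "n_keep": 48,           # --keep: prompt tokens kept on context overflow
    "repeat_penalty": 1.0,  # --repeat_penalty: 1.0 disables the penalty
    "antiprompt": "User:",  # -r: reverse prompt that returns control to the user
}
```

Any mismatch in these sampling parameters (especially `repeat_penalty`) can change output quality noticeably between the two frontends.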
Thanks a lot! I don't know why, but I'm still getting nonsensical results like this after installing from the master branch and using the updated example:
Another try, still nonsensical:
I see a difference: I get `llama_init_from_file: kv self size = 512.00 MB`, while when I run `./examples/chat.sh` it is 256 MB. Not sure if that makes a difference, but when I run `./examples/chat.sh`, I always get answers that make sense.
@nuance1979, that's weird, the model seems to be hallucinating every time! On my end everything works as expected (as you can see in my previous comment).
Yeah, you are right, I really don't know why it is divided by half; usually it equals the context size!
But I don't think this hallucination problem has anything to do with the KV cache size.
You should at least get meaningful results!
Have you tried other models?
Can you try the pyllamacpp CLI as well?
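A back-of-the-envelope check on the 256 MB vs 512 MB difference (an assumption about the cause, not something confirmed in this thread): the KV cache stores one K and one V vector of size `n_embd` per layer per context position, so the factor of two is exactly what you would get from storing the cache in f16 versus f32:

```python
# KV cache size = n_ctx * n_layer * n_embd * 2 tensors (K and V) * bytes/element.
# Model dimensions taken from the 7B load log later in this thread.
n_ctx, n_layer, n_embd = 512, 32, 4096
elements = n_ctx * n_layer * n_embd * 2  # total K+V elements

print(elements * 2 / 2**20)  # f16 (2 bytes/element) -> 256.0 MB, matches chat.sh
print(elements * 4 / 2**20)  # f32 (4 bytes/element) -> 512.0 MB, matches pyllamacpp
```

If that guess is right, the two frontends differ in their `memory_f16`-style default, which affects memory use but should not by itself cause hallucinations.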
I tried the pyllamacpp CLI and still got nonsensical output:
I checked the SHA256 sums of my `.pth` and `f16.bin` files and they match completely. Again, the same model gives me a completely sensible answer when invoked with `./examples/chat.sh`, so the logical conclusion is that something within pyllamacpp is not right. You can check your model's SHA256 sums against this file: https://github.com/ggerganov/llama.cpp/blob/master/SHA256SUMS
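For anyone else wanting to run the same check, a small helper for hashing multi-gigabyte weight files without loading them into memory (the model path in the comment is just an example):

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large model weights need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the matching line in llama.cpp's SHA256SUMS file:
# print(sha256_of_file("models/7B/ggml-model-q4_0.bin"))
```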
Yeah, something is happening, but I honestly have no idea, since I couldn't reproduce this issue on my end. Do you have any idea how to proceed?
Can you ask a third person to try it? Just to see whether it's a problem on my side.
Sure, let us try that. @ParisNeo is using pyllamacpp as a backend for his UI, and his repo already has many stars.
-- Hi @ParisNeo,
Could you please let us know if anyone on your repo has reported a problem similar to this issue? And, if you have some time, could you please try the example on the README page and see whether it works properly on your side?
Thank you!
Hi. No, I haven't had any complaints about the pyllamacpp backend yet. If I have time tomorrow I'll try it. I've got to go.
Thanks @ParisNeo, let us know if you find any issues.
Hi @nuance1979, any news on this? Are you still getting the same error?
If you know someone else who can test it, then please send them a message!
Otherwise, I have tried to test it on Colab as well; even though it is slow, it worked as expected. Please give it a try, here is the notebook.
> Hi @nuance1979, any news on this? Are you still getting the same error?

Yes. Still nonsensical answers.

> If you know someone else who can test it, then please send them a message!

Sure. I'll ask my friend to test it.

> Otherwise, I have tried to test it on Colab as well; even though it is slow, it worked as expected. Please give it a try, here is the notebook.

All my tests were done with the original LLaMA 7B model (quantized into `q4_0.bin` with llama.cpp). But you are testing WizardLM-7B in the notebook, so I don't think it's useful here.
OK. I tried your notebook with llama-7b and it reproduces what I saw:
Again, I want to emphasize that the same model behaves correctly when I use `./examples/chat.sh` from the llama.cpp repo.
You can try it yourself with this model link: https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q4_0.bin
> All my tests were done with the original LLaMA 7B model (quantized into `q4_0.bin` with llama.cpp). But you are testing WizardLM-7B in the notebook, so I don't think it's useful here.

Oh! Are you using the original model? So maybe that's the source of the problem.
The original LLaMA model is not fine-tuned for instruction-response, so using it in a chat manner is not really correct! But I am quite surprised it works on llama.cpp.
I will try to test with the original model and see.
But usually you will need to try that example with fine-tuned models like WizardLM, Alpaca, Vicuna, etc. to get good results.
I understand the difference between the original LLaMA and its instruction-tuned variants. All I'm saying is that the fact that llama.cpp works under the same conditions points to a potential bug in pyllamacpp, and it would be great if you could fix it.
@nuance1979, yeah, you are right. Sorry for that :( I don't know what I am missing in my implementation, especially since it seems to work with other models! I need to check it again.
Let me know if you have any ideas; any help would be appreciated! Thanks!
In case it helps someone, I tried it like this. Edit `cli.py` and make the following changes:

```python
PROMPT_CONTEXT = """Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: please tell me the largest city in Europe.
Bob: Sure, The largest city in Europe is Moscow
"""
PROMPT_PREFIX = ""
PROMPT_SUFFIX = ""
```
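For context, the three constants above are typically combined into the final prompt that the model actually sees. This is only a sketch; the exact concatenation inside pyllamacpp's `cli.py` is an assumption, and `build_prompt` is a hypothetical helper, not part of the package:

```python
def build_prompt(prompt_context, prompt_prefix, user_input, prompt_suffix):
    """Wrap the user's message in the chat template before feeding it to the model."""
    return f"{prompt_context}{prompt_prefix}{user_input}{prompt_suffix}"
```

With `PROMPT_PREFIX = ""` and `PROMPT_SUFFIX = ""` as above, the model sees the Bob transcript followed directly by whatever the user typed, which is why setting a good `PROMPT_CONTEXT` matters so much for chat-style use.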
Example output:

```
python3 cli.py /Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin

██████╗ ██╗   ██╗██╗     ██╗      █████╗ ███╗   ███╗ █████╗  ██████╗██████╗ ██████╗
██╔══██╗╚██╗ ██╔╝██║     ██║     ██╔══██╗████╗ ████║██╔══██╗██╔════╝██╔══██╗██╔══██╗
██████╔╝ ╚████╔╝ ██║     ██║     ███████║██╔████╔██║███████║██║     ██████╔╝██████╔╝
██╔═══╝   ╚██╔╝  ██║     ██║     ██╔══██║██║╚██╔╝██║██╔══██║██║     ██╔═══╝ ██╔═══╝
██║        ██║   ███████╗███████╗██║  ██║██║ ╚═╝ ██║██║  ██║╚██████╗██║     ██║
╚═╝        ╚═╝   ╚══════╝╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝     ╚═╝

PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.4.1
=========================================================================================

[+] Running model `/Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin`
[+] LLaMA context params: `{}`
[+] GPT params: `{}`
llama.cpp: loading model from /Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.07 MB
llama_model_load_internal: mem required = 6612.59 MB (+ 2052.00 MB per state)
.
llama_init_from_file: kv self size = 512.00 MB
...
[+] Press Ctrl+C to Stop ...
...
You: who is barack obama?
AI:
Obama, Barack Hussein (b. 1961), first African American president of the United States (2009-17). He was born in Honolulu, Hawaii, to a Kenyan father and an American mother. After graduating from Columbia University and Harvard Law School, he worked as a community organizer in Chicago before being elected to the Illinois state senate in 1996. In 2004, he was elected to the U.S. Senate, and four years later he ran for president, defeating Republican John McCain in the general election. He won re-election in 2012. Obama's presidency was marked by efforts to address climate change, improve healthcare access, and strengthen national security through foreign policy initiatives such as the withdrawal of U.S. troops from Iraq and the pursuit of peace talks with Iran. He signed into law landmark legislation including the Affordable Care Act (ACA) and the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act, and oversaw the end of U.S. combat operations in Afghanistan. Despite some controversies, Obama remains a highly regarded figure in American politics and continues to advocate for progressive causes through his organization, Organizing for Action.
You:
```
I am synced to commit 6d487b904b93c48862cc1d8b29c7f3466ca6f6a5.
Thanks @siddhsql! However, you are using vicuna-7b, and I was talking about llama-7b.
Hi @abdeladim-s, thanks for the update!

I was trying to update to pyllamacpp==2.4.0 but found that even the example on the README, which is similar to llama.cpp's `./examples/chat.sh` but not identical, is not working properly. For example, when I copied the example code into a `foo.py` and ran it, I got:

If I go to llama.cpp, check out 66874d4, then run `make clean && make && ./examples/chat.sh`, I got:

I just want an equivalent of running llama.cpp's `chat.sh` with pyllamacpp==2.4.0, no more, no less. How should I do it?