nuance1979 opened this issue 1 year ago
Hi @nuance1979, you are welcome, and thanks for reporting the bug.
Could you please let me know which model you are using, so I can debug with the same model?
@nuance1979 Oh, I just noticed! It was just the print statement being out of scope :sweat_smile:
Please give it a try now and let me know if the problem persists.
@abdeladim-s Yes, the format is fixed. But I was mainly talking about the content, which is not right:
you see, `./examples/chat.sh` from llama.cpp gives me a sensible answer about who Barack Obama is, but your code snippet gives me a nonsensical answer. Something is wrong.
@nuance1979, I think you just need to specify the exact parameters used by that `chat.sh` example.
I have updated the example in the README to match it.
These are my results:
Please update from source and give it a try.
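For reference, the flags `examples/chat.sh` passed to llama.cpp's `./main` at around that time were, as I recall, `-c 512 -b 1024 -n 256 --keep 48 --repeat_penalty 1.0 -r "User:"` (check your own checkout of `chat.sh` to be sure). A sketch of the equivalent settings as keyword values; the key names below are assumptions about how pyllamacpp exposes them, not its confirmed API:

```python
# Values taken from llama.cpp's examples/chat.sh (from memory; verify against
# your checkout). The dictionary key names are hypothetical pyllamacpp kwargs.
chat_sh_params = {
    "n_ctx": 512,           # -c: context window size
    "n_batch": 1024,        # -b: batch size for prompt processing
    "n_predict": 256,       # -n: max tokens to generate per turn
    "n_keep": 48,           # --keep: prompt tokens kept on context overflow
    "repeat_penalty": 1.0,  # --repeat_penalty: 1.0 disables the penalty
    "antiprompt": "User:",  # -r: reverse prompt that returns control to the user
}
```

Any mismatch in these sampling parameters (especially `repeat_penalty`) can change output quality noticeably between the two frontends.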
Thanks a lot! I don't know why, but I'm still getting nonsensical results like this after installing from the master branch and using the updated example:
Another try, still nonsensical:
I see a difference: I get `llama_init_from_file: kv self size = 512.00 MB`, while when I run `./examples/chat.sh` it is 256 MB. Not sure if that makes a difference, but when I run `./examples/chat.sh`, I always get answers that make sense.
@nuance1979, that's weird, the model seems to be hallucinating every time! On my end everything works as expected (as you can see in my previous comment).
Yeah, you are right, I really don't know why it is divided by half; usually it equals the context size!
But I don't think this hallucination problem has anything to do with the KV cache size.
You should at least get meaningful results!
Have you tried other models?
Can you try the pyllamacpp CLI as well?
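A back-of-the-envelope check on the 256 MB vs 512 MB difference (an assumption about the cause, not something confirmed in this thread): the KV cache stores one K and one V vector of size `n_embd` per layer per context position, so the factor of two is exactly what you would get from storing the cache in f16 versus f32:

```python
# KV cache size = n_ctx * n_layer * n_embd * 2 tensors (K and V) * bytes/element.
# Model dimensions taken from the 7B load log later in this thread.
n_ctx, n_layer, n_embd = 512, 32, 4096
elements = n_ctx * n_layer * n_embd * 2  # total K+V elements

print(elements * 2 / 2**20)  # f16 (2 bytes/element) -> 256.0 MB, matches chat.sh
print(elements * 4 / 2**20)  # f32 (4 bytes/element) -> 512.0 MB, matches pyllamacpp
```

If that guess is right, the two frontends differ in their `memory_f16`-style default, which affects memory use but should not by itself cause hallucinations.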
I tried the pyllamacpp CLI and still got nonsensical output:
I checked the SHA256 sums of my `.pth` and `f16.bin` files and they match completely. Again, the same model gives me a completely sensible answer when invoked with `./examples/chat.sh`, so the logical conclusion is that something within pyllamacpp is not right. You can check your model's SHA256 sums against this file: https://github.com/ggerganov/llama.cpp/blob/master/SHA256SUMS
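For anyone else wanting to run the same check, a small helper for hashing multi-gigabyte weight files without loading them into memory (the model path in the comment is just an example):

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large model weights need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the matching line in llama.cpp's SHA256SUMS file:
# print(sha256_of_file("models/7B/ggml-model-q4_0.bin"))
```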
Yeah, something is happening, but I honestly have no idea, since I couldn't reproduce this issue on my end. Do you have any idea how to proceed?
Can you ask a third person to try it? Just to see whether it's a problem on my side.
Sure, let us try that. @ParisNeo is using pyllamacpp as a backend for his UI, and his repo already has many stars.
-- Hi @ParisNeo,
Could you please let us know if anyone on your repo has reported a problem similar to this issue? And, if you have some time, could you please try the example on the README page and see whether it works properly on your side?
Thank you!
Hi. No, I haven't had any complaints about the pyllamacpp backend yet. If I have time tomorrow I'll try it. I've got to go.
Thanks @ParisNeo, let us know if you find any issues.
Hi @nuance1979, any news on this? Are you still getting the same error?
If you know someone else who can test it, then please send them a message!
Otherwise, I have tried to test it on Colab as well; even though it is slow, it worked as expected. Please give it a try, here is the notebook.
> Hi @nuance1979, any news on this? Are you still getting the same error?

Yes. Still nonsensical answers.

> If you know someone else who can test it, then please send them a message!

Sure. I'll ask my friend to test it.

> Otherwise, I have tried to test it on Colab as well; even though it is slow, it worked as expected. Please give it a try, here is the notebook.

All my tests were done with the original LLaMA 7B model (quantized into `q4_0.bin` with llama.cpp). But you are testing WizardLM-7B in the notebook, so I don't think it's useful here.
OK. I tried your notebook with llama-7b and it reproduces what I saw:
Again, I want to emphasize that the same model behaves correctly when I use `./examples/chat.sh` from the llama.cpp repo.
You can try it yourself with this model link: https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q4_0.bin
> All my tests were done with the original LLaMA 7B model (quantized into `q4_0.bin` with llama.cpp). But you are testing WizardLM-7B in the notebook, so I don't think it's useful here.

Oh! Are you using the original model? So maybe that's the source of the problem.
The original LLaMA model is not fine-tuned for instruction-response, so using it in a chat manner is not really correct! But I am quite surprised it works on llama.cpp.
I will try to test with the original model and see.
But usually you will need to try that example with fine-tuned models like WizardLM, Alpaca, Vicuna, etc. to get good results.
I understand the difference between the original LLaMA and its instruction-tuned variants. All I'm saying is that the fact that llama.cpp works under the same conditions points to a potential bug in pyllamacpp, and it would be great if you could fix it.
@nuance1979, yeah, you are right. Sorry for that :( I don't know what I am missing in my implementation, especially since it seems to work with other models! I need to check it again.
Let me know if you have any ideas; any help would be appreciated! Thanks!
In case it helps someone, I tried it like this. Edit `cli.py` and make the following changes:

```python
PROMPT_CONTEXT = """Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: please tell me the largest city in Europe.
Bob: Sure, The largest city in Europe is Moscow
"""
PROMPT_PREFIX = ""
PROMPT_SUFFIX = ""
```
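For context, the three constants above are typically combined into the final prompt that the model actually sees. This is only a sketch; the exact concatenation inside pyllamacpp's `cli.py` is an assumption, and `build_prompt` is a hypothetical helper, not part of the package:

```python
def build_prompt(prompt_context, prompt_prefix, user_input, prompt_suffix):
    """Wrap the user's message in the chat template before feeding it to the model."""
    return f"{prompt_context}{prompt_prefix}{user_input}{prompt_suffix}"
```

With `PROMPT_PREFIX = ""` and `PROMPT_SUFFIX = ""` as above, the model sees the Bob transcript followed directly by whatever the user typed, which is why setting a good `PROMPT_CONTEXT` matters so much for chat-style use.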
Example output:

```
python3 cli.py /Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin

██████╗ ██╗   ██╗██╗     ██╗      █████╗ ███╗   ███╗ █████╗  ██████╗██████╗ ██████╗
██╔══██╗╚██╗ ██╔╝██║     ██║     ██╔══██╗████╗ ████║██╔══██╗██╔════╝██╔══██╗██╔══██╗
██████╔╝ ╚████╔╝ ██║     ██║     ███████║██╔████╔██║███████║██║     ██████╔╝██████╔╝
██╔═══╝   ╚██╔╝  ██║     ██║     ██╔══██║██║╚██╔╝██║██╔══██║██║     ██╔═══╝ ██╔═══╝
██║        ██║   ███████╗███████╗██║  ██║██║ ╚═╝ ██║██║  ██║╚██████╗██║     ██║
╚═╝        ╚═╝   ╚══════╝╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝     ╚═╝

PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.4.1
=========================================================================================

[+] Running model `/Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin`
[+] LLaMA context params: `{}`
[+] GPT params: `{}`
llama.cpp: loading model from /Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.07 MB
llama_model_load_internal: mem required = 6612.59 MB (+ 2052.00 MB per state)
.
llama_init_from_file: kv self size = 512.00 MB
...
[+] Press Ctrl+C to Stop ...
...
You: who is barack obama?
AI:
Obama, Barack Hussein (b. 1961), first African American president of the United States (2009-17). He was born in Honolulu, Hawaii, to a Kenyan father and an American mother. After graduating from Columbia University and Harvard Law School, he worked as a community organizer in Chicago before being elected to the Illinois state senate in 1996. In 2004, he was elected to the U.S. Senate, and four years later he ran for president, defeating Republican John McCain in the general election. He won re-election in 2012. Obama's presidency was marked by efforts to address climate change, improve healthcare access, and strengthen national security through foreign policy initiatives such as the withdrawal of U.S. troops from Iraq and the pursuit of peace talks with Iran. He signed into law landmark legislation including the Affordable Care Act (ACA) and the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act, and oversaw the end of U.S. combat operations in Afghanistan. Despite some controversies, Obama remains a highly regarded figure in American politics and continues to advocate for progressive causes through his organization, Organizing for Action.
You:
```
I am synced to commit 6d487b904b93c48862cc1d8b29c7f3466ca6f6a5.
Thanks @siddhsql! However, you are using vicuna-7b, and I was talking about llama-7b.
Hi @abdeladim-s, thanks for the update!

I was trying to update to pyllamacpp==2.4.0 but found that even the example on the README, which is similar to llama.cpp's `./examples/chat.sh` but not identical, is not working properly. For example, when I copied the example code into a `foo.py` and ran it, I got:

If I go to llama.cpp, check out 66874d4, then run `make clean && make && ./examples/chat.sh`, I got:

I just want an equivalent of running llama.cpp's `chat.sh` with pyllamacpp==2.4.0, no more, no less. How should I do it?