abdeladim-s / pyllamacpp

Python bindings for llama.cpp
https://abdeladim-s.github.io/pyllamacpp/
MIT License

ggllm branch #24

Open kdsull opened 1 year ago

kdsull commented 1 year ago

Hi Abdeladim, many thanks for this new branch, which I didn't expect to be done this quickly! I tried it on 3 platforms, i.e. OSX Mojave, WSL2 (Ubuntu) and Ubuntu 22.04, but I can't make it work... First, the pip/git install failed on all three. So I downloaded the project and installed it with 'python setup.py install', but again all three failed with the same error messages. I attach the error messages for your reference. It is obviously above my understanding, as you've guessed! I'd appreciate it if you could have a look and advise me on how to make this work in my environment. Cheers!

pyllamacpp_ggllm_errors.txt

abdeladim-s commented 1 year ago

Hi Kidon,

I guess you didn't clone the ggllm submodule. Have you cloned the repo with --recursive? Here are the detailed steps (a quick sanity check follows them):

  1. Create and activate a virtual environment (create a new one to be on the safe side!).
  2. git clone -b ggllm.cpp --recursive https://github.com/abdeladim-s/pyllamacpp
  3. cd pyllamacpp
  4. pip install -e .

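If the editable install succeeds, a quick import check should pass. A minimal sanity test; it assumes the ggllm branch keeps the same pyllamacpp.model.Model entry point as the main branch:

    # sanity check: the native extension built and the package imports
    from pyllamacpp.model import Model
    print("pyllamacpp built and imported OK")
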
Let me know if you find any other issue!

kdsull commented 1 year ago

Many thanks, Abdeladim. Not as easy as it seemed! I attach another error msg. On my PC I do have a GPU (1050 Ti), but with only 4 GB of VRAM, so I'd have to use CPU only, not cuBLAS.

ggllm-errors.txt

abdeladim-s commented 1 year ago

It's OK, Kidon, I only use the CPU as well; you can turn off cuBLAS here.

Or, to make it easy for you: if you are on Linux x86_64, I have built a wheel on my machine (I think it will work on your Ubuntu and WSL at least) and uploaded it here.

You can install it by running:

pip install https://github.com/abdeladim-s/pyllamacpp/releases/download/v2.4.1/pyllamacpp-ggllm-2.4.1-cp310-cp310-linux_x86_64.whl

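Once the wheel is installed, here is a short load test. A rough sketch: the model path is a placeholder for whatever quantized Falcon file you have locally, and the generate parameters follow the main-branch README, so they may differ slightly on this branch:

    from pyllamacpp.model import Model

    # placeholder path -- point this at your local quantized Falcon model
    model = Model(model_path="./models/falcon-7b-q5_1.bin")
    for token in model.generate("Once upon a time", n_predict=32):
        print(token, end="", flush=True)
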
Let me know if you find any other issues!

kdsull commented 1 year ago

haha, now it can't NOT work, right?! I tried two Falcon models (5-bit, 8-bit) and they both work fine. Many thanks for your time, and I hope somebody other than me also finds this implementation (again, a very easy, clean, robust one) useful. I've only tried a few prompts, but it looks like the WizardLM models' answers suit me best. Falcon seems decent enough, but we'll have to see whether it deserves its fame. Thanks and cheers!!

PS: Unlike the original, it leaves an '<|endoftext|>' token at the end of the AI's reply, before the 'You:' prompt.

abdeladim-s commented 1 year ago

You are welcome :) Yeah, I should've built the wheel from the beginning, hahh, but glad it finally worked. Yes, WizardLM gives good results in my tests as well; I think Falcon gained its fame from its Apache license, or maybe the big 40B model is good (I haven't tried it). Anyway, feel free to use whatever suits your needs.

Regarding the PS: yes, I noticed it leaves '<|endoftext|>' if you are using the CLI; the implementation was quick, so I just wanted to get it working. The CLI is just for simple testing; use the generate function to filter out any tokens you don't want, or build your own chat (see the sketch below).

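For example, something along these lines (a rough sketch; the model path is a placeholder, and it assumes '<|endoftext|>' arrives as a single token, as special tokens usually do):

    from pyllamacpp.model import Model

    EOS = "<|endoftext|>"
    model = Model(model_path="./models/falcon-7b-q5_1.bin")  # placeholder path

    prompt = "You: Tell me about falcons.\nAI:"
    for token in model.generate(prompt, n_predict=256):
        if EOS in token:
            break  # stop before the end-of-text marker is printed
        print(token, end="", flush=True)
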
Cheers @kdsull!