abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

crash on macos with SIGABRT #342

Closed siddhsql closed 1 year ago

siddhsql commented 1 year ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Observed Behavior

I am running v0.1.57 of the library with model weights from https://huggingface.co/TheBloke/vicuna-7B-1.1-GGML on an Intel-based macOS machine with 16 GB of RAM, of which about 6 GB is in use. I was able to install the package, but when I try to run it I get this:

Python 3.10.2 (main, Feb  2 2022, 08:42:42) [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/vicuna-7b-1.1.ggmlv3.q5_1.bin")
llama.cpp: loading model from ./models/vicuna-7b-1.1.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 6612.59 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB
[1]    53197 abort      python3

The Python interpreter just crashes with a SIGABRT. There is no traceback printed on the screen.
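One way to get more detail out of a silent SIGABRT is Python's built-in faulthandler module. Here is a minimal debugging sketch (not something from my original run; it assumes Python 3.3+) that enables it before loading the model, so a native fault dumps the Python traceback instead of aborting silently:

import faulthandler
# install handlers that dump tracebacks on SIGABRT, SIGSEGV, SIGBUS, SIGILL, SIGFPE
faulthandler.enable(all_threads=True)

from llama_cpp import Llama

# same model path as above; the dump should show which Python frame made the failing call
llm = Llama(model_path="./models/vicuna-7b-1.1.ggmlv3.q5_1.bin")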

Expected Behavior

No crash; the model should load successfully.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

I am running on macOS Ventura with an Intel-based 6-core CPU and 16 GB of RAM, of which about 6 GB is in use.

% uname -a
Darwin  22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:51:50 PDT 2023; root:xnu-8796.121.2~5/RELEASE_X86_64 x86_64
% python3 --version
Python 3.10.2

% make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0
% g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: x86_64-apple-darwin22.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Failure Information (for bugs)

https://gist.github.com/siddhsql/ea1d8b0289896a7a0748504f6802c8ac

Steps to Reproduce

See above.

Try the following:

  1. git clone https://github.com/abetlen/llama-cpp-python
  2. cd llama-cpp-python
  3. rm -rf _skbuild/ # delete any old builds
  4. python setup.py develop
  5. cd ./vendor/llama.cpp
  6. Follow llama.cpp's instructions to cmake llama.cpp
  7. Run llama.cpp's ./main with the same arguments you previously passed to llama-cpp-python and see if you can reproduce the issue. If you can, log an issue with llama.cpp

I did try this and it works. Please see below:

% ./bin/main -m ~/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin -p "Building a website can be done in 10 simple steps:" -n 512
main: build = 634 (5b57a5b)
main: seed  = 1686176705
llama.cpp: loading model from /Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 6612.59 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 512, n_keep = 0

 Building a website can be done in 10 simple steps:
1. Decide on the purpose of your website and what you want to achieve with it. This will help guide the design, content, and functionality of your site.
2. Choose a domain name that is easy to remember and reflects the purpose of your website. It should be unique and preferably short, as people are more likely to remember shorter names.
3. Choose a web hosting service that provides enough space and bandwidth for your needs. Some popular options include Bluehost, HostGator, and SiteGround.
4. Select a website builder or content management system (CMS) such as WordPress, Wix, Squarespace, or Shopify to create your site. These platforms make it easy to customize the design and layout of your site without needing any coding experience.
5. Choose a theme or template that matches the purpose and style of your website. This will give your site a professional look and feel, which is important for building credibility with visitors.
6. Create high-quality content that is relevant to your target audience and provides value to them. Keep in mind that search engines rank websites based on the quality and relevance of their content, so it’s essential to focus on creating useful and informative articles, blog posts, videos, and other types of media.
7. Optimize your website for search engines by including relevant keywords and phrases throughout your content, meta tags, and headings. This will help ensure that your site appears in the top results when people search for related topics.
8. Use social media to promote your website and engage with your audience. Share your content on platforms like Facebook, Twitter, LinkedIn, Instagram, and Pinterest to increase visibility and drive traffic to your site.
9. Analyze your website’s performance using tools such as Google Analytics or Jetpack by WordPress. This will help you track key metrics such as page views, bounce rates, and conversion rates so that you can make data-driven decisions about how to improve your site over time.
10. Continuously update and improve your website with fresh content, new features, and ongoing optimization efforts. This will help keep visitors engaged and coming back for more, which is essential for building a loyal following and achieving your goals. [end of text]

llama_print_timings:        load time = 10298.92 ms
llama_print_timings:      sample time =   371.18 ms /   488 runs   (    0.76 ms per token)
llama_print_timings: prompt eval time = 10278.37 ms /    14 tokens (  734.17 ms per token)
llama_print_timings:        eval time = 104839.56 ms /   487 runs   (  215.28 ms per token)
llama_print_timings:       total time = 115566.46 ms

Failure Logs

https://gist.github.com/siddhsql/ea1d8b0289896a7a0748504f6802c8ac

siddhsql commented 1 year ago

If it helps, I stepped through the code in a debugger and it runs into a problem here:

def llama_init_from_file(
    path_model: bytes, params: llama_context_params
) -> llama_context_p:
    return _lib.llama_init_from_file(path_model, params)

Upon returning from this function, the following assertion fails in llama.py:

assert self.ctx is not None

The assertion fails under the debugger, but when running from the command line the assertions are disabled, so the program continues and crashes later on.
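To take the assert out of the picture entirely, here is a hedged sketch (it calls the low-level module directly, based on the binding quoted above from v0.1.57) that checks for a NULL context explicitly:

import llama_cpp.llama_cpp as llama_low

# default context params; model path is the same one used in the report above
params = llama_low.llama_context_default_params()
ctx = llama_low.llama_init_from_file(
    b"./models/vicuna-7b-1.1.ggmlv3.q5_1.bin", params
)
if not ctx:
    raise RuntimeError("llama_init_from_file returned a NULL context")
print("context created:", ctx)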

siddhsql commented 1 year ago

Adding some more notes for myself: we do see this line in the output of llama-cpp-python: https://github.com/ggerganov/llama.cpp/blob/5c64a0952ee58b2d742ee84e8e3d43cce5d366db/llama.cpp#L2501

            fprintf(stderr, "%s: kv self size  = %7.2f MB\n", __func__, memory_size / 1024.0 / 1024.0);

So the only code left to suspect is what comes after that line, and it does work when running llama.cpp directly.

siddhsql commented 1 year ago

It beats me. I added this code just before returning ctx from llama.cpp:

fprintf(stderr, "debug: returning ctx\n");
return ctx;

and I see this in the output console:

llama_init_from_file: kv self size  =  256.00 MB
debug: returning ctx

and yet self.ctx is None in:

assert self.ctx is not None

How is this even possible?
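One thing worth checking here (purely a hypothetical debugging step, not part of the library's documented API): whether ctypes knows the correct return type for the call, since a missing or mismatched restype can make a perfectly valid pointer come back mangled or falsy on the Python side:

import llama_cpp.llama_cpp as llama_low  # low-level module that owns _lib

fn = llama_low._lib.llama_init_from_file
print("argtypes:", fn.argtypes)
print("restype: ", fn.restype)  # expected: llama_context_p, not the c_int default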

siddhsql commented 1 year ago

Running the Docker image errors out as well:

% docker run --rm -it -p 8000:8000 -v $PWD/models:/models -e MODEL=/models/$MODEL_NAME ghcr.io/abetlen/llama-cpp-python:latest
llama.cpp: loading model from /models/vicuna-7b-1.1.ggmlv3.q5_1.bin
Illegal instruction

Am I the only one with this problem? Can't be.

gjmulder commented 1 year ago

Illegal instruction usually indicates that binary code was compiled for the wrong architecture, i.e. this is a compiler configuration issue.

siddhsql commented 1 year ago

Correct. It seems the Docker image has a precompiled binary of llama.cpp for a different architecture; also see https://github.com/ggerganov/llama.cpp/issues/537. Anyway, the problem persists. I was only trying Docker as an alternative to see if that works. Can someone please help me here? I can't believe I am the only one having this problem.
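For comparing what the prebuilt binary expects with what the CPU actually offers, a hedged sketch (assumes a Linux environment such as the running container, where /proc/cpuinfo exists) that lists the relevant SIMD flags:

# print which SIMD extensions the CPU reports, to compare against the
# flags (AVX, AVX2, FMA, F16C, ...) the prebuilt llama.cpp was built with
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())

for feat in ("sse3", "avx", "avx2", "fma", "f16c", "avx512f"):
    print(f"{feat:8s} {'present' if feat in flags else 'MISSING'}")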


siddhsql commented 1 year ago

Adding some more info to help anyone who runs into this problem: I tried pyllamacpp and it works (never mind the garbage output):

% pyllamacpp ~/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin

██████╗ ██╗   ██╗██╗     ██╗      █████╗ ███╗   ███╗ █████╗  ██████╗██████╗ ██████╗
██╔══██╗╚██╗ ██╔╝██║     ██║     ██╔══██╗████╗ ████║██╔══██╗██╔════╝██╔══██╗██╔══██╗
██████╔╝ ╚████╔╝ ██║     ██║     ███████║██╔████╔██║███████║██║     ██████╔╝██████╔╝
██╔═══╝   ╚██╔╝  ██║     ██║     ██╔══██║██║╚██╔╝██║██╔══██║██║     ██╔═══╝ ██╔═══╝
██║        ██║   ███████╗███████╗██║  ██║██║ ╚═╝ ██║██║  ██║╚██████╗██║     ██║
╚═╝        ╚═╝   ╚══════╝╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝     ╚═╝

PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.4.1

=========================================================================================

[+] Running model `/Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin`
[+] LLaMA context params: `{}`
[+] GPT params: `{}`
llama.cpp: loading model from /Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 6612.59 MB (+ 2052.00 MB per state)
.
llama_init_from_file: kv self size  =  512.00 MB
...
[+] Press Ctrl+C to Stop ...
...
You: who is the president of usa?
AI: class MQL5_CALLBACK_TYPE
{
public:
    virtual bool OnTick() = 0;
};

class MQL5_EVENT_TYPE
{
public:
    virtual void Process() = 0;
};
```scss
class MQL5_ON_TRADING_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
    void Process();
};

class MQL5_ON_ORDER_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
    void Process();
};

class MQL5_ON_TAKE_PROFIT_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
    void Process();
};

class MQL5_ON_STOP_LOSS_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
    void Process();
};

class MQL5_ON_HEDGE_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
    void Process();
};

class MQL5_ON_TRADE_END_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
    void Process();
};

//...

// Register events MQL5_ON_TRADING_EVENT_REGISTER(class MQL5_ON_TRADING_EVENT_TYPE); MQL5_ON_ORDER_EVENT_REGISTER(class MQL5_ON_ORDER_EVENT_TYPE); MQL5_ON_TAKE_PROFIT_EVENT_REGISTER(class MQL5_ON_TAKE_PROFIT_EVENT_TYPE); MQL5_ON_STOP_LOSS_EVENT_REGISTER(class MQL5_ON_STOP_LOSS_EVENT_TYPE); MQL5_ON_HEDGE_EVENT_REGISTER(class MQL5_ON_HEDGE_EVENT_TYPE);QL5_ON_TRADE_END_EVENT_REGISTER(class MQL5_ON_TRADE_END_EVENT_TYPE);//... You:

gjmulder commented 1 year ago

Closing. Please reopen if the problem is reproducible with the latest llama-cpp-python, which includes an updated llama.cpp.
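For anyone retesting, a small hedged check (standard library importlib.metadata, Python 3.8+) to confirm which llama-cpp-python build is actually importable before trying to reproduce:

from importlib.metadata import version

# depending on how it was installed, the distribution may also be registered as "llama_cpp_python"
print(version("llama-cpp-python"))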