abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Crash on x86 with llama-cpp-python with docker or on host directly #753

Open chenqiny opened 1 year ago

chenqiny commented 1 year ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

Load the model

Current Behavior

docker run --rm -it -p 9996:8000 -v /data/gguf/:/models -e MODEL=/models/llama-2-13b-chat.Q4_0.gguf ghcr.io/abetlen/llama-cpp-python:latest

python3 -m pip install -e .
Obtaining file:///app
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.11/site-packages (from llama_cpp_python==0.2.7) (4.8.0)
Requirement already satisfied: numpy>=1.20.0 in /usr/local/lib/python3.11/site-packages (from llama_cpp_python==0.2.7) (1.26.0)
Requirement already satisfied: diskcache>=5.6.1 in /usr/local/lib/python3.11/site-packages (from llama_cpp_python==0.2.7) (5.6.3)
Building wheels for collected packages: llama_cpp_python
  Building editable for llama_cpp_python (pyproject.toml) ... done
  Created wheel for llama_cpp_python: filename=llama_cpp_python-0.2.7-cp311-cp311-manylinux_2_31_x86_64.whl size=911317 sha256=b77877c90bdba00e257432c49978a075519f5818f17e14ecc00db21c1fd6998c
  Stored in directory: /tmp/pip-ephem-wheel-cache-ivqpfggy/wheels/57/0f/98/bb57b2b57b95807699b822a35c022f139d38a02c27922f27ce
Successfully built llama_cpp_python
Installing collected packages: llama_cpp_python
  Attempting uninstall: llama_cpp_python
    Found existing installation: llama_cpp_python 0.2.7
    Uninstalling llama_cpp_python-0.2.7:
      Successfully uninstalled llama_cpp_python-0.2.7
Successfully installed llama_cpp_python-0.2.7
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Illegal instruction (core dumped)
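To confirm that the SIGILL comes from loading the native library rather than from the server code itself, one can run the import on its own inside the container. This is only a generic check; the --entrypoint override assumes the published image contains a bash shell.

# Start the published image with a shell instead of the server
docker run --rm -it --entrypoint /bin/bash \
  -v /data/gguf/:/models ghcr.io/abetlen/llama-cpp-python:latest
# Inside the container: importing llama_cpp loads libllama.so; if this alone
# dies with "Illegal instruction", the shared library was built with
# instructions the host CPU does not support.
python3 -c "import llama_cpp; print(llama_cpp.__version__)"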

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

$ lscpu

aiu-test:/data/gguf # lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2440 0 @ 2.40GHz
    CPU family:          6
    Model:               45
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           2
    Stepping:            7
    CPU max MHz:         2900.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            4799.98
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   384 KiB (12 instances)
  L1i:                   384 KiB (12 instances)
  L2:                    3 MiB (12 instances)
  L3:                    30 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-5,12-17
  NUMA node1 CPU(s):     6-11,18-23
Vulnerabilities:
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
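The Flags line above lists avx but not avx2, fma, or f16c, which is consistent with a Sandy Bridge-era Xeon. A quick, generic check (not part of the original report) for the extensions commonly enabled in llama.cpp builds:

# List which of the SIMD extensions usually compiled into llama.cpp binaries
# are actually advertised by this CPU; anything compiled in but missing here
# triggers SIGILL (illegal instruction) at runtime.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -wE 'avx|avx2|fma|f16c'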

$ uname -a

Linux aiu-test 5.14.21-150400.24.63-default #1 SMP PREEMPT_DYNAMIC Tue May 2 15:49:04 UTC 2023 (fd0cc4f) x86_64 x86_64 x86_64 GNU/Linux
$ python3 --version
$ make --version
$ g++ --version

Python 3.11.5 (main, Sep 20 2023, 11:03:59) [GCC 10.2.1 20210110] on linux

Failure Information (for bugs)

Illegal instruction (core dumped)
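If the prebuilt wheel or image was compiled with instruction sets this CPU does not provide, rebuilding the package with those instruction sets disabled is one way to avoid the SIGILL. A minimal sketch, assuming the vendored llama.cpp still exposes the LLAMA_AVX2/LLAMA_FMA/LLAMA_F16C CMake options and that the build backend honours CMAKE_ARGS:

# Rebuild llama-cpp-python from source with the instruction sets this
# Sandy Bridge-era Xeon lacks turned off, so libllama.so only uses
# instructions the CPU can execute.
CMAKE_ARGS="-DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF" \
  pip install --force-reinstall --no-cache-dir llama-cpp-python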

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. docker run --rm -it -p 9996:8000 -v /data/gguf/:/models -e MODEL=/models/llama-2-13b-chat.Q4_0.gguf ghcr.io/abetlen/llama-cpp-python:latest

Note: Many issues seem to be regarding functional or performance issues / differences with llama.cpp. In these cases we need to confirm that you're comparing against the version of llama.cpp that was built with your python package, and which parameters you're passing to the context.

Try the following:

  1. git clone https://github.com/abetlen/llama-cpp-python
  2. cd llama-cpp-python
  3. rm -rf _skbuild/ # delete any old builds
  4. python setup.py develop
  5. cd ./vendor/llama.cpp
  6. Follow llama.cpp's instructions to cmake llama.cpp (a sketch of steps 5-7 follows after this list)
  7. Run llama.cpp's ./main with the same arguments you previously passed to llama-cpp-python and see if you can reproduce the issue. If you can, log an issue with llama.cpp
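For reference, steps 5-7 might look like the following. This is only a sketch assuming a standard out-of-source CMake build of the vendored llama.cpp; adjust the model path to your own file.

cd ./vendor/llama.cpp
mkdir -p build && cd build
# Configure and build with llama.cpp's default CMake options
cmake ..
cmake --build . --config Release
# Run the resulting binary against the same model that crashed through the bindings
./bin/main -m /models/llama-2-13b-chat.Q4_0.gguf -p "Hello"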

I tried it, and this is what I got:

root@51b054c89440:/work/llama-cpp-python/vendor/llama.cpp/build/bin# ./main
Log start
main: warning: changing RoPE frequency base to 0 (default 10000.0)
main: warning: scaling RoPE frequency by 0 (default 1.0)
main: build = 1271 (a98b163)
main: built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
main: seed = 1695717956
Illegal instruction (core dumped)

Failure Logs

Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.

Also, please try to avoid using screenshots if at all possible. Instead, copy/paste the console output and use Github's markdown to cleanly format your logs for easy readability.


/work/llama-cpp-python/vendor/llama.cpp/build/bin# git log | head -1
commit a98b1633d5a94d0aa84c7c16e1f8df5ac21fc850
chenqiny commented 1 year ago

I opened an issue with llama.cpp. If it is built with cmake, I get the same issue.

https://github.com/ggerganov/llama.cpp/issues/3339

chenqiny commented 1 year ago

I used a workaround (a sketch follows below):

1. Download the llama.cpp code
2. make
3. make libllama.so
4. Overwrite libllama.so in llama-cpp-python
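A sketch of that workaround, assuming the installed package keeps its shared library inside the llama_cpp package directory; the destination path is an assumption and may differ between versions and install layouts.

# Steps 1-3: build llama.cpp with make, which targets the host CPU
# (-march=native), so the resulting library only uses instructions
# this machine supports.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
make libllama.so
# Step 4: replace the library that ships with the Python package. The
# destination below is derived from the package's install location;
# verify it points at the directory containing the bundled libllama.so.
cp libllama.so "$(python3 -c 'import llama_cpp, os; print(os.path.dirname(llama_cpp.__file__))')/"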

Saivignesh-05 commented 1 year ago

Thanks @chenqiny. I also had the illegal instruction issue. Your solution works!