cmp-nct / ggllm.cpp

Falcon LLM ggml framework with CPU and GPU support

Illegal instruction (core dumped) on Ubuntu 22.04.2 #11

Status: Open. dRAT3 opened this issue 1 year ago

dRAT3 commented 1 year ago

It core dumps when running without args and when running with the correct args as well.

System: Linux kernel 5.15.0-75-generic x86_64, MATE 1.26.0, Intel® Xeon(R) CPU X5660 @ 2.80GHz × 24, llvmpipe (LLVM 15.0.6, 128 bits)

Branch: main, no CUDA. Attached file: _home_barry_evo_ggllm.cpp_build_bin_falcon_main.1000.txt

dRAT3 commented 1 year ago

I do have 2 CPUs; maybe that has something to do with it?

cmp-nct commented 1 year ago

I can't reproduce it. I tried pure CPU on Linux and Windows with the same model and command line, and it appears to work fine.

Can you try running in debug mode to see if you get more out of it? If you know how to use gdb, a backtrace would be useful.

dRAT3 commented 1 year ago
Starting program: /home/barry/evo/ggllm.cpp-1/build/bin/falcon_main -t 11 -m /home/barry/evo/falcon/wizard-falcon40b.ggmlv3.q4_K_M.bin -p This\ is\ your\ first\ run\ here\ what\ do\ you\ wanna\ say\? -n 512
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main (argc=9, argv=0x7fffffffdf58) at /home/barry/evo/ggllm.cpp-1/examples/falcon/falcon_main.cpp:51
51  int main(int argc, char ** argv) {
(gdb) s
52      gpt_params params;
(gdb) s
gpt_params::gpt_params (this=0x7fffffffc880) at /home/barry/evo/ggllm.cpp-1/examples/falcon_common.h:23
23  struct gpt_params {
(gdb) s
get_num_physical_cores () at /home/barry/evo/ggllm.cpp-1/examples/falcon_common.cpp:31
31  int32_t get_num_physical_cores() {
(gdb) s 30
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::replace (__n2=23, __s=0x5555555c8fd7 "/sys/devices/system/cpu", __n1=0, __pos=0, this=0x7fffffffc2f0) at /usr/include/c++/11/bits/basic_string.h:1956
1956          replace(size_type __pos, size_type __n1, const _CharT* __s,
(gdb) s 30
0x000055555556714e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider::_Alloc_hider (__a=..., __dat=0x7fffffffc340 "", this=0x7fffffffc330) at /usr/include/c++/11/bits/basic_string.h:168
168     : allocator_type(std::move(__a)), _M_p(__dat) { }
(gdb) s 30
std::basic_ifstream<char, std::char_traits<char> >::basic_ifstream (__mode=std::_S_in, __s="/sys/devices/system/cpu0/topology/thread_siblings", this=0x7fffffffc350, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at /usr/include/c++/11/fstream:569
569       : __istream_type(), _M_filebuf()
(gdb) s 30
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_is_local (this=0x7fffffffc2f0) at /usr/include/c++/11/bits/basic_string.h:230
230       { return _M_data() == _M_local_data(); }
(gdb) s 30
462       _M_begin() const
(gdb) s 10
132 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) s
133 in ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(gdb) s
__memset_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:368
368 in ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(gdb) s
369 in ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(gdb) s
370 in ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(gdb) s
402 in ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(gdb) s
403 in ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(gdb) s
__memset_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:404
404 in ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(gdb) s
std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::~_Hashtable (this=0x7fffffffc2b0, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/hashtable.h:1532
1532          clear();
(gdb) s
0x000055555556bb7e in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::clear (
    this=0x7fffffffc2b0) at /usr/include/c++/11/bits/hashtable.h:2323
2323          _M_element_count = 0;
(gdb) s
std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::~_Hashtable (this=0x7fffffffc2b0, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/hashtable.h:1533
1533          _M_deallocate_buckets();
(gdb) s
std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_deallocate_buckets (this=0x7fffffffc2b0)
    at /usr/include/c++/11/bits/hashtable.h:454
454       { _M_deallocate_buckets(_M_buckets, _M_bucket_count); }
(gdb) s
std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_deallocate_buckets (__bkt_count=1, 
    __bkts=0x7fffffffc2e0, this=0x7fffffffc2b0) at /usr/include/c++/11/bits/hashtable.h:421
421       _M_uses_single_bucket(__buckets_ptr __bkts) const
(gdb) s
gpt_params::gpt_params (this=0x7fffffffc880) at /home/barry/evo/ggllm.cpp-1/examples/falcon_common.h:23
23  struct gpt_params {
(gdb) s
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (__a=..., __s=<optimized out>, 
    this=<optimized out>) at /usr/include/c++/11/bits/basic_string.h:539
539     _M_construct(__s, __end, random_access_iterator_tag());
(gdb) s
0x0000555555563570 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*> (
    __end=<optimized out>, __beg=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/basic_string.tcc:219
219         _M_data(_M_create(__dnew, size_type(0)));
(gdb) s

Program received signal SIGILL, Illegal instruction.
0x0000555555563573 in gpt_params::gpt_params (this=0x7fffffffc880) at /home/barry/evo/ggllm.cpp-1/examples/falcon_common.h:23
23  struct gpt_params {
(gdb) s

Program terminated with signal SIGILL, Illegal instruction.
The program no longer exists.
cebtenzzre commented 1 year ago

@dRAT3 Instead of single-stepping you can just continue so it stops at the SIGILL, and then use bt and disas to show the call stack and disassembly at the point where it crashed.

dRAT3 commented 1 year ago

Oh thx mate first time using gdb :+1:

Program received signal SIGILL, Illegal instruction.
0x0000555555563573 in gpt_params::gpt_params (this=0x7fffffffc880) at /home/barry/evo/ggllm.cpp-1/examples/falcon_common.h:23
23  struct gpt_params {
(gdb) bt
#0  0x0000555555563573 in gpt_params::gpt_params (this=0x7fffffffc880) at /home/barry/evo/ggllm.cpp-1/examples/falcon_common.h:23
#1  0x0000555555560767 in main (argc=9, argv=0x7fffffffdf58) at /home/barry/evo/ggllm.cpp-1/examples/falcon/falcon_main.cpp:52
(gdb) disas
Dump of assembler code for function _ZN10gpt_paramsC2Ev:
   0x0000555555563540 <+0>: endbr64 
   0x0000555555563544 <+4>: push   %rbp
   0x0000555555563545 <+5>: push   %rbx
   0x0000555555563546 <+6>: mov    %rdi,%rbx
   0x0000555555563549 <+9>: sub    $0x18,%rsp
   0x000055555556354d <+13>:    mov    %fs:0x28,%rax
   0x0000555555563556 <+22>:    mov    %rax,0x8(%rsp)
   0x000055555556355b <+27>:    xor    %eax,%eax
   0x000055555556355d <+29>:    movl   $0xffffffff,(%rdi)
   0x0000555555563563 <+35>:    call   0x555555566f40 <_Z22get_num_physical_coresv>
   0x0000555555563568 <+40>:    movq   $0x200,0x10(%rbx)
   0x0000555555563570 <+48>:    mov    %rsp,%rsi
=> 0x0000555555563573 <+51>:    vpxor  %xmm0,%xmm0,%xmm0
   0x0000555555563577 <+55>:    mov    %eax,0x4(%rbx)
   0x000055555556357a <+58>:    lea    0xc8(%rbx),%rdi
   0x0000555555563581 <+65>:    xor    %edx,%edx
   0x0000555555563583 <+67>:    movabs $0x200ffffffff,%rax
   0x000055555556358d <+77>:    mov    %rax,0x8(%rbx)
   0x0000555555563591 <+81>:    lea    0x90(%rbx),%rax
   0x0000555555563598 <+88>:    mov    %rax,0x60(%rbx)
   0x000055555556359c <+92>:    movabs $0x3f73333300000028,%rax
   0x00005555555635a6 <+102>:   mov    %rax,0x98(%rbx)
   0x00005555555635ad <+109>:   movabs $0x3f8000003f800000,%rax
   0x00005555555635b7 <+119>:   mov    %rax,0xa0(%rbx)
   0x00005555555635be <+126>:   movabs $0x3f8ccccd3f4ccccd,%rax
   0x00005555555635c8 <+136>:   mov    %rax,0xa8(%rbx)
   0x00005555555635cf <+143>:   movabs $0x3dcccccd40a00000,%rax
   0x00005555555635d9 <+153>:   mov    %rax,0xc0(%rbx)
   0x00005555555635e0 <+160>:   lea    0xd8(%rbx),%rax
   0x00005555555635e7 <+167>:   movq   $0x0,0x18(%rbx)
   0x00005555555635ef <+175>:   movq   $0x1,0x68(%rbx)
   0x00005555555635f7 <+183>:   movq   $0x0,0x70(%rbx)
   0x00005555555635ff <+191>:   movq   $0x0,0x78(%rbx)
dRAT3 commented 1 year ago

Going to spray some prints in get_num_physical_cores to see at exactly which line it breaks.

cmp-nct commented 1 year ago

Interesting. My first guess is that it's a kernel issue, some AVX support problem. Quick fix: remove the get_num_physical_cores() call from falcon_common.h line 23 and replace it with 1. You can always set the thread count using "-t". Another option on Linux to get the cores would be: cores = sysconf(_SC_NPROCESSORS_CONF); // will also need unistd.h

Another option: a kernel upgrade. You can also try updating your compiler to the latest version first.

If you run into more such issues beyond that point, the compilation flags will need to be changed to not use those instruction sets. Normally that should work out of the box.
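
The sysconf alternative mentioned in the comment above could look roughly like the sketch below. The function name `count_logical_cpus` is hypothetical and not part of ggllm.cpp. `_SC_NPROCESSORS_CONF` is available on Linux with glibc, and it counts logical processors rather than physical cores, so a hyperthreaded machine reports more than its core count.

```cpp
#include <cstdint>
#include <unistd.h>  // sysconf, _SC_NPROCESSORS_CONF (Linux/glibc)

// Hypothetical stand-in for get_num_physical_cores(); the name is illustrative.
// Returns the number of configured logical processors, never less than 1.
int32_t count_logical_cpus() {
    long n = sysconf(_SC_NPROCESSORS_CONF);
    if (n < 1) {
        // sysconf returns -1 on error; fall back to a single thread.
        return 1;
    }
    return static_cast<int32_t>(n);
}
```

The value is only a default; the "-t" option still overrides it.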

cebtenzzre commented 1 year ago

It looks like the compiler is being told to compile with AVX instructions even though your CPU does not support them. If you can pass -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF to CMake I think it would work around this issue.
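
The faulting instruction in the disassembly, `vpxor %xmm0,%xmm0,%xmm0`, is a VEX-encoded AVX instruction, and the Xeon X5660 is a Westmere-generation part that predates AVX. A small probe like the following sketch can confirm what the CPU reports at runtime. It assumes GCC or Clang on x86, and the struct and function names are made up for illustration.

```cpp
#include <cstdio>

// Runtime CPU feature probe using the GCC/Clang x86 builtins.
// All names here are illustrative, not part of ggllm.cpp.
struct CpuFeatures {
    bool avx;
    bool avx2;
    bool fma;
};

CpuFeatures detect_cpu_features() {
    CpuFeatures f = {false, false, false};
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  // initialise the feature data before querying it
    f.avx  = __builtin_cpu_supports("avx")  != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.fma  = __builtin_cpu_supports("fma")  != 0;
#endif
    return f;  // all false on non-x86 targets
}

void print_cpu_features() {
    CpuFeatures f = detect_cpu_features();
    std::printf("avx=%d avx2=%d fma=%d\n", f.avx, f.avx2, f.fma);
}
```

The probe itself has to be built without `-mavx` or `-march=native`, otherwise the compiler may emit AVX instructions into it and it would hit the same SIGILL before printing anything.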

dRAT3 commented 1 year ago

Built like this:

rm -rf build && mkdir build && cd build && cmake -DGGML_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF .. && cmake --build . --config Release

Still core dumps on illegal instruction.

cebtenzzre commented 1 year ago

I'm not sure what instructions your CPU supports, but you may also need one of -DLLAMA_FMA=OFF or -DLLAMA_F16C=OFF (or both). The output of cpuid from the cpuid package would be helpful.

dRAT3 commented 1 year ago

cpuid.txt

cebtenzzre commented 1 year ago

Yep, so you need at least -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF in order to compile code that is compatible with your CPU.
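
As a rough sketch of how that list of options follows from the CPU's feature flags, the helper below takes a whitespace-separated flag list (for example the "flags" line of /proc/cpuinfo, where these features appear as `avx`, `avx2`, `fma` and `f16c`) and returns the matching OFF options quoted in this thread. The function name is hypothetical, and the flag strings in the usage note are illustrative, not the contents of the attached cpuid.txt.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Build the CMake options that switch off the instruction sets missing from
// `cpu_flags`. Flags are matched as whole words, so "avx2" does not count
// as "avx". The option names are the ones quoted in this thread.
std::string cmake_off_options(const std::string& cpu_flags) {
    std::istringstream in(cpu_flags);
    std::set<std::string> present;
    for (std::string tok; in >> tok; ) {
        present.insert(tok);
    }

    const std::vector<std::pair<std::string, std::string>> table = {
        {"avx",  "-DLLAMA_AVX=OFF"},
        {"avx2", "-DLLAMA_AVX2=OFF"},
        {"fma",  "-DLLAMA_FMA=OFF"},
        {"f16c", "-DLLAMA_F16C=OFF"},
    };

    std::string out;
    for (const auto& entry : table) {
        if (present.count(entry.first) != 0) {
            continue;  // CPU has this feature, leave the option alone
        }
        if (!out.empty()) {
            out += ' ';
        }
        out += entry.second;
    }
    return out;
}
```

For a flag list without any of the four features, such as `"sse2 ssse3 sse4_1 sse4_2 popcnt"`, the function returns all four options; for a list containing all of them it returns an empty string.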

cmp-nct commented 1 year ago

I was under the impression that AVX is automatically switched on/off depending on what the CPU reports as supported. Good to know that it needs to be set manually.
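
One way to see what a given build was actually compiled for is to check the macros that GCC and Clang predefine when the corresponding `-m` flags (or a `-march` value implying them) are in effect. The sketch below assumes one of those compilers, and the function name is hypothetical.

```cpp
#include <string>

// Lists the x86 SIMD extensions this translation unit was compiled for,
// based on compiler-predefined macros. This reflects build flags only,
// not what the CPU running the binary supports.
std::string compiled_simd_features() {
    std::string s;
#ifdef __AVX__
    s += "avx ";
#endif
#ifdef __AVX2__
    s += "avx2 ";
#endif
#ifdef __FMA__
    s += "fma ";
#endif
#ifdef __F16C__
    s += "f16c ";
#endif
    return s.empty() ? std::string("none") : s;
}
```

If this reports `avx` in a binary meant for a CPU without AVX, the build flags need changing, as discussed above.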