I also get Illegal instruction (core dumped) when using the Docker image, while compiling from source seems to solve the issue.
This is on Pop!_OS 22.04 with kernel 6.2.0-76060200 on a Ryzen 5 5600X (x86_64 with AVX2), with GCC 11.
Some versions are fine though: we found that light-19726169b379bebc96189673a19b89ab1d307659 doesn't seem to have this problem, but light-34c1072e497eb92d81ee7c0e12aa6741496a41c6 does.
(we've been tracking this here too: https://github.com/nsarrazin/serge/pull/66)
Illegal instruction sounds like using an instruction which your processor does not support. I've touched on the issue in this discussion:
It's clear that as long as CPU features are determined at compile time, distributing binaries is going to cause problems like this.
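For illustration, a minimal sketch (not from llama.cpp itself) of how this happens: if the file below is compiled with -mavx2, the binary contains 256-bit AVX2 instructions unconditionally, and running it on a CPU without AVX2 raises SIGILL, which the shell reports as "Illegal instruction (core dumped)".

/* sketch: build with `gcc -O2 -mavx2 avx2_probe.c` (hypothetical file name) and the
   compiler emits 256-bit vpaddd; a pre-AVX2 CPU faults with SIGILL at runtime */
#include <stdio.h>
#include <immintrin.h>

int main(void) {
    __m256i a = _mm256_set1_epi32(1);
    __m256i b = _mm256_add_epi32(a, a);         /* AVX2-only instruction */
    printf("%d\n", _mm256_extract_epi32(b, 0)); /* prints 2 on AVX2 hardware */
    return 0;
}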
@slaren That explains why the binaries are so inconsistent. We don't know what CPUs the GitHub runners are using, which makes the binaries unusable.
Machine spec: Motherboard: Supermicro X9DRi-LN4+ (REV 1.20A), CPU: dual Xeon E5-2670 v2, RAM: 128 GB DDR3 LRDIMM.
@gaby Yes, they can vary, but for compilation it doesn't matter which CPU the runner has, only for tests. As you can see in the discussion, the Windows builds always build AVX-512 but only test it when possible. If the Docker builder looks at its own features when compiling the binaries, then it's misconfigured: if I compile something for the Game Boy Advance on my x86 PC, it's not the features of my PC that I should target. I'm not too familiar with Docker, but I suppose there has to be an option not to precompile binaries and instead ship the sources inside the container, to be compiled as the first step of installation.
But I don't know; the whole raison d'être for Docker containers is to deal with the huge mess of interconnected dependencies in the Linux world, which are hard to handle. This project doesn't have any dependencies or libraries and can simply be built on any machine, so I don't understand the value proposition of Docker for this project at all, except the negative value of constantly having to deal with issues related to it.
If you are an absolute fan of Docker and you just absolutely, positively have to have it, the container could literally contain a single .sh script which would do:
git clone https://github.com/ggerganov/llama.cpp.git
make
And that's it, lol. The beauty of having no libraries and dependencies.
For precompiled binaries, currently the only option is to build packages for different option sets, like the Windows releases. In the future a better option would be to detect the features at runtime, unless that can't be done without a performance penalty, though it probably can. It needs to be researched a bit, because it would affect inlining, which cannot be done when the code paths aren't static. If inlining yields a performance benefit then we have to stick with the multiple builds, as speed > everything else.
With Unraid I have two ways to run it: with a VM or with Docker. With Docker I can share the resources with other processes, while with a VM I 'lock' the resources. That, for me, is the added value of Docker.
@anzz1 Thanks for the insight. After several tries it seems that compiling llama.cpp as a first step during runtime is the solution.
On a project with a million dependencies and libraries this might be a problem, but since there are no dependencies and it builds on anything, compilation shouldn't pose a problem nor take more than a few seconds. However, in the post above there's an ongoing discussion about adding the ability to check processor features at runtime. There is some work involved in accurately testing and verifying that it can be done without hurting performance, so it's currently on the backlog behind more important issues.
Once it's properly established that it can be done without degrading performance, actually adding it isn't hard at all; we just have to be sure not to introduce a regression while doing it.
@Netsuno Did you succeed? I have Unraid too but haven't managed to run it in Docker.
@Taillan I made my own Docker image to run it, but my Unraid server is not powerful (2x Xeon E5-2670 v2), so I have put the idea of making it work on Unraid on hold for now (it takes 100% of my CPU for 2 minutes to generate one answer).
A case for runtime detection:
In any reasonable, modern cloud deployment, llama.cpp would end up inside a container. In fact, being CPU-only, llama.cpp enables deploying your ML inference to something like AWS Lambda or GCP Cloud Run, providing very simple, huge scalability for inference. All these systems use containerization and expect you to have pre-built binaries ready to go. Compiling at container launch is not really an option, as that significantly increases cold-start/scale-up latencies (a few seconds is too long).
However, the higher up the serverless stack you go, the less control you have over the CPU platform underneath. GCP, for example, has machines from the Haswell era onward all intermingled, and they don't even document what to expect for Cloud Functions or Cloud Run.
I'm not a C expert by any means, so it's not my wheelhouse to offer up a PR, but the case for this is pretty strong IMO.
Got the same error:
ERROR: /app/.devops/tools.sh: line 40: 6 Illegal instruction ./main $arg2
when I executed the command:
docker run -v /models/llama7b:/home ghcr.io/ggerganov/llama.cpp:full --run -m /home/ggml-model-q4_1.bin -p "hello" -n 512
My environment is:
Docker Toolbox 1.13.1, Docker client 1.13.1 (OS/Arch: Windows 7/amd64), Docker server 19.03.12 (OS/Arch: Ubuntu 22.04/amd64)
Can anyone help?
I'm afraid that you'll have to rebuild the image locally and use that instead. But that isn't very complicated.
Yeah, the modern cloud environment, where in many cases you have less control over and knowledge of the underlying hardware than you used to, is unfortunate, but it is reality.
You definitely do not want to go all-out runtime detection in a performance-driven application like this and lose the compiler optimizations allowed by compile-time detection with simple #ifdefs, hurting everyone else in the process for the sake of cloud and containerization, but there is a case for having it both ways.
Something like this:
inline unsigned int ggml_cpu_has_avx512(void) {
#if defined(CPUDETECT_RUNTIME)
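/* raw x86-64 machine code, roughly: xor ecx,ecx; mov eax,7; cpuid; shr ebx,16; and ebx,1;
   i.e. CPUID leaf 7, sub-leaf 0, EBX bit 16 (AVX-512 Foundation) */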
static unsigned const char a[] = {0x53,0x31,0xC9,0xB8,0x07,0x00,0x00,0x00,0x0F,0xA2,0xC1,0xEB,0x10,0x83,0xE3,0x01,0x89,0xD8,0x5B,0xC3};
return ((unsigned int (__cdecl *)(void)) (void*)((void*)a))();
#elif defined(__AVX512F__)
return 1;
#else
return 0;
#endif
}
Then replacing any #ifdef __AVX512F__ with ggml_cpu_has_avx512() allows for runtime detection when configured as such; when not configured, the compiler should optimize it away and produce the same end result as the #ifdef, without messing up its optimization logic. However, compilers can be finicky sometimes, so it's definitely prudent to check with a disassembler that the end result really is the same.
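For illustration, here's a minimal sketch of that pattern. dot_scalar and dot_avx512 are hypothetical stand-ins for the real kernels, and note that once a file is no longer compiled with -mavx512f, the AVX-512 kernel needs a per-function target attribute such as __attribute__((target("avx512f"))) on GCC/Clang:

#include <stddef.h>

/* hypothetical kernels standing in for the real SIMD and scalar code paths;
   dot_avx512 would be built with __attribute__((target("avx512f"))) */
float dot_scalar(const float *x, const float *y, size_t n);
float dot_avx512(const float *x, const float *y, size_t n);

unsigned int ggml_cpu_has_avx512(void); /* as defined above */

/* before: the path is chosen at compile time and only one survives in the binary */
float dot_static(const float *x, const float *y, size_t n) {
#ifdef __AVX512F__
    return dot_avx512(x, y, n);
#else
    return dot_scalar(x, y, n);
#endif
}

/* after: both paths are compiled in and the branch is taken at runtime;
   when ggml_cpu_has_avx512() folds to a constant, the dead branch is removed */
float dot_dynamic(const float *x, const float *y, size_t n) {
    if (ggml_cpu_has_avx512()) {
        return dot_avx512(x, y, n);
    }
    return dot_scalar(x, y, n);
}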
edit: To be clear, using bytecode in the example above isn't being obtuse for the sake of being obtuse in some misguided attempt to look smart. Optimally you'd use __asm { }, but the reason you can't is that, contrary to every other compiler out there, MSVC decided to drop inline assembly support for the 64-bit era, a decision that has been the bane of low-level coders' existence ever since. Bytecode is the only thing that works with every compiler. If you want to see what's going on, you can copy-paste the bytecode above into https://disasm.czbix.com/ for example.
Here's a list of some of the (x86) processor feature checks in bytecode: cpuid.h
The rest can be found in the x86 documentation:
Intel ® Architecture Instruction Set Extensions and Future Features "Chapter 1.5 CPUID Instruction"
AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions "Appendix D: Instruction Subsets and CPUID Feature Flags"
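As a side note, and not something the project itself relies on, GCC and Clang can perform the same runtime checks without hand-written bytecode via compiler builtins; a minimal sketch:

/* runtime feature detection using GCC/Clang builtins; MSVC would instead need
   __cpuidex() from <intrin.h>, which is where a compiler-agnostic approach
   like the bytecode above comes in */
#include <stdio.h>

int main(void) {
    __builtin_cpu_init(); /* populate the CPU feature data (required on older GCC) */
    printf("avx:     %d\n", __builtin_cpu_supports("avx")     ? 1 : 0);
    printf("avx2:    %d\n", __builtin_cpu_supports("avx2")    ? 1 : 0);
    printf("avx512f: %d\n", __builtin_cpu_supports("avx512f") ? 1 : 0);
    return 0;
}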
The easiest solution, imho, is to provide multiple versions of the container image. It doesn't have to cover all architectures, and it doesn't have to be every release; a setup which covers >=85% of consumers at any given time is enough.
The rest can rebuild.
Sure, the easiest solution would be to create a container image for every configuration set, and it could be easily automated with GitHub Actions. It's a solution, but not a good one, as it's not future-proof and carries the risk of getting stuck with a bad practice. You know what they say: nothing is more permanent than a temporary solution :smile:
I didn't say every configuration set. Unless the runtime detection has only negligible impact on performance, I think it's better for consumers to just get an image optimized for their architecture. Obviously there is a point of diminishing returns, but even Intel provides optimized images for AVX-512 [0].
[0] https://hub.docker.com/r/intel/intel-optimized-tensorflow-avx512
Sure, you could do automatic separate images for AVX / AVX2 / AVX512 like the Windows releases by just editing the action, no code change necessary.
Or, as the binaries are rather small, you could pack all of them into one image and add a simple script for "launch-time" detection, if you will, something like this:
#!/bin/sh
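# launch-time detection: inspect the /proc/cpuinfo flags and run the most capable prebuilt binary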
cpuinfo="$(cat /proc/cpuinfo)"
if [ $(echo "$cpuinfo" | grep -c avx512) -gt 0 ]; then
./llama_avx512 "$@"
elif [ $(echo "$cpuinfo" | grep -c avx2) -gt 0 ]; then
./llama_avx2 "$@"
else
./llama_avx "$@"
fi
If your model is already quantized, this did the trick for me, using the light image:
docker run -v /E/Projects/llama.cpp/models:/models ghcr.io/ggerganov/llama.cpp:light -m models/7B/llama-2-7b-chat.ggmlv3.q4_0.bin -p "hello" -n 512
I have a similar issue in Docker on some machines. I'm using local/llama.cpp:full-cuda.
After an strace, it turned out /server couldn't find libcublas.so.11.
However, I have it in /usr/local/cuda-11.7/targets/x86_64-linux/lib/libcublas.so.11.
Perhaps something is wrong with the way I built it; still investigating.
gdb
Starting program: /app/server
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGILL, Illegal instruction.
0x0000559816d04ef6 in gpt_params::gpt_params() ()
strace
ldd /app/server
linux-vdso.so.1 (0x00007fffbff45000)
libcublas.so.11 => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11 (0x00007fab17200000)
libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007fab16e00000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fab16bd4000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007fab17119000)
libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fab20484000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007fab169ac000)
/lib64/ld-linux-x86-64.so.2 (0x00007fab21001000)
libcublasLt.so.11 => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007fab02a00000)
librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007fab2047d000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fab20478000)
libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007fab20473000)
The CUDA-enabled Docker image works like a charm and is fairly quick, but then the 1 out of 10 machines you deploy to crashes with this 'illegal instruction' error.
The issue is also often reported in downstream projects:
https://github.com/ollama/ollama/issues/2187 https://github.com/search?q=repo%3Aoobabooga%2Ftext-generation-webui+illegal+instruction&type=issues
I'm not too familiar with these instructions, but is it not feasible to have one workflow that builds one Docker image that can be deployed reliably? It just works so well, and not having a single Docker image is a bit of a shame.
Try:
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-11.7/targets/x86_64-linux/lib
https://stackoverflow.com/questions/54249577/importerror-libcuda-so-1-cannot-open-shared-object-file
This issue was closed because it has been inactive for 14 days since being marked as stale.
I'm trying to run the Docker version on Unraid.
I run this as Post Arguments:
--run -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
I got this error:
/app/.devops/tools.sh: line 40: 7 Illegal instruction ./main $arg2
Log:
I have run this without any issues:
--all-in-one "/models/" 7B