Closed: qy527145 closed this issue 7 months ago
Could you give it a try with the 0.5.3? It should have fixed the issue.
I upgraded the image to 0.5.2rc and hit the same error. I'll give 0.5.3 a try.
0.5.3 and 0.5.2.rc0 both fail with the same error. My steps:
root@debian:~# docker run --rm -it --gpus=all -p 8888:8080 --entrypoint /bin/bash -v /tabby:/data tabbyml/tabby:0.5.3
root@376b3e70f33b:/# /opt/tabby/bin/tabby serve --model TabbyML/CodeLlama-7B --device cuda
2023-11-07T09:51:42.728680Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:146: Starting server, this might takes a few minutes...
2023-11-07T09:51:42.731760Z INFO tabby::serve::search: crates/tabby/src/serve/search.rs:202: Index is ready, enabling server...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
Segmentation fault (core dumped)
root@376b3e70f33b:/#
No longer reproducible in 0.5.5
@wsxiaoys I'm still running into the same problem.
GPU memory: 24 GB; free system RAM: 15 GB
tabby docker image version: 0.5.5
/opt/tabby/bin/tabby serve --model /data/models/TabbyML/CodeLlama-13B --device cuda
Error:
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
2023-11-22T06:01:13.297968Z INFO tabby::serve::search: crates/tabby/src/serve/search.rs:202: Index is ready, enabling server...
CUDA error 2 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:7641: out of memory
current device: 0
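For what it's worth, the CUDA "out of memory" error above (error 2) can be cross-checked by looking at free VRAM right before launching; a minimal check, assuming `nvidia-smi` is exposed inside the container by `--gpus=all`:

```shell
# Show free vs. total VRAM per GPU before running `tabby serve`.
# If other processes are already holding memory on device 0,
# a 13B model may not fit even on a 24 GB card.
nvidia-smi --query-gpu=index,name,memory.free,memory.total --format=csv
```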
I tried switching to a smaller model:
/opt/tabby/bin/tabby serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
which brings back the earlier error:
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
Segmentation fault (core dumped)
The same launch command works fine with version 0.4.0.
Could you try setting the environment variable LLAMA_CPP_PARALLELISM=1?
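As a side note, the variable can also be injected when the container is created, so it is set before the server starts; a sketch using docker's `-e` flag with the same mounts as above, assuming the image's default entrypoint is the tabby binary (as the `--entrypoint /bin/bash` override earlier in the thread suggests):

```shell
# Same invocation as earlier in the thread, but passing the env var
# at container creation via -e instead of `export` inside the shell.
docker run --rm -it --gpus=all -p 8888:8080 \
  -e LLAMA_CPP_PARALLELISM=1 \
  -v /tabby:/data \
  tabbyml/tabby:0.5.5 \
  serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
```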
Still the same error:
root@debian:~# docker run --rm -it --gpus=all -p 8888:8080 --entrypoint /bin/bash -v /tabby:/data tabbyml/tabby:0.5.5
root@6e154fe2bd7b:/# /opt/tabby/bin/tabby serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
2023-11-22T10:40:29.063829Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:135: Loading model from local path /data/models/TabbyML/CodeLlama-7B
2023-11-22T10:40:29.064717Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:146: Starting server, this might takes a few minutes...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
2023-11-22T10:40:29.143784Z INFO tabby::serve::search: crates/tabby/src/serve/search.rs:202: Index is ready, enabling server...
Segmentation fault (core dumped)
root@6e154fe2bd7b:/# export LLAMA_CPP_PARALLELISM=1
root@6e154fe2bd7b:/# /opt/tabby/bin/tabby serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
2023-11-22T10:41:12.416290Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:135: Loading model from local path /data/models/TabbyML/CodeLlama-7B
2023-11-22T10:41:12.416333Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:146: Starting server, this might takes a few minutes...
2023-11-22T10:41:12.418664Z INFO tabby::serve::search: crates/tabby/src/serve/search.rs:202: Index is ready, enabling server...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
Segmentation fault (core dumped)
Could you try setting the environment variable LLAMA_CPP_PARALLELISM=1?
I created a container from tabbyml/tabby:latest with docker run; inside the container I ran
/opt/tabby/bin/tabby serve --model TabbyML/CodeLlama-7B --device cuda
The error is as above. Environment & versions: tabby 0.5.0; docker image: tabbyml/tabby latest b1a3f710d841 3 days ago 2.14GB; GPU: NVIDIA GeForce RTX 3090