Closed: qy527145 closed this issue 7 months ago
Could you give it a try with the 0.5.3? It should have fixed the issue.
I upgraded the image to 0.5.2rc and hit the same error. I'll give 0.5.3 a try.
0.5.3 and 0.5.2.rc0 both fail with the same error. My steps:
root@debian:~# docker run --rm -it --gpus=all -p 8888:8080 --entrypoint /bin/bash -v /tabby:/data tabbyml/tabby:0.5.3
root@376b3e70f33b:/# /opt/tabby/bin/tabby serve --model TabbyML/CodeLlama-7B --device cuda
2023-11-07T09:51:42.728680Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:146: Starting server, this might takes a few minutes...
2023-11-07T09:51:42.731760Z INFO tabby::serve::search: crates/tabby/src/serve/search.rs:202: Index is ready, enabling server...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
Segmentation fault (core dumped)
root@376b3e70f33b:/#
No longer reproducible in 0.5.5
@wsxiaoys I'm still running into the same problem.
GPU memory: 24 GB; free system RAM: 15 GB
tabby docker image version: 0.5.5
/opt/tabby/bin/tabby serve --model /data/models/TabbyML/CodeLlama-13B --device cuda
Error:
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
2023-11-22T06:01:13.297968Z INFO tabby::serve::search: crates/tabby/src/serve/search.rs:202: Index is ready, enabling server...
CUDA error 2 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:7641: out of memory
current device: 0
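For what it's worth, the CUDA "out of memory" error above (error 2) can be cross-checked by looking at free VRAM right before launching; a minimal check, assuming `nvidia-smi` is exposed inside the container by `--gpus=all`:

```shell
# Show free vs. total VRAM per GPU before running `tabby serve`.
# If other processes are already holding memory on device 0,
# a 13B model may not fit even on a 24 GB card.
nvidia-smi --query-gpu=index,name,memory.free,memory.total --format=csv
```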
I tried switching to a smaller model:
/opt/tabby/bin/tabby serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
which brings back the earlier error:
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
Segmentation fault (core dumped)
The same launch command works fine with version 0.4.0.
Could you try setting the environment variable LLAMA_CPP_PARALLELISM=1?
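As a side note, the variable can also be injected when the container is created, so it is set before the server starts; a sketch using docker's `-e` flag with the same mounts as above, assuming the image's default entrypoint is the tabby binary (as the `--entrypoint /bin/bash` override earlier in the thread suggests):

```shell
# Same invocation as earlier in the thread, but passing the env var
# at container creation via -e instead of `export` inside the shell.
docker run --rm -it --gpus=all -p 8888:8080 \
  -e LLAMA_CPP_PARALLELISM=1 \
  -v /tabby:/data \
  tabbyml/tabby:0.5.5 \
  serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
```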
Still the same error:
root@debian:~# docker run --rm -it --gpus=all -p 8888:8080 --entrypoint /bin/bash -v /tabby:/data tabbyml/tabby:0.5.5
root@6e154fe2bd7b:/# /opt/tabby/bin/tabby serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
2023-11-22T10:40:29.063829Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:135: Loading model from local path /data/models/TabbyML/CodeLlama-7B
2023-11-22T10:40:29.064717Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:146: Starting server, this might takes a few minutes...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
2023-11-22T10:40:29.143784Z INFO tabby::serve::search: crates/tabby/src/serve/search.rs:202: Index is ready, enabling server...
Segmentation fault (core dumped)
root@6e154fe2bd7b:/# export LLAMA_CPP_PARALLELISM=1
root@6e154fe2bd7b:/# /opt/tabby/bin/tabby serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
2023-11-22T10:41:12.416290Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:135: Loading model from local path /data/models/TabbyML/CodeLlama-7B
2023-11-22T10:41:12.416333Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:146: Starting server, this might takes a few minutes...
2023-11-22T10:41:12.418664Z INFO tabby::serve::search: crates/tabby/src/serve/search.rs:202: Index is ready, enabling server...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
Segmentation fault (core dumped)
Could you try setting the environment variable LLAMA_CPP_PARALLELISM=1?
I created a container from tabbyml/tabby:latest with docker run; inside the container I ran
/opt/tabby/bin/tabby serve --model TabbyML/CodeLlama-7B --device cuda
The error is as above. Environment & versions: tabby 0.5.0; docker image: tabbyml/tabby latest b1a3f710d841 3 days ago 2.14GB; GPU: NVIDIA GeForce RTX 3090