WisdomShell / codeshell-vscode

An intelligent coding assistant plugin for Visual Studio Code, developed based on CodeShell

Error when loading a local model with TGI #33

Open · Virtual1257 opened this issue 11 months ago

Virtual1257 commented 11 months ago

I am using TGI to load the local model CodeShell-7B-Chat, but it fails during loading. The command I used is:

sudo docker run --gpus 'all' --shm-size 1g -p 9090:80 \
  -v /home/CodeShell/WisdomShell:/data \
  --env LOG_LEVEL="info,text_generation_router=debug" \
  ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
  --model-id /data/CodeShell-7B-Chat --num-shard 1 \
  --max-total-tokens 5000 --max-input-length 4096 \
  --max-stop-sequences 12 --trust-remote-code
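
Once the container is up, I would expect to be able to query the server on the mapped host port (9090). A minimal sanity check, assuming TGI 1.0.3's standard /generate endpoint (the prompt string is just an illustrative example):

curl http://127.0.0.1:9090/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "def quick_sort(arr):", "parameters": {"max_new_tokens": 64}}'

However, the container never gets that far.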

The output and error messages are as follows:

2023-10-24T01:47:14.674168Z  INFO text_generation_launcher: Args { model_id: "/data/CodeShell-7B-Chat", revision: None, validation_workers: 2, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: true, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 12, max_top_n_tokens: 5, max_input_length: 4096, max_total_tokens: 5000, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "e2df4ceac2dc", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-10-24T01:47:14.674233Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/data/CodeShell-7B-Chat` do not contain malicious code.
2023-10-24T01:47:14.685067Z  INFO download: text_generation_launcher: Starting download process.
2023-10-24T01:47:21.825629Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2023-10-24T01:47:23.136555Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2023-10-24T01:47:23.137089Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-10-24T01:47:30.969269Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
 rank=0
2023-10-24T01:47:30.969335Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 4 rank=0
Error: ShardCannotStart
2023-10-24T01:47:31.066204Z ERROR text_generation_launcher: Shard 0 failed to start
2023-10-24T01:47:31.066262Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

My current environment:

GPU: NVIDIA V100
OS: Ubuntu 20.04
Python version: 3.10
Docker version: 24.0.5
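
For anyone reproducing this, a quick way to confirm the GPU is visible from inside Docker (a minimal check using the stock nvidia/cuda image; the exact tag is only an example):

sudo docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
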
theone-daxia commented 6 months ago

Has this been resolved? I'm running into the same problem. The first time, the server started fine and I used it for a while before stopping it, but when I tried to start it again today it wouldn't come up, and memory was nearly full. I've seen posts saying that with --num-shard=1 the model is first loaded into host memory and then dispatched to the GPU. Is that correct?