Closed: jemmyshin closed this issue 10 months ago.
@jemmyshin sorry for the late response.
Turing cards are not well tested by us at the moment; there may be kernel or Triton issues. See also #34.
@hiworldwzj is there any progress on T4 support now?
I ran into the same problem on WSL2: both installing via requirements.txt and pulling the Docker image directly show this issue. The GPU is a 2080 Ti 22G, CUDA version 11.8.
@jemmyshin T4 cannot be supported yet.
@HelloCard Hi, I expect the 2080 Ti cannot be supported well right now. The problem is mainly with Turing-architecture cards: the Triton version we currently use cannot compile the kernels effectively for them. Cards with newer architectures such as the 3090 and 4090 (Ampere and later) are well supported.
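If you are unsure which architecture your card is, a quick way to check is to query its compute capability (Turing reports sm_75; Ampere reports sm_80/sm_86, Ada sm_89). A minimal sketch, assuming PyTorch with CUDA is installed (as lightllm requires); the helper name is just for illustration:

```python
import torch

def check_gpu_arch(device: int = 0) -> None:
    """Print the GPU name and compute capability to distinguish Turing from Ampere."""
    major, minor = torch.cuda.get_device_capability(device)
    name = torch.cuda.get_device_name(device)
    print(f"{name}: sm_{major}{minor}")
    if (major, minor) == (7, 5):
        # T4 and 2080 Ti are Turing (sm_75) -- currently problematic with the Triton kernels.
        print("Turing card: currently not well supported.")
    elif major >= 8:
        # 3090/A100 (Ampere) and 4090 (Ada) report sm_80/sm_86/sm_89.
        print("Ampere or newer: expected to work.")
    else:
        print("Older architecture: untested.")

if __name__ == "__main__":
    check_gpu_arch()
```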
Issue description:
Got a CUDA error when sending a request to the server.
Steps to reproduce:
python -m lightllm.server.api_server --model_dir ~/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348 --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 120
And sending request using:
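(The exact request payload was not captured above. A typical request against the api_server's /generate endpoint looks roughly like the sketch below; the prompt and parameters are hypothetical, not the ones that triggered the error, and the server is assumed to be the one started by the command above on port 8080.)

```python
import json
import requests

# Hypothetical example request to the lightllm server started above.
url = "http://localhost:8080/generate"
payload = {
    "inputs": "What is AI?",               # hypothetical prompt
    "parameters": {"max_new_tokens": 17},  # hypothetical generation parameters
}
resp = requests.post(
    url,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.text)
```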
Error logging:
Environment:
[ ] Using container
OS: Ubuntu
GPU info:
NVIDIA-SMI 510.108.03 Driver Version: 510.108.03 CUDA Version: 11.6
Python: 3.10
LightLLM: installed from source via git clone and pip install -e .
openai-triton: 2.0.0.dev20221202