Closed: stiyet closed this issue 1 month ago.
Could you try launching Python with this environment variable set?
CUDA_LAUNCH_BLOCKING=1 python
That fixed it, thanks a lot!
Does the error come back if you remove it?
CUDA_LAUNCH_BLOCKING=1 is only for pinpointing PyTorch problems; normally it should not be set.
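For background: CUDA kernel launches are asynchronous, so an illegal memory access is often reported at an unrelated call site. CUDA_LAUNCH_BLOCKING=1 forces each launch to synchronize, so the traceback points at the kernel that actually faulted. A minimal sketch of setting it from inside Python (it must happen before CUDA is initialized, i.e. before importing torch or lmdeploy):

```python
import os

# Must be set before the first CUDA context is created, i.e. before
# `import torch` / `import lmdeploy`, or it has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # CUDA initialization now sees the variable
print(os.environ["CUDA_LAUNCH_BLOCKING"])  # → 1
```

In practice, prefixing the command line (`CUDA_LAUNCH_BLOCKING=1 python ...`) as suggested above is equivalent and harder to get wrong.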
Right, it breaks once the variable is removed. I see the same problem on two A100s as well.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
2024-07-26 14:27:32,424 - asyncio - ERROR - Exception in callback _raise_exception_on_finish(<Future finis...sertions.\n')>) at /opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py:19
handle: <Handle _raise_exception_on_finish(<Future finis...sertions.\n')>) at /opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py:19>
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, self._args)
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 26, in _raise_exception_on_finish
raise e
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 22, in _raise_exception_on_finish
task.result()
File "/opt/conda/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 153, in forward
outputs = [x.cpu() for x in outputs]
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(screenshot attachment: 20240726143227.jpg)
Could you help try two more things?
1. On the A100s, does it still error with tp=1?
2. With tp > 1:
from lmdeploy import pipeline, VisionConfig
pipe = pipeline(..., vision_config=VisionConfig(thread_safe=True))
1) 2x A100, changing only tp=1: no error.
2) 2x A100, tp=2, with vision_config=VisionConfig(thread_safe=True) added: no error, but the output is empty.

/opt/conda/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /opt/conda/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
Done copy! Result: true
Done copy! Result: true
/ossfs/node_45293776/workspace/autoupdate_resource/model
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
Response(text='', generate_token_len=301, input_token_len=3879, session_id=0, finish_reason='length', token_ids=[0, 0, 0, ...], logprobs=None)
(0, 'ok', {'result': ''})
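As a side note, the empty result above (text='', finish_reason='length', every token id 0) is easy to flag programmatically. A hypothetical helper, not part of lmdeploy, for detecting this failure mode:

```python
def looks_degenerate(text, token_ids):
    # Empty text plus an all-zero token id sequence suggests the vision
    # features were corrupted before decoding (heuristic, hypothetical).
    return text == "" and len(token_ids) > 0 and all(t == 0 for t in token_ids)

print(looks_degenerate("", [0] * 301))       # → True  (the failing run above)
print(looks_degenerate("a photo", [3, 17]))  # → False
```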
Run `ls -lh /usr/local/cuda/lib64/libcudart.so` to see which version the shared library links to.
Does uninstalling bitsandbytes help? I suspect a shared-library problem.
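The symlink check can also be scripted with Python's os.path.realpath. A sketch using a temporary symlink as a stand-in, since /usr/local/cuda/lib64/libcudart.so only exists on the GPU machine:

```python
import os
import tempfile

# Stand-in for: readlink -f /usr/local/cuda/lib64/libcudart.so
tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "libcudart.so.12")  # hypothetical real library file
open(target, "w").close()
link = os.path.join(tmp, "libcudart.so")       # the symlink to inspect
os.symlink(target, link)

resolved = os.path.basename(os.path.realpath(link))
print(resolved)  # → libcudart.so.12
```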
$ ls -lh /usr/local/cuda/lib64/libcudart.so
lrwxrwxrwx 1 root root 15 Dec 15 2023 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.12

Do you mean uninstalling bitsandbytes plus adding vision_config=VisionConfig(thread_safe=True)?
Uninstall both bitsandbytes and flash_attn. You can try with and without vision_config=VisionConfig(thread_safe=True); I haven't run into this error myself.
Someone previously reported a segfault around this feature-extraction path, and rebuilding the environment from scratch fixed it.
After uninstalling:
Warning: Flash Attention is not available, use_flash_attn is set to False.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
Response(text='', generate_token_len=301, input_token_len=3879, session_id=0, finish_reason='length', token_ids=[0, 0, 0, ...], logprobs=None)
(0, 'ok', {'result': ''})
Warning: Flash Attention is not available, use_flash_attn is set to False.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
Response(text='', generate_token_len=301, input_token_len=3879, session_id=0, finish_reason='length', token_ids=[0, 0, 0, ...], logprobs=None)
(0, 'ok', {'result': ''})
The output is normal.
So after uninstalling, the program no longer crashes even without CUDA_LAUNCH_BLOCKING=1, right?
You could also try changing those four lines to the single line below, so the vision model runs in one thread (without CUDA_LAUNCH_BLOCKING=1):
outputs = self.forward(inputs)
If you have time, you could also check whether the problem persists in the official Docker image (use the image's own Python, without installing anything extra): https://hub.docker.com/r/openmmlab/lmdeploy
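For clarity, the suggested edit replaces a thread-pool submission with a direct call in the calling thread. A standalone sketch of the difference (this Engine class is illustrative, not lmdeploy's actual engine code):

```python
from concurrent.futures import ThreadPoolExecutor

class Engine:
    """Illustrative stand-in for a vision engine; not lmdeploy's code."""

    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=1)

    def forward(self, inputs):
        return [x * 2 for x in inputs]  # placeholder for model inference

    def infer_threaded(self, inputs):
        # Original style: forward runs on a worker thread, so CUDA state is
        # touched from a different thread than the one that initialized it.
        return self.pool.submit(self.forward, inputs).result()

    def infer_direct(self, inputs):
        # Suggested style: forward runs in the calling thread.
        return self.forward(inputs)

engine = Engine()
print(engine.infer_threaded([1, 2]))  # → [2, 4]
print(engine.infer_direct([1, 2]))    # → [2, 4]
```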
Yep, the program didn't crash. Got it, learned something new, I'll give that a try~
After making that change, it hangs (without CUDA_LAUNCH_BLOCKING=1):
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
(screenshot attachment: 20240726194834.jpg)
When it hung, did you have vision_config=VisionConfig(thread_safe=True) set?
Things feel a bit tangled at this point. When you have time, try it inside the Docker image first; if the image works, the problem is in the environment.
OK, I'll try it in the Docker image first.
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
Checklist
Describe the bug
Running model inference on 4x Tesla V100. The model loads successfully, but inference fails:
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 153, in forward
    outputs = [x.cpu() for x in outputs]
RuntimeError: CUDA error: an illegal memory access was encountered
cuda12.1 + lmdeploy==0.5.1
Reproduction
pipe = pipeline("OpenGVLab__InternVL-Chat-V1-5", backend_config=TurbomindEngineConfig(tp=4, cache_max_entry_count=0.2))
gen_config = GenerationConfig(temperature=0, max_new_tokens=300)
text = "简单描述一下这个图片"
pipe((text, image), gen_config=gen_config)
Environment
Error traceback