01-ai / Yi

A series of large language models trained from scratch by developers @01-ai
https://01.ai
Apache License 2.0

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 #277

Closed · jrd77 closed this 9 months ago

jrd77 commented 9 months ago

Environment:

- OS: Ubuntu 22
- GPU: V100 32G
- Python: 3.10.13
- Transformers: 4.35.0
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1
- model: 34B-int4

Running the demo raises RuntimeError: probability tensor contains either `inf`, `nan` or element < 0:

------console log--------
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.99s/it]
Traceback (most recent call last):
  File "/data/llm/projects/Yi-main/test_demo.py", line 19, in <module>
    output_ids = model.generate(input_ids.to('cuda'),
  File "/home/ubuntu/micromamba/envs/py00/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/micromamba/envs/py00/lib/python3.10/site-packages/transformers/generation/utils.py", line 1719, in generate
    return self.sample(
  File "/home/ubuntu/micromamba/envs/py00/lib/python3.10/site-packages/transformers/generation/utils.py", line 2837, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

The code being run:

from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = '/data/llm/model/01ai/Yi-34B-Chat-4bits'
#model_path = '/data/llm/model/01ai/Yi-6B-Chat'

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
# Since transformers 4.35.0, GPTQ/AWQ models can be loaded using AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Prompt content: "What can you do?"
messages = [
    {"role": "user", "content": "What can you do?"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
output_ids = model.generate(input_ids.to('cuda'),
    do_sample=True)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Print the model's response
print(response)
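
For triage, a minimal diagnostic sketch (reusing the `model` and `input_ids` defined above) that checks whether the last-token logits already contain `inf`/`nan` before `generate()` ever samples from them; if they do, the failure is in the model forward pass (e.g. broken quantized weights), not in sampling:

```python
import torch

# One forward pass; inspect the logits that generate() would sample from.
with torch.no_grad():
    logits = model(input_ids.to("cuda")).logits[:, -1, :]

print("nan in logits:", torch.isnan(logits).any().item())
print("inf in logits:", torch.isinf(logits).any().item())
```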

After changing `do_sample=False` in the code, the output is garbled instead:

output_ids = model.generate(input_ids.to('cuda'),
    do_sample=False)

------console log--------

Loading checkpoint shards: 100%|█████████████████████████████████████| 2/2 [00:05<00:00,  2.92s/it]
/home/ubuntu/micromamba/envs/py00/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:381: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/ubuntu/micromamba/envs/py00/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:386: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/home/ubuntu/micromamba/envs/py00/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:396: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `40` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(

exec Ultraamespacekip腻所述的一种enesIGHtuce卖出 Ultraviolet腻螺旋� ped加持图案emenegaitoredustomedairo容arringsQz往zahipverely玫瑰andin<h5>错了MDUapaverelyhipverely罩 Reserved favourapa特朗apaverely彼verelyTHER错了 favour往verelyhipverely参airoTHER错了MDUapaverelyhipverely企 Reservedimp和所述hetherapaverely参verelyacknowledgementsnpop favour往特朗hipverely参airoTHER错了 favour往verelyhipverely Mell
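
The garbled greedy output suggests the weights themselves are producing nonsense, not that the sampling settings are wrong. As an aside, the three `UserWarning`s above can be silenced by pinning the sampling-only parameters to their library defaults when `do_sample=False`; a sketch, assuming the transformers 4.35 defaults of 1.0/1.0/50:

```python
# Greedy decoding with the sampling-only knobs pinned to their defaults,
# overriding the values from the model's generation_config (0.7 / 0.8 / 40)
# and thereby silencing the warnings. This does not fix the garbled output.
output_ids = model.generate(
    input_ids.to("cuda"),
    do_sample=False,
    temperature=1.0,
    top_p=1.0,
    top_k=50,
)
```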

mamba environment env.yml:

name: py00
channels:
- anaconda/cloud/conda-forge
- conda-forge
- nvidia
- pytorch
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_kmp_llvm
- accelerate=0.24.1=pyhd8ed1ab_0
- aiohttp=3.9.1=py310h2372a71_0
- aiosignal=1.3.1=pyhd8ed1ab_0
- annotated-types=0.6.0=pyhd8ed1ab_0
- async-timeout=4.0.3=pyhd8ed1ab_0
- attrs=23.1.0=pyh71513ae_1
- aws-c-auth=0.7.8=h5c941e0_1
- aws-c-cal=0.6.9=h5d48c4d_2
- aws-c-common=0.9.10=hd590300_0
- aws-c-compression=0.2.17=h7f92143_7
- aws-c-event-stream=0.3.2=h0bcb0bb_8
- aws-c-http=0.7.14=hd268abd_3
- aws-c-io=0.13.36=he14a76f_1
- aws-c-mqtt=0.9.10=h35285c7_2
- aws-c-s3=0.4.3=h0448019_0
- aws-c-sdkutils=0.1.12=h7f92143_6
- aws-checksums=0.1.17=h7f92143_6
- aws-crt-cpp=0.24.9=h4a91382_1
- aws-sdk-cpp=1.11.210=h6d06844_3
- blas=2.120=mkl
- blas-devel=3.9.0=20_linux64_mkl
- brotli-python=1.1.0=py310hc6cd4ac_1
- bzip2=1.0.8=hd590300_5
- c-ares=1.23.0=hd590300_0
- ca-certificates=2023.11.17=hbcca054_0
- certifi=2023.11.17=pyhd8ed1ab_0
- charset-normalizer=3.3.2=pyhd8ed1ab_0
- click=8.1.7=unix_pyh707e725_0
- colorama=0.4.6=pyhd8ed1ab_0
- coloredlogs=15.0.1=pyhd8ed1ab_3
- cuda-cudart=11.8.89=0
- cuda-cupti=11.8.87=0
- cuda-libraries=11.8.0=0
- cuda-nvrtc=11.8.89=0
- cuda-nvtx=11.8.86=0
- cuda-runtime=11.8.0=0
- dataclasses=0.8=pyhc8e2a94_3
- datasets=2.14.5=pyhd8ed1ab_0
- deepspeed=0.12.2=cpu_py310h11dbdba_1
- dill=0.3.7=pyhd8ed1ab_0
- einops=0.7.0=pyhd8ed1ab_1
- filelock=3.13.1=pyhd8ed1ab_0
- frozenlist=1.4.0=py310h2372a71_1
- fsspec=2023.6.0=pyh1a96a4e_0
- gflags=2.2.2=he1b5a44_1004
- glog=0.6.0=h6f12383_0
- gmp=6.3.0=h59595ed_0
- gmpy2=2.1.2=py310h3ec546c_1
- hjson-py=3.1.0=pyhd8ed1ab_0
- huggingface_hub=0.16.4=pyhd8ed1ab_0
- humanfriendly=10.0=pyhd8ed1ab_6
- icu=73.2=h59595ed_0
- idna=3.6=pyhd8ed1ab_0
- importlib-metadata=6.9.0=pyha770c72_0
- importlib_metadata=6.9.0=hd8ed1ab_0
- jinja2=3.1.2=pyhd8ed1ab_1
- joblib=1.3.2=pyhd8ed1ab_0
- keyutils=1.6.1=h166bdaf_0
- krb5=1.21.2=h659d440_0
- ld_impl_linux-64=2.40=h41732ed_0
- libabseil=20230802.1=cxx17_h59595ed_0
- libaio=0.3.113=h166bdaf_0
- libarrow=14.0.1=h422ced8_7_cpu
- libarrow-acero=14.0.1=h59595ed_7_cpu
- libarrow-dataset=14.0.1=h59595ed_7_cpu
- libarrow-flight=14.0.1=h120cb0d_7_cpu
- libarrow-flight-sql=14.0.1=h61ff412_7_cpu
- libarrow-gandiva=14.0.1=hacb8726_7_cpu
- libarrow-substrait=14.0.1=h61ff412_7_cpu
- libblas=3.9.0=20_linux64_mkl
- libbrotlicommon=1.1.0=hd590300_1
- libbrotlidec=1.1.0=hd590300_1
- libbrotlienc=1.1.0=hd590300_1
- libcblas=3.9.0=20_linux64_mkl
- libcrc32c=1.1.2=h9c3ff4c_0
- libcublas=11.11.3.6=0
- libcufft=10.9.0.58=0
- libcufile=1.8.1.2=0
- libcurand=10.3.4.101=0
- libcurl=8.4.0=hca28451_0
- libcusolver=11.4.1.48=0
- libcusparse=11.7.5.86=0
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=h516909a_1
- libevent=2.1.12=hf998b51_1
- libffi=3.4.2=h7f98852_5
- libgcc-ng=13.2.0=h807b86a_3
- libgfortran-ng=13.2.0=h69a702a_3
- libgfortran5=13.2.0=ha4646dd_3
- libgomp=13.2.0=h807b86a_3
- libgoogle-cloud=2.12.0=h5206363_4
- libgrpc=1.59.3=hd6c4280_0
- libhwloc=2.9.3=default_h554bfaf_1009
- libiconv=1.17=h166bdaf_0
- liblapack=3.9.0=20_linux64_mkl
- liblapacke=3.9.0=20_linux64_mkl
- libllvm15=15.0.7=h5cf9203_3
- libnghttp2=1.58.0=h47da74e_0
- libnpp=11.8.0.86=0
- libnsl=2.0.1=hd590300_0
- libnuma=2.0.16=h0b41bf4_1
- libnvjpeg=11.9.0.86=0
- libparquet=14.0.1=h352af49_7_cpu
- libprotobuf=4.24.4=hf27288f_0
- libre2-11=2023.06.02=h7a70373_0
- libsentencepiece=0.1.99=h866249d_5
- libsqlite=3.44.2=h2797004_0
- libssh2=1.11.0=h0841786_0
- libstdcxx-ng=13.2.0=h7e041cc_3
- libthrift=0.19.0=hb90f79a_1
- libutf8proc=2.8.0=h166bdaf_0
- libuuid=2.38.1=h0b41bf4_0
- libxcrypt=4.4.36=hd590300_1
- libxml2=2.11.6=h232c23b_0
- libzlib=1.2.13=hd590300_5
- llvm-openmp=17.0.6=h4dfa4b3_0
- lz4-c=1.9.4=hcb278e6_0
- markupsafe=2.1.3=py310h2372a71_1
- mkl=2023.2.0=h84fe81f_50496
- mkl-devel=2023.2.0=ha770c72_50496
- mkl-include=2023.2.0=h84fe81f_50496
- mpc=1.3.1=hfe3b2da_0
- mpfr=4.2.1=h9458935_0
- mpmath=1.3.0=pyhd8ed1ab_0
- multidict=6.0.4=py310h2372a71_1
- multiprocess=0.70.15=py310h2372a71_1
- ncurses=6.4=h59595ed_2
- networkx=3.2.1=pyhd8ed1ab_0
- numpy=1.26.2=py310hb13e2d6_0
- openssl=3.2.0=hd590300_1
- optimum=1.13.2=pyhd8ed1ab_0
- orc=1.9.2=h4b38347_0
- packaging=23.2=pyhd8ed1ab_0
- pandas=2.1.3=py310hcc13569_0
- pip=23.3.1=pyhd8ed1ab_0
- protobuf=4.24.4=py310h620c231_0
- psutil=5.9.5=py310h2372a71_1
- py-cpuinfo=9.0.0=pyhd8ed1ab_0
- pyarrow=14.0.1=py310hf9e7431_7_cpu
- pydantic=2.5.2=pyhd8ed1ab_0
- pydantic-core=2.14.5=py310hcb5633a_0
- pynvml=11.5.0=pyhd8ed1ab_0
- pysocks=1.7.1=pyha2e5f31_6
- python=3.10.13=hd12c33a_1_cpython
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-tzdata=2023.3=pyhd8ed1ab_0
- python-xxhash=3.4.1=py310h2372a71_0
- python_abi=3.10=4_cp310
- pytorch=2.0.1=py3.10_cuda11.8_cudnn8.7.0_0
- pytorch-cuda=11.8=h7e8668a_5
- pytorch-mutex=1.0=cuda
- pytz=2023.3.post1=pyhd8ed1ab_0
- pyyaml=6.0.1=py310h2372a71_1
- rdma-core=49.0=hd3aeb46_1
- re2=2023.06.02=h2873b5e_0
- readline=8.2=h8228510_1
- regex=2023.10.3=py310h2372a71_0
- requests=2.31.0=pyhd8ed1ab_0
- s2n=1.3.56=h06160fa_0
- sacremoses=0.0.53=pyhd8ed1ab_0
- safetensors=0.3.3=py310hcb5633a_1
- sentencepiece=0.1.99=hff52083_5
- sentencepiece-python=0.1.99=py310ha7b5816_5
- sentencepiece-spm=0.1.99=h866249d_5
- setuptools=68.2.2=pyhd8ed1ab_0
- six=1.16.0=pyh6c4a22f_0
- snappy=1.1.10=h9fff704_0
- sympy=1.12=pypyh9d50eac_103
- tbb=2021.10.0=h00ab1b0_2
- tk=8.6.13=noxft_h4845f30_101
- tokenizers=0.14.1=py310h320607d_2
- torchtriton=2.0.0=py310
- tqdm=4.66.1=pyhd8ed1ab_0
- transformers=4.35.0=pyhd8ed1ab_0
- typing-extensions=4.8.0=hd8ed1ab_0
- typing_extensions=4.8.0=pyha770c72_0
- tzdata=2023c=h71feb2d_0
- ucx=1.15.0=hae80064_1
- urllib3=2.1.0=pyhd8ed1ab_0
- wheel=0.42.0=pyhd8ed1ab_0
- xxhash=0.8.2=hd590300_0
- xz=5.2.6=h166bdaf_0
- yaml=0.2.5=h7f98852_2
- yarl=1.9.3=py310h2372a71_0
- zipp=3.17.0=pyhd8ed1ab_0
- zstd=1.5.5=hfc55251_0
Yimi81 commented 9 months ago

Everything works fine in my tests here; I did not run into your problem. Please make sure your model download is not missing any files.

- OS: Ubuntu
- GPU: A800 80G
- Python: 3.10.13
- Transformers: 4.35.0
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8
- model: 34B-chat-4bits, 6B-chat-4bits, 6B-chat
jrd77 commented 9 months ago

I later switched to the GPTQ-quantized models from Hugging Face and the problem went away. My guess is that since the official release uses AWQ quantization, my GPU does not support it properly: 'TheBloke/Yi-34B-Chat-GPTQ', 'https://hf-mirror.com/TheBloke/SUS-Chat-34B-GPTQ'
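
A sketch of that workaround, following the same loading pattern as the demo above (loading GPTQ checkpoints through `AutoModelForCausalLM` requires the `auto-gptq` package alongside `optimum`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPTQ build named in the comment above.
model_path = "TheBloke/Yi-34B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",   # dispatch the quantized weights to the GPU
    torch_dtype="auto",
).eval()
```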

Yimi81 commented 9 months ago

Yes, AutoAWQ does not support the V100.
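
A quick way to check whether a GPU falls below what the AWQ kernels need; the exact threshold is an assumption (commonly cited as compute capability 7.5, while the V100 reports 7.0), so verify it against the AutoAWQ docs:

```python
import torch

# The V100 reports compute capability 7.0, below what the AWQ kernels
# need (threshold assumed to be 7.5 here; check the AutoAWQ docs).
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
if (major, minor) < (7, 5):
    print("AWQ kernels likely unsupported on this GPU; try a GPTQ build.")
```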

zhengxingmao commented 7 months ago

With the Yi-VL-34B OpenAI-style service, the first request works fine, but the second request immediately after it raises this error.

- OS: Ubuntu
- GPU: H800 80G
- Python: 3.8.10
- Transformers: 4.37.2
- PyTorch: 2.2.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.2
- model: Yi-VL-34B
zhengxingmao commented 7 months ago

@Yimi81

limaofeng commented 4 months ago

@zhengxingmao Any follow-up on this? I'm stuck on the same problem.

Yimi81 commented 4 months ago

@limaofeng This bug has already been fixed. Are you on the latest code? If you still hit it, you can use LMDeploy to deploy an OpenAI-style Yi-VL API.
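
A minimal sketch of that LMDeploy route; the model path and API details are assumptions to verify against the LMDeploy documentation:

```python
# Sketch only: query Yi-VL through LMDeploy's Python pipeline.
from lmdeploy import pipeline

pipe = pipeline("01-ai/Yi-VL-34B")  # hypothetical hub ID or local path
print(pipe("What can you do?"))

# For an OpenAI-style HTTP server, LMDeploy also ships a CLI
# (check the docs for current flags):
#   lmdeploy serve api_server 01-ai/Yi-VL-34B
```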

magicum-sidus commented 4 months ago

I ran into the same problem, so clearly it has not been solved; for me too, the error occurs on the second call... @Yimi81

aflah02 commented 3 months ago

+1 @magicum-sidus The first call works well, but the second call raises RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 while running. It does run through and give an output though, so I'm not sure what's happening.
