InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0
3.92k stars 305 forks

InternLM2-chat-7b: after fine-tuning and merging the weights, chat fails with: RuntimeError: CUDA error: device-side assert triggered Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. #337

Closed Egber1t closed 9 months ago

Egber1t commented 9 months ago

xtuner 0.1.12
torch 2.1.2
transformers 4.36.2
transformers-stream-generator 0.0.4
huggingface-hub 0.20.2
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105

LZHgrla commented 9 months ago

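Try chatting with the LLM before fine-tuning and check whether that works normally. Also, for InternLM2 chat models, use the internlm2_chat prompt template rather than internlm_chat.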

Egber1t commented 9 months ago

@LZHgrla The demo from the tutorial fails as well (no fine-tuning involved). The demo code is:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "/home/mice/TNB/internlm2-chat-7b"

# trust_remote_code is required because InternLM2 ships its own modeling code.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='auto')
model = model.eval()

system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
"""

# The history starts with the system prompt paired with an empty reply.
messages = [(system_prompt, '')]

print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")

while True:
    input_text = input("User >>> ")
    input_text = input_text.replace(' ', '')
    if input_text == "exit":
        break
    response, history = model.chat(tokenizer, input_text, history=messages)
    messages.append((input_text, response))
    print(f"robot >>> {response}")

The model loaded successfully at this point, and I confirmed that GPU memory was in use. After I typed hello, it failed with: RuntimeError: CUDA error: device-side assert triggered Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. Following the error message, I set export TORCH_USE_CUDA_DSA=1, but after rerunning I get exactly the same error. Please help.
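The wording "Compile with" suggests that TORCH_USE_CUDA_DSA is an option for building PyTorch from source, so exporting it at run time probably has no effect on a prebuilt install. A common first step for this class of error is to set CUDA_LAUNCH_BLOCKING=1 instead, which makes kernel launches synchronous so that the Python traceback points at the operation that actually failed. A minimal sketch, assuming the demo above is saved under the hypothetical name demo.py:

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking_launches(script):
    """Run a Python script in a child process using the debug environment."""
    return subprocess.run([sys.executable, script], env=debug_env()).returncode


# Usage (demo.py is a hypothetical file name, not taken from the thread):
# run_with_blocking_launches("demo.py")
```

Setting the variable in a child process rather than in the current one keeps it from leaking into other runs, since synchronous launches slow everything down.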

LZHgrla commented 9 months ago
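  1. Try 4-bit quantization when chatting, xtuner chat xxxx --bits 4, to check whether GPU memory is insufficient.
  2. Chat with a different LLM to check whether the problem is in the model code.
  3. Reinstall the conda environment to rule out a broken environment.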
Egber1t commented 9 months ago

@LZHgrla
1. With 4-bit quantization the error is still RuntimeError: CUDA error: device-side assert triggered Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
2. Chatting with another LLM (chatglm3) gives the same error.
3. I have finished reinstalling. GPU: 4090, conda 4.10.3, CUDA Version: 12.3, torch version: 2.1.2. The remaining dependencies are:

Package Version Editable project location
accelerate 0.26.1
addict 2.4.0
aiohttp 3.9.1
aiosignal 1.3.1
aliyun-python-sdk-core 2.14.0
aliyun-python-sdk-kms 2.16.2
annotated-types 0.6.0
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.42.0
Brotli 1.0.9
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 2.0.4
contourpy 1.2.0
crcmod 1.7
cryptography 41.0.7
cycler 0.12.1
datasets 2.16.1
deepspeed 0.13.0
dill 0.3.7
distro 1.9.0
einops 0.7.0
et-xmlfile 1.1.0
filelock 3.13.1
fonttools 4.47.2
frozenlist 1.4.1
fsspec 2023.10.0
func-timeout 4.3.5
gast 0.5.4
gmpy2 2.1.2
hjson 3.1.0
huggingface-hub 0.20.3
idna 3.4
importlib-metadata 7.0.1
Jinja2 3.1.2
jmespath 0.10.0
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lagent 0.1.2
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.2
mdurl 0.1.2
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
mmengine 0.10.2
modelscope 1.11.0
mpi4py-mpich 3.1.2
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
networkx 3.1
ninja 1.11.1.1
numpy 1.26.3
opencv-python 4.9.0.80
openpyxl 3.1.2
oss2 2.18.4
packaging 23.2
pandas 2.2.0
peft 0.7.1
Pillow 10.0.1
pip 23.3.1
platformdirs 4.1.0
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.0
pyarrow-hotfix 0.6
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.5.3
pydantic_core 2.14.6
Pygments 2.17.2
pynvml 11.5.0
pyOpenSSL 23.2.0
pyparsing 3.1.1
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
referencing 0.32.1
regex 2023.12.25
requests 2.31.0
rich 13.7.0
rpds-py 0.17.1
safetensors 0.4.1
scipy 1.12.0
sentencepiece 0.1.99
setuptools 68.2.2
simplejson 3.19.2
six 1.16.0
sortedcontainers 2.4.0
sympy 1.12
termcolor 2.4.0
tiktoken 0.5.2
tokenizers 0.15.1
tomli 2.0.1
torch 2.1.2
torchaudio 2.1.2
torchvision 0.16.2
tqdm 4.66.1
transformers 4.37.0
transformers-stream-generator 0.0.4
triton 2.1.0
typing_extensions 4.9.0
tzdata 2023.4
urllib3 1.26.18
wheel 0.41.2
xtuner 0.1.14.dev0 /home/mice/TNB/xtuner
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.17.0

I would be very grateful for any help. Thank you!

LZHgrla commented 9 months ago

Try downgrading a few key libraries:

pip install transformers==4.36.2 bitsandbytes==0.41.2.post2 deepspeed==0.12.3
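
One way to confirm that the downgrade actually took effect in the active environment is to compare the installed versions against the pins. A minimal sketch using only the standard library; the helper names `installed_version` and `check_pins` are illustrative, not part of xtuner:

```python
from importlib import metadata

# Pins copied from the pip command above.
PINS = {
    "transformers": "4.36.2",
    "bitsandbytes": "0.41.2.post2",
    "deepspeed": "0.12.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means all three pins match. Run it with the same interpreter that launches xtuner chat, since a second conda environment would report its own versions.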
Egber1t commented 9 months ago

@LZHgrla After downgrading, it still fails:

(xtuner) mice@dell-PowerEdge-T640:~/TNB$ xtuner chat ./merged --bits 4 --prompt-template internlm2_chat
[2024-01-23 13:19:10,380] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
[2024-01-23 13:19:14,340] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|██████████| 8/8 [00:11<00:00, 1.49s/it]
Load LLM from ./merged

double enter to end input (EXIT: exit chat, RESET: reset history) >>> helloi^H

/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [32,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [33,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [34,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [35,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [36,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [37,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [38,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [39,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [40,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [41,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [42,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [43,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [44,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [45,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [46,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [47,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [48,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [49,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [50,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [51,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [52,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [53,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [54,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [55,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [56,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [57,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [58,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [59,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [60,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [61,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [62,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [63,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [96,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [97,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [98,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [99,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [100,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [101,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [102,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [103,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [104,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [105,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [106,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [107,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [108,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [109,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [110,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [111,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [112,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [113,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [114,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [115,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [116,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [117,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [118,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [119,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [120,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [121,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [122,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [123,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [124,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [125,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [126,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [127,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [64,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [65,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [66,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [67,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [68,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [69,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [70,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [71,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [72,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [73,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [74,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [75,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [76,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [77,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [78,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [79,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [80,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [81,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [82,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [83,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [84,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [85,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [86,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [87,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [88,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [89,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [90,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [91,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [92,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [93,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [94,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [95,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [0,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [1,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [2,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [3,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [4,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [5,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [6,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [7,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [8,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [9,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [10,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [11,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [12,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [13,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [14,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [15,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [16,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [17,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [18,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [19,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [20,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [21,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [22,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [23,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [24,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [25,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [26,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [27,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [28,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. /opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [29,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. 
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [30,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1702400410390/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [31,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "/home/mice/TNB/xtuner/xtuner/tools/chat.py", line 488, in <module>
    main()
  File "/home/mice/TNB/xtuner/xtuner/tools/chat.py", line 434, in main
    generate_output = llm.generate(
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/transformers/generation/utils.py", line 2861, in sample
    outputs = self(
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/mice/.cache/huggingface/modules/transformers_modules/merged/modeling_internlm2.py", line 1049, in forward
    outputs = self.model(
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mice/.cache/huggingface/modules/transformers_modules/merged/modeling_internlm2.py", line 934, in forward
    layer_outputs = decoder_layer(
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/mice/.cache/huggingface/modules/transformers_modules/merged/modeling_internlm2.py", line 641, in forward
    hidden_states, self_attn_weights, present_key_value = self.attention(
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mice/anaconda3/envs/xtuner/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/mice/.cache/huggingface/modules/transformers_modules/merged/modeling_internlm2.py", line 366, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
  File "/home/mice/.cache/huggingface/modules/transformers_modules/merged/modeling_internlm2.py", line 228, in apply_rotary_pos_emb
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
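
The last frame of the traceback fails on `cos[position_ids]`, and the repeated assertion in the log is the bounds check for that kind of indexing. Below is a minimal sketch of the same check run on the host, assuming the position ids are available as nested Python lists and the length of the `cos` table is known. The helper name `out_of_bounds_positions` and the numbers are hypothetical and not part of xtuner or transformers. It only shows which indices would trip the assertion; it does not establish why they are out of range in this issue.

```python
def out_of_bounds_positions(position_ids, table_len):
    """Return (row, col, value) for every index that fails the bounds check.

    position_ids: nested list of ints, shaped like (batch, seq_len).
    table_len: number of rows in the table being indexed (hypothetical here).
    """
    bad = []
    for row, ids in enumerate(position_ids):
        for col, idx in enumerate(ids):
            # Same condition as the assertion in the log:
            # -sizes[i] <= index && index < sizes[i]
            if not (-table_len <= idx < table_len):
                bad.append((row, col, idx))
    return bad


# Hypothetical numbers: a table with 8 rows and one position id past its end.
cos_table_len = 8
position_ids = [[0, 1, 2, 8]]
print(out_of_bounds_positions(position_ids, cos_table_len))  # [(0, 3, 8)]
```

When debugging the real model, setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before launching makes CUDA errors surface at the call that caused them, which can make a traceback like the one above easier to trust.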

Egber1t commented 9 months ago

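I think I roughly know what went wrong. A few days ago CUDA was automatically updated to 12.3, while PyTorch currently supports up to 12.1. I suspect that is the cause.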
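
One way to look into a suspected CUDA version mismatch is to print the CUDA version the installed PyTorch build was compiled against. The following is a minimal sketch, assuming PyTorch may or may not be importable in the current environment; `report_cuda_versions` is a hypothetical helper name, not an xtuner or PyTorch function.

```python
def report_cuda_versions():
    """Collect basic CUDA-related facts about the installed PyTorch build.

    Returns None when PyTorch is not installed in this environment.
    """
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA toolkit version this PyTorch build was compiled with
        # (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # Whether PyTorch can actually see a usable GPU right now.
        "cuda_available": torch.cuda.is_available(),
    }


print(report_cuda_versions())
```

The driver-side CUDA version reported by `nvidia-smi` can be compared against `built_with_cuda`. A driver that is newer than the toolkit PyTorch was built with is normally supported, so a difference between the two is a lead to investigate rather than proof of the cause.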

weiliangxiong commented 9 months ago

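"I think I roughly know what went wrong. A few days ago CUDA was automatically updated to 12.3, while PyTorch currently supports up to 12.1. I suspect that is the cause." Has this been resolved? We keep hitting this error too and still haven't fixed it.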

Egber1t commented 9 months ago

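"I think I roughly know what went wrong. A few days ago CUDA was automatically updated to 12.3, while PyTorch currently supports up to 12.1. I suspect that is the cause. Has this been resolved? We keep hitting this error too and still haven't fixed it."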

I rented a server on AutoDL with CUDA 11.8, and the problem did not occur there; fine-tuning also succeeded.

zky001 commented 7 months ago

I ran into this problem too.

zky001 commented 7 months ago

But in my case it happened during fine-tuning.