THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] Setting the multi-GPU parameter CUDA_VISIBLE_DEVICES=0,1,2,3 and running evaluate.sh fails with RuntimeError: handle_0 INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1695392067780/work/c10/cuda/driver_api.cpp":15, please report a bug to PyTorch. #1421

Open huiby23 opened 10 months ago

huiby23 commented 10 months ago

Is there an existing issue for this?

Current Behavior

```shell
PRE_SEQ_LEN=128
CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2
STEP=3000

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 main.py \
    --do_predict \
    --validation_file /home/public/dataset/dev.json \
    --test_file /home/public/dataset/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /home/public/ChatGLM-6B/THUDM/chatglm-6b \
    --ptuning_checkpoint ./output/$CHECKPOINT/checkpoint-$STEP \
    --output_dir ./output/$CHECKPOINT \
    --overwrite_output_dir \
    --max_source_length 2096 \
    --max_target_length 64 \
    --per_device_eval_batch_size 5 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```

Expected Behavior

No response

Steps To Reproduce

Run `bash evaluate.sh`. Once tokenization finishes, it fails with `RuntimeError: handle_0 INTERNAL ASSERT FAILED`. Running on a single GPU does not trigger the error.
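A quick way to sanity-check the multi-GPU setup before running the full script is to confirm how many devices the `CUDA_VISIBLE_DEVICES` setting actually exposes and how many PyTorch can see. This is a diagnostic sketch, not part of the original report; on a machine where the four GPUs are visible, `torch.cuda.device_count()` should print 4.

```python
import os

# Mirror the setting used in evaluate.sh
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1,2,3")

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(f"Requesting {len(visible)} GPU(s): {visible}")

try:
    import torch
    # On the reporter's machine this should print 4; a mismatch here
    # (or an exception) points at a driver/runtime problem rather than
    # at main.py itself.
    print("torch sees", torch.cuda.device_count(), "device(s)")
except ImportError:
    print("PyTorch is not installed in this environment")
```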

Environment

- OS: ubuntu22.08
- Python: 3.11
- Transformers: 1.47
- PyTorch: 2.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :true

Anything else?

No response