Open xxuyyuan opened 1 year ago
It looks like the GPU environment isn't set up correctly. Can you check whether this snippet runs for you:
import torch
from bitsandbytes.nn import LinearNF4

# LinearNF4 quantizes its weight to 4-bit NF4 at the moment .cuda() is called
model = LinearNF4(10, 20).cuda()
x = torch.randn(2, 10).cuda()
out = model(x)
It runs fine when I run it directly:
(base) root@6633711ec9b0:/home/data/VisualGLM-6B# python3
Python 3.10.8 (main, Nov 4 2022, 13:48:29) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
from bitsandbytes.nn import LinearNF4
===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
bin /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/opt/conda/lib/libcudart.so.11.0'), PosixPath('/opt/conda/lib/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function
errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /opt/conda/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so...
model = LinearNF4(10, 20).cuda()
import torch
x = torch.randn(2, 10).cuda()
out = model(x)
It runs on my side. Can you check whether your code matches the main branch of VisualGLM-6B? Maybe it hasn't been updated to the latest version, or you changed something locally? Also, is your bitsandbytes version 0.39.0?
bitsandbytes is 0.39.0. I pulled the latest code and ran it again.
The first run failed with:
AttributeError: 'FakeTokenizer' object has no attribute 'encode'
Details:
[2023-06-12 08:15:11,713] [INFO] [RANK 0] building FineTuneVisualGLMModel model ...
/opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 attention with lora
replacing layer 14 attention with lora
replacing chatglm linear layer with 4bit
[2023-06-12 08:15:58,973] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-06-12 08:15:59,738] [INFO] [RANK 0] global rank 0 is loading checkpoint /root/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
[2023-06-12 08:16:04,555] [INFO] [RANK 0] > successfully loaded /root/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
[2023-06-12 08:16:07,585] [INFO] [RANK 0] Try to load tokenizer from Huggingface transformers...
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[2023-06-12 08:32:23,056] [INFO] [RANK 0] Cannot find THUDM/chatglm-6b from Huggingface or sat. Creating a fake tokenizer...
Traceback (most recent call last):
File "/home/data/VisualGLM-6B/finetune_visualglm.py", line 195, in
Running it again:
RuntimeError: Error building extension 'fused_adam'
Details:
6633711ec9b0:11642:11786 [0] NCCL INFO Connected all rings
6633711ec9b0:11642:11786 [0] NCCL INFO Connected all trees
6633711ec9b0:11642:11786 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
6633711ec9b0:11642:11786 [0] NCCL INFO comm 0xb1efe50 rank 0 nranks 1 cudaDev 0 busId 54000 - Init COMPLETE
[2023-06-12 08:35:47,241] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /root/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu116/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /opt/conda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /opt/conda/include -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -std=c++14 -c /opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
FAILED: multi_tensor_adam.cuda.o
/opt/conda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /opt/conda/include -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -std=c++14 -c /opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
In file included from /opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu:13:0:
/opt/conda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory
 #include <cusolverDn.h>
          ^~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/data/VisualGLM-6B/finetune_visualglm.py", line 195, in
Is this a CUDA environment problem? I keep going around in circles.
For the tokenizer issue, see: https://github.com/THUDM/VisualGLM-6B/issues/111#issuecomment-1579019781
After rerunning, the tokenizer part works fine.
The main remaining problem is RuntimeError: Error building extension 'fused_adam'; details above.
This should be a DeepSpeed configuration problem; there is a similar issue: https://github.com/THUDM/VisualGLM-6B/issues/43
Some possible solutions I found:
apt-get update; apt-get install ninja-build
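If you want to confirm that PyTorch's extension builder can actually see ninja after installing it, a minimal check like this works (a generic sanity check, not something from this repo):

# Check that PyTorch's JIT extension machinery (which is what builds fused_adam) can find ninja.
from torch.utils.cpp_extension import is_ninja_available, verify_ninja_availability

print("ninja available:", is_ninja_available())
verify_ninja_availability()  # raises RuntimeError if ninja is not on PATH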
I changed line #176 of finetune_visualglm.py from args.device = 'cpu' to args.device = 'cuda'.
The error then changed from RuntimeError: Error building extension 'fused_adam' to a dimension mismatch: RuntimeError: The size of tensor a (25165824) must match the size of tensor b (12288) at non-singleton dimension 0.
What does args.device refer to here, and why does this happen?
/opt/conda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory
 #include <cusolverDn.h>
Problem solved, training works now!! The root cause was that cusolverDn.h could not be found; it was fixed by adding the environment variable export PATH=/usr/local/cuda/bin:$PATH
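If you are unsure which CUDA toolkit the extension build will pick up, a quick check like this can help (a minimal sketch; the header path just follows the standard toolkit layout):

# Print the CUDA toolkit that PyTorch's extension builder resolved, and whether the
# cuSOLVER header that the failed build was looking for is actually present there.
import os
from torch.utils.cpp_extension import CUDA_HOME

print("CUDA_HOME resolved to:", CUDA_HOME)
header = os.path.join(CUDA_HOME or "", "include", "cusolverDn.h")
print("cusolverDn.h present:", os.path.isfile(header))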
Has the dimension-mismatch problem been solved?
That one wasn't solved directly; I reverted to the original code, keeping line #176 as args.device = 'cpu'. Then it was back to the CUDA environment problem, RuntimeError: Error building extension 'fused_adam', which was solved by setting the environment variable above.
bitsandbytes implements quantization by overriding the .cuda() method, which means the model is quantized (and its tensor shapes change) at the moment it is moved onto the GPU. During fine-tuning, the pretrained weights being loaded are fp16, so we need args.device='cpu' to load the weights first and only then call .cuda(). Since this is how bitsandbytes is implemented, we cannot control it and can only adapt to it.
So the dimension mismatch means the GPU setup is the problem: the .cuda() call failed.
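A minimal sketch of the ordering described above (a standalone illustration, not the VisualGLM-6B code): the fp16 checkpoint has to be copied in while the layer is still on the CPU, because .cuda() packs the weight into a quantized buffer with a different shape.

import torch
from bitsandbytes.nn import LinearNF4

layer = LinearNF4(12, 1024, bias=False)     # weight still has its normal [1024, 12] shape on CPU
checkpoint = torch.randn(1024, 12).half()   # stand-in for an fp16 pretrained weight
layer.weight.data.copy_(checkpoint)         # works: shapes match while the layer is on the CPU

layer = layer.cuda()                        # overridden .cuda(): the weight is quantized and packed here
print(layer.weight.shape)                   # no longer [1024, 12] after packing
# Copying the fp16 checkpoint at this point would raise a size-mismatch error like the one above.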
This should be a DeepSpeed configuration problem; there is a similar issue: #43
Some possible solutions I found:
apt-get update; apt-get install ninja-build
- Upgrade CUDA from 10.1 to 10.2 (https://github.com/microsoft/DeepSpeed/issues/694)
My CUDA version is 12.0 and I hit the same problem.
/opt/conda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory
 #include <cusolverDn.h>
Problem solved, training works now!! The root cause was that cusolverDn.h could not be found; it was fixed by adding the environment variable export PATH=/usr/local/cuda/bin:$PATH
Where do I add that?
pip uninstall deepspeed
DS_BUILD_FUSED_ADAM=1 pip install deepspeed
If that doesn't work, try:
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
DS_BUILD_FUSED_ADAM=1 pip3 install .
If it still doesn't work, post your error.
I did the above and still get this error:
File "/home/nbicc/data/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2112, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'
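For what it's worth, the failing build can also be reproduced outside of training, which makes it easier to iterate on the environment. A minimal sketch, assuming the deepspeed.ops.op_builder API available in DeepSpeed versions of this era:

# Trigger the same JIT compilation of the fused_adam extension that DeepSpeed runs
# during training, so the compiler error can be reproduced and debugged in isolation.
from deepspeed.ops.op_builder import FusedAdamBuilder

fused_adam = FusedAdamBuilder().load()  # compiles the op if no prebuilt binary is found
print(fused_adam)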
I ran vi ~/.bashrc and added export PATH=/usr/local/cuda/bin:$PATH at the bottom, but I still get this error:
File ".../cpp_extension.py", line 2112, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'
All the problems are solved now, and fine-tuning runs successfully.
When running inference with the fine-tuned model weights I get:
File "/home/nbicc/data/anaconda3/envs/lm/lib/python3.8/site-packages/transformers/utils/hub.py", line 469, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like THUDM/chatglm-6b is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
Has anyone else run into this problem?
Solved: download the model files mentioned in the error message to a local directory, then point the script you are running at that local path.
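A minimal sketch of what that looks like with transformers (the local directory path is just an example): download the THUDM/chatglm-6b files, then pass the local path instead of the hub id.

from transformers import AutoTokenizer, AutoModel

# Example local directory holding the downloaded THUDM/chatglm-6b files
# (config.json, tokenizer files, weight shards) -- adjust to wherever you saved them.
local_path = "/home/data/models/chatglm-6b"

tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True).half().cuda()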
I ran into the same problem fine-tuning llama3, and solved it like this:
1. Use the following versions; the key one is peft 0.4.0:
accelerate==0.33.0
transformers==4.44.0
peft==0.4.0
bitsandbytes==0.43.3
loguru==0.7.0
jsonschema==4.23.0
tensorboard==2.14.0
2. LoRA config:
"lora_rank": 64, "lora_alpha": 16, "lora_dropout": 0.05
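For reference, a minimal sketch of how those hyperparameters map onto a peft LoraConfig (the target_modules are my assumption for a llama-style model, not part of the comment above):

from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,                                  # "lora_rank"
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # assumed attention projections for a llama-style model
)
# peft_model = get_peft_model(base_model, lora_config)  # wrap an already-loaded base model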
(base) root@6633711ec9b0:/home/data/VisualGLM-6B# bash finetune/finetune_visualglm_qlora.sh
NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2 deepspeed --master_port 16666 --include localhost:0 --hostfile hostfile_single finetune_visualglm.py --experiment-name finetune-visualglm-6b --model-parallel-size 1 --mode finetune --train-iters 300 --resume-dataloader --max_source_length 64 --max_target_length 256 --lora_rank 10 --layer_range 0 14 --pre_seq_len 4 --train-data ./fewshot-data/dataset.json --valid-data ./fewshot-data/dataset.json --distributed-backend nccl --lr-decay-style cosine --warmup .02 --checkpoint-activations --save-interval 300 --eval-interval 10000 --save ./checkpoints --split 1 --eval-iters 10 --eval-batch-size 8 --zero-stage 1 --lr 0.0001 --batch-size 8 --gradient-accumulation-steps 4 --skip-init --fp16 --use_qlora
Setting ds_accelerator to cuda (auto detect)
[2023-06-12 06:33:33,961] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-06-12 06:33:34,032] [INFO] [runner.py:555:main] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=16666 --enable_each_rank_log=None finetune_visualglm.py --experiment-name finetune-visualglm-6b --model-parallel-size 1 --mode finetune --train-iters 300 --resume-dataloader --max_source_length 64 --max_target_length 256 --lora_rank 10 --layer_range 0 14 --pre_seq_len 4 --train-data ./fewshot-data/dataset.json --valid-data ./fewshot-data/dataset.json --distributed-backend nccl --lr-decay-style cosine --warmup .02 --checkpoint-activations --save-interval 300 --eval-interval 10000 --save ./checkpoints --split 1 --eval-iters 10 --eval-batch-size 8 --zero-stage 1 --lr 0.0001 --batch-size 8 --gradient-accumulation-steps 4 --skip-init --fp16 --use_qlora
Setting ds_accelerator to cuda (auto detect)
[2023-06-12 06:33:35,958] [INFO] [launch.py:138:main] 0 NCCL_DEBUG=info
[2023-06-12 06:33:35,958] [INFO] [launch.py:138:main] 0 NCCL_NET_GDR_LEVEL=2
[2023-06-12 06:33:35,958] [INFO] [launch.py:138:main] 0 NCCL_IB_DISABLE=0
[2023-06-12 06:33:35,958] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.12.10-1+cuda11.6
[2023-06-12 06:33:35,959] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.12.10-1
[2023-06-12 06:33:35,959] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.12.10-1
[2023-06-12 06:33:35,959] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.12.10-1+cuda11.6
[2023-06-12 06:33:35,959] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-06-12 06:33:35,959] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-06-12 06:33:35,959] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.12.10-1
[2023-06-12 06:33:35,959] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-06-12 06:33:35,959] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-06-12 06:33:35,959] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-06-12 06:33:35,959] [INFO] [launch.py:163:main] dist_world_size=1
[2023-06-12 06:33:35,959] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
Setting ds_accelerator to cuda (auto detect)
===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/opt/conda/lib/libcudart.so'), PosixPath('/opt/conda/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function
errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /opt/conda/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so...
[2023-06-12 06:33:39,415] [INFO] using world size: 1 and model-parallel size: 1
[2023-06-12 06:33:39,415] [INFO] > padded vocab (size: 100) with 28 dummy tokens (new size: 128) 16666
[2023-06-12 06:33:39,417] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-06-12 06:33:39,418] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-12 06:33:39,418] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-12 06:33:39,418] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2023-06-12 06:33:39,418] [INFO] [checkpointing.py:764:_configure_using_config_file] {'partition_activations': False, 'contiguous_memory_optimization': False, 'cpu_checkpointing': False, 'number_checkpoints': None, 'synchronize_checkpoint_boundary': False, 'profile': False}
[2023-06-12 06:33:39,419] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
[2023-06-12 06:33:39,419] [INFO] [RANK 0] building FineTuneVisualGLMModel model ...
/opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
replacing layer 0 attention with lora
replacing layer 14 attention with lora
replacing chatglm linear layer with 4bit
[2023-06-12 06:34:26,500] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7811237376
[2023-06-12 06:34:30,185] [INFO] [RANK 0] global rank 0 is loading checkpoint /root/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "/home/data/VisualGLM-6B/finetune_visualglm.py", line 180, in
model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args)
File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 216, in from_pretrained
load_checkpoint(model, args, load_path=model_path, prefix=prefix)
File "/opt/conda/lib/python3.10/site-packages/sat/training/model_io.py", line 208, in load_checkpoint
missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1657, in load_state_dict
load(self, state_dict)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1645, in load
load(child, child_state_dict, child_prefix)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1645, in load
load(child, child_state_dict, child_prefix)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1645, in load
load(child, child_state_dict, child_prefix)
[Previous line repeated 2 more times]
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1639, in load
module._load_from_state_dict(
File "/home/data/VisualGLM-6B/lora_mixin.py", line 109, in _load_from_state_dict
self.original._load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
File "/home/data/VisualGLM-6B/lora_mixin.py", line 47, in _load_from_statedict
self.weight.data.copy(state_dict[prefix+'weight'])
RuntimeError: The size of tensor a (25165824) must match the size of tensor b (12288) at non-singleton dimension 0
[2023-06-12 06:34:36,019] [INFO] [launch.py:314:sigkill_handler] Killing subprocess 8346
[2023-06-12 06:34:36,019] [ERROR] [launch.py:320:sigkill_handler] ['/opt/conda/bin/python', '-u', 'finetune_visualglm.py', '--local_rank=0', '--experiment-name', 'finetune-visualglm-6b', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '300', '--resume-dataloader', '--max_source_length', '64', '--max_target_length', '256', '--lora_rank', '10', '--layer_range', '0', '14', '--pre_seq_len', '4', '--train-data', './fewshot-data/dataset.json', '--valid-data', './fewshot-data/dataset.json', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--save-interval', '300', '--eval-interval', '10000', '--save', './checkpoints', '--split', '1', '--eval-iters', '10', '--eval-batch-size', '8', '--zero-stage', '1', '--lr', '0.0001', '--batch-size', '8', '--gradient-accumulation-steps', '4', '--skip-init', '--fp16', '--use_qlora'] exits with return code = 1