Yuliang-Liu / Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
MIT License

Model loading problem #96

Closed Tzx11 closed 2 months ago

Tzx11 commented 5 months ago

The following code fails because of "monkey_model":

config = MonkeyConfig.from_pretrained(
        "monkey_model",
        cache_dir=training_args.cache_dir,
        trust_remote_code=True,
    )

It raises this error:

OSError: monkey_model is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>

echo840 commented 5 months ago

I apologize for any confusion caused. You can replace "monkey_model" with the path to the model weights you have downloaded, or with "echo840/Monkey".
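As a minimal sketch of the suggestion above, the helper below picks a local weights folder when it exists and otherwise falls back to the Hub ID "echo840/Monkey". The function name `pick_model_source` is hypothetical and not part of the Monkey repository; the returned string would then be passed as the first argument of `from_pretrained`.

```python
from pathlib import Path

HUB_ID = "echo840/Monkey"  # Hub identifier suggested in the reply above


def pick_model_source(local_dir: str, hub_id: str = HUB_ID) -> str:
    """Return local_dir if it is an existing folder, otherwise the Hub ID.

    Hypothetical helper for illustration only.
    """
    path = Path(local_dir).expanduser()
    if path.is_dir():
        return str(path)
    return hub_id


if __name__ == "__main__":
    # "monkey_model" is only found if such a folder exists relative to the
    # current working directory; otherwise the Hub ID is printed.
    print(pick_model_source("monkey_model"))
```

This keeps a single call site working both inside the project folder and elsewhere, at the cost of silently switching to a download when the local folder is missing.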

Tzx11 commented 5 months ago

Thank you for your reply. After changing "monkey_model" to "echo840/Monkey", the following error occurred:

Traceback (most recent call last):
  File "/home/tongzixuan/code/Monkey-main/finetune/../finetune_multitask.py", line 422, in <module>
    train()
  File "/home/tongzixuan/code/Monkey-main/finetune/../finetune_multitask.py", line 405, in train
    trainer.train()
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/transformers/trainer.py", line 1837, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/transformers/trainer.py", line 2682, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/transformers/trainer.py", line 2707, in compute_loss
    outputs = model(**inputs)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1855, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tongzixuan/code/Monkey-main/monkey_model/modeling_qwen.py", line 1004, in forward
    transformer_outputs = self.transformer(
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tongzixuan/code/Monkey-main/monkey_model/modeling_monkey.py", line 82, in forward
    return super().forward(input_ids,
  File "/home/tongzixuan/code/Monkey-main/monkey_model/modeling_qwen.py", line 840, in forward
    outputs = torch.utils.checkpoint.checkpoint(
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/tongzixuan/code/Monkey-main/monkey_model/modeling_qwen.py", line 836, in custom_forward
    return module(*inputs, use_cache, output_attentions)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tongzixuan/code/Monkey-main/monkey_model/modeling_qwen.py", line 516, in forward
    attn_outputs = self.attn(
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tongzixuan/code/Monkey-main/monkey_model/modeling_qwen.py", line 418, in forward
    query = apply_rotary_pos_emb(query, q_pos_emb)
  File "/home/tongzixuan/code/Monkey-main/monkey_model/modeling_qwen.py", line 1343, in apply_rotary_pos_emb
    output = apply_rotary_embfunc(t, cos, sin).type_as(t)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/flash_attn/layers/rotary.py", line 122, in apply_rotary_emb
    return ApplyRotaryEmb.apply(
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/flash_attn/layers/rotary.py", line 48, in forward
    out = apply_rotary(
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/flash_attn/ops/triton/rotary.py", line 213, in apply_rotary
    rotary_kernel[grid](
  File "<string>", line 41, in rotary_kernel
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/triton/compiler.py", line 1587, in compile
    so_path = make_stub(name, signature, constants)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/triton/compiler.py", line 1476, in make_stub
    so = _build(name, src_path, tmpdir)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/site-packages/triton/compiler.py", line 1391, in _build
    ret = subprocess.check_call(cc_cmd)
  File "/home/tongzixuan/anaconda3/envs/monkey/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpks7a4w5v/main.c', '-O3', '-I/usr/local/cuda/include', '-I/home/tongzixuan/anaconda3/envs/monkey/include/python3.9', '-I/tmp/tmpks7a4w5v', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpks7a4w5v/rotary_kernel.cpython-39-x86_64-linux-gnu.so']' returned non-zero exit status 1.
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
  File "<string>", line 21, in rotary_kernel
KeyError: ('2-.-0-.-0-7d1eb0d2fed8ff2032dccb99c2cc311a-d6252949da17ceb5f3a278a70250af13-1af5134066c618146d2cd009138944a0-37a3350d9f1920364a7e68ae67c1a1f0-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float32, torch.float32, torch.float32, torch.float32, None, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (128, False, False, False, False, 4), (True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True)))
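The line "/usr/bin/ld: cannot find -lcuda" in the traceback above means the linker could not find a file named exactly libcuda.so (a versioned libcuda.so.1 alone does not satisfy -lcuda) while Triton was compiling its kernel stub with gcc. The sketch below is a rough diagnostic, not a fix: it lists libcuda files in a few directories. The directory list and both function names are assumptions for illustration and will differ between systems.

```python
import ctypes.util
from pathlib import Path

# Directories that often hold the CUDA driver library on Linux.
# This list is an assumption for illustration; adjust it for your system.
CANDIDATE_DIRS = [
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib64",
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/lib64/stubs",
]


def find_libcuda(dirs=CANDIDATE_DIRS):
    """Return every libcuda.so* file found in the given directories."""
    found = []
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            found.extend(sorted(str(f) for f in p.glob("libcuda.so*")))
    return found


def has_unversioned_libcuda(paths):
    """True if any path is named exactly libcuda.so, the name -lcuda looks for."""
    return any(Path(p).name == "libcuda.so" for p in paths)


if __name__ == "__main__":
    paths = find_libcuda()
    # find_library reports what the runtime loader can see, which may
    # differ from what the link-time search finds.
    print("runtime loader sees:", ctypes.util.find_library("cuda"))
    print("files found:", paths)
    print("unversioned libcuda.so present:", has_unversioned_libcuda(paths))
```

Even when libcuda.so exists, its directory must be on the linker's search path; with gcc, one common way to add a directory is the LIBRARY_PATH environment variable. Whether that resolves this particular setup is not confirmed anywhere in the thread.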

TAOSHss commented 5 months ago

[2024-05-27 07:42:53,689] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type qwen to instantiate a model of type monkey. This is not supported for all configurations of models and can yield errors.
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Your device does NOT seem to support bf16, you can switch to fp16 or fp32 by by passing fp16/fp32=True in "AutoModelForCausalLM.from_pretrained".
Your device does NOT seem to support bf16, you can switch to fp16 or fp32 by by passing fp16/fp32=True in "AutoModelForCausalLM.from_pretrained".
/opt/conda/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1695392020201/work/aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/root/code/extra/Monkey/textMonkey_testocr.py", line 123, in <module>
    main()
  File "/root/code/extra/Monkey/textMonkey_testocr.py", line 82, in main
    model, tokenizer = _load_model_tokenizer(args)
  File "/root/code/extra/Monkey/textMonkey_testocr.py", line 38, in _load_model_tokenizer
    tokenizer = QWenTokenizer.from_pretrained(checkpoint_path,
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/code/extra/Monkey/monkey_model/tokenization_qwen.py", line 114, in __init__
    super().__init__(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/root/code/extra/Monkey/monkey_model/tokenization_qwen.py", line 218, in _add_tokens
    if surface_form not in SPECIAL_TOKENS + self.IMAGE_ST:
AttributeError: 'QWenTokenizer' object has no attribute 'IMAGE_ST'

This is the error I get when testing the TextMonkey model.
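The traceback above shows the base tokenizer constructor calling `_add_tokens`, which reads `self.IMAGE_ST` before the subclass has assigned it. The toy classes below reproduce that ordering problem in isolation. All class and function names here are hypothetical stand-ins, not the real transformers or Monkey code, and setting the attribute before `super().__init__()` is shown only as one common way to avoid this pattern.

```python
def check_tokens(tokenizer, tokens):
    """Shared check that reads an attribute the subclass must have set."""
    for tok in tokens:
        if tok not in tokenizer.IMAGE_ST:
            raise ValueError(f"unknown token {tok!r}")
    return len(tokens)


class BaseTokenizer:
    """Hypothetical base class whose constructor calls an overridable hook."""

    def __init__(self):
        self._add_tokens(["<img>"])

    def _add_tokens(self, tokens):
        return len(tokens)


class BrokenTokenizer(BaseTokenizer):
    def __init__(self):
        super().__init__()          # the hook runs here ...
        self.IMAGE_ST = ("<img>",)  # ... before this attribute exists

    def _add_tokens(self, tokens):
        return check_tokens(self, tokens)


class FixedTokenizer(BaseTokenizer):
    def __init__(self):
        self.IMAGE_ST = ("<img>",)  # set before the base constructor runs
        super().__init__()

    def _add_tokens(self, tokens):
        return check_tokens(self, tokens)


if __name__ == "__main__":
    try:
        BrokenTokenizer()
    except AttributeError as exc:
        print("broken:", exc)
    print("fixed:", FixedTokenizer().IMAGE_ST)
```

Whether a base class calls such a hook during construction can change between library versions, which would be consistent with the error appearing only on some transformers releases.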

echo840 commented 5 months ago

Hello, you should either use transformers==4.32.0 or refer to these links for a fix: https://huggingface.co/echo840/Monkey-Chat/discussions/1 and https://huggingface.co/echo840/Monkey/discussions/4
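A quick way to confirm which transformers version is actually installed in the active environment is sketched below. It uses only the standard library; the helper names `installed_version` and `matches_pin` are hypothetical, and the pinned value is the one named in the reply above.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED = "4.32.0"  # version named in the reply above


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed, pinned: str = PINNED) -> bool:
    """True only when a version is installed and equals the pinned string."""
    return installed == pinned


if __name__ == "__main__":
    found = installed_version("transformers")
    print("transformers installed:", found)
    print("matches pinned version:", matches_pin(found))
```

Running this inside the same conda environment used for training or testing avoids checking a different interpreter by mistake.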

echo840 commented 5 months ago

Could you tell me whether the error occurred during training or during testing?

Tzx11 commented 5 months ago

The error occurred during training.

echo840 commented 5 months ago

Are you running the training sh script from the corresponding project folder?

tokenizer = QWenTokenizer.from_pretrained(
        "monkey_model",
        cache_dir=training_args.cache_dir,
        model_max_length=training_args.model_max_length,
        padding_side="right",
        use_fast=False,
        trust_remote_code=True,
    )

Here "monkey_model" is a relative path that points to the link folder in your project. You should also check your environment for problems.
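Because "monkey_model" is a relative path, it resolves against the current working directory, so the same script behaves differently depending on where it is launched. A minimal sketch of checking this is below; `resolve_from` and `describe` are hypothetical helpers, and the base directory you pass is an assumption about your own checkout layout.

```python
import os
from pathlib import Path


def resolve_from(base_dir: str, relative: str = "monkey_model") -> Path:
    """Resolve `relative` against base_dir instead of the current directory.

    Path.resolve() also follows symlinks, so a link folder is reported
    by its real target.
    """
    return (Path(base_dir) / relative).resolve()


def describe(path: Path) -> str:
    """Return a short message saying whether the folder exists."""
    if path.is_dir():
        return f"{path} exists"
    return f"{path} is missing (cwd is {os.getcwd()})"


if __name__ == "__main__":
    # Resolving against the current directory mirrors what a bare
    # "monkey_model" argument would do.
    print(describe(resolve_from(os.getcwd())))
```

If the message reports the folder as missing, either change into the project folder before launching the script or pass an absolute path instead.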

Tzx11 commented 5 months ago

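I changed "monkey_model" here to a local path and set up the environment step by step following the README, but running the training script afterwards still produces the error above.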