InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

ImportError: Failed to import AutoModelForCausalLM from xtuner.model.transformers_models in None #740

Open ldh127 opened 1 month ago

ldh127 commented 1 month ago

I got this error; the first run didn't go through:

```
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Traceback (most recent call last):
  File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/tools/train.py", line 353, in <module>
    main()
  File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/tools/train.py", line 342, in main
    runner = Runner.from_cfg(cfg)
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
    runner = cls(
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/runner/runner.py", line 429, in __init__
    self.model = self.build_model(model)
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/runner/runner.py", line 836, in build_model
    model = MODELS.build(model)
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/model/sft.py", line 97, in __init__
    llm = self._build_from_cfg_or_module(llm)
  File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/model/sft.py", line 283, in _build_from_cfg_or_module
    traverse_dict(cfg_or_mod)
  File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/model/utils.py", line 35, in traverse_dict
    set_obj_dtype(d)
  File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/model/utils.py", line 15, in set_obj_dtype
    for key, value in d.items():
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/config/config.py", line 239, in items
    items.append((key, self.build_lazy(value)))
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/config/config.py", line 217, in build_lazy
    value = value.build()
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/config/lazy.py", line 219, in build
    obj = self.source.build()
  File "/home/dehao.li/.local/lib/python3.10/site-packages/mmengine/config/lazy.py", line 77, in build
    raise ImportError(
ImportError: Failed to import AutoModelForCausalLM from xtuner.model.transformers_models in None
```
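A quick way to tell whether the installed XTuner provides the symbol the config resolves lazily is to run the same import eagerly; a minimal check (the module path is taken from the traceback above):

```python
# Minimal sanity check: mmengine resolves config imports lazily, so this
# ImportError only surfaces at build time. Running the import eagerly
# shows whether the installed XTuner exposes the module at all.
try:
    from xtuner.model.transformers_models import AutoModelForCausalLM  # noqa: F401
    print("import ok")
except ImportError as exc:
    print(f"installed xtuner does not provide it: {exc}")
```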

HIT-cwh commented 1 month ago

May I ask which version of XTuner you are using?
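A minimal way to answer this, assuming the installed package exposes a version attribute as XTuner releases normally do:

```python
# Hypothetical quick check; XTuner packages typically expose __version__.
import xtuner
print(xtuner.__version__)
```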

ldh127 commented 1 month ago

> May I ask which version of XTuner you are using?

I'm using the deepseek v2 branch of your fork, https://github.com/HIT-cwh/xtuner, installed from source. Can that branch run successfully at the moment? Would it be convenient to add each other on WeChat?

HIT-cwh commented 1 month ago

This branch was just updated with a new version and is currently under review; you can pull the latest commit and give it a try.

On my side the full fine-tuning pipeline runs end to end: full fine-tuning the Deepseek V2 236B model with a 16k context requires 128 A100-80G GPUs.
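As a rough back-of-envelope for why hardware at this scale is plausible (our assumption-laden estimate, not a figure from this thread): full fine-tuning with Adam holds roughly 16 bytes of weight, gradient, and optimizer state per parameter before activations:

```python
# Rough estimate only; assumes bf16 weights/grads plus fp32 master weights
# and Adam moments (~16 B/param), sharded ZeRO-3-style across 128 GPUs.
params = 236e9
bytes_per_param = 2 + 2 + 4 + 4 + 4      # weights, grads, master, Adam m, Adam v
per_gpu_gb = params * bytes_per_param / 128 / 1e9
print(f"~{per_gpu_gb:.0f} GB per GPU before activations")  # ~30 GB of the 80 GB
```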

XTuner optimizes the Deepseek V2 MoE structure; compared with the native hf model, full fine-tuning is 5-6x faster.

Here is a config for fine-tuning with a 2k context for your reference.

Feel free to raise any questions!
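For orientation, the model section of such a config typically looks like the sketch below; the names follow XTuner's usual config style, the model path is a placeholder, and the actual config in the branch is the authoritative reference:

```python
# Hedged sketch of an XTuner config's model section, not the exact file.
from transformers import AutoModelForCausalLM
from xtuner.model import SupervisedFinetune

pretrained_model_name_or_path = 'deepseek-ai/DeepSeek-V2'  # placeholder

model = dict(
    type=SupervisedFinetune,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True))
```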

ldh127 commented 1 month ago

> This branch was just updated with a new version and is currently under review; you can pull the latest commit and give it a try.
>
> On my side the full fine-tuning pipeline runs end to end: full fine-tuning the Deepseek V2 236B model with a 16k context requires 128 A100-80G GPUs.
>
> XTuner optimizes the Deepseek V2 MoE structure; compared with the native hf model, full fine-tuning is 5-6x faster.
>
> Here is a config for fine-tuning with a 2k context for your reference.
>
> Feel free to raise any questions!

I used your latest code to run models--deepseek-ai--DeepSeek-V2-Lite. This time the job launched without errors, but during the run it reported this:

```
06/03 20:56:19 - mmengine - INFO - before_train in EvaluateChatHook.
[rank5]: Traceback (most recent call last):
[rank5]:   File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/tools/train.py", line 353, in <module>
[rank5]:     main()
[rank5]:   File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/tools/train.py", line 349, in main
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1777, in train
[rank5]:     model = self.train_loop.run()  # type: ignore
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/mmengine/runner/loops.py", line 270, in run
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1839, in call_hook
[rank5]:     getattr(hook, fn_name)(self, **kwargs)
[rank5]:   File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 230, in before_train
[rank5]:     self._generate_samples(runner, max_new_tokens=50)
[rank5]:   File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 219, in _generate_samples
[rank5]:     self._eval_language(runner, model, device, max_new_tokens,
[rank5]:   File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 177, in _eval_language
[rank5]:     generation_output = model.generate(
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank5]:     return func(*args, **kwargs)
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
[rank5]:     result = self._sample(
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2397, in _sample
[rank5]:     outputs = self(
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank5]:     return self._call_impl(*args, **kwargs)
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank5]:     return forward_call(*args, **kwargs)
[rank5]:   File "/home/dehao.li/.cache/huggingface/modules/transformers_modules/901a97adff241a01be00e3f7a224c3138e5d2a75/modeling_deepseek.py", line 1669, in forward
[rank5]:     outputs = self.model(
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank5]:     return self._call_impl(*args, **kwargs)
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank5]:     return forward_call(*args, **kwargs)
[rank5]:   File "/home/dehao.li/.cache/huggingface/modules/transformers_modules/901a97adff241a01be00e3f7a224c3138e5d2a75/modeling_deepseek.py", line 1538, in forward
[rank5]:     layer_outputs = decoder_layer(
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank5]:     return self._call_impl(*args, **kwargs)
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank5]:     return forward_call(*args, **kwargs)
[rank5]:   File "/home/dehao.li/.cache/huggingface/modules/transformers_modules/901a97adff241a01be00e3f7a224c3138e5d2a75/modeling_deepseek.py", line 1252, in forward
[rank5]:     hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank5]:     return self._call_impl(*args, **kwargs)
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank5]:     return forward_call(*args, **kwargs)
[rank5]:   File "/maindata/data/shared/Security-SFT/dehao.li/env_root/deepseekv2_xtuner/xtuner/xtuner/model/modules/dispatch/deepseek_v2.py", line 42, in deepseek_attn_forward
[rank5]:     q = self.q_b_proj(self.q_a_layernorm(self.q_a_proj(hidden_states)))
[rank5]:   File "/maindata/data/shared/Security-SFT/common_tools/mambaforge/envs/deepseekv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank5]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank5]: AttributeError: 'DeepseekV2FlashAttention2' object has no attribute 'q_b_proj'. Did you mean: 'kv_b_proj'?
```

HIT-cwh commented 1 month ago

We currently only support Deepseek; Deepseek-lite support is coming soon. You can try Deepseek 236B first.

ldh127 commented 1 month ago

> We currently only support Deepseek; Deepseek-lite support is coming soon. You can try Deepseek 236B first.

I wonder: is Deepseek-lite's model structure and technique exactly the same as Deepseek's, with only hyperparameters such as the number of layers differing?
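One way to check this directly is to compare the published configs. From the config.json files on the Hub, DeepSeek-V2 sets q_lora_rank (the low-rank query projection) while DeepSeek-V2-Lite leaves it null, so the attention layouts are not identical; a hedged check, assuming Hub access:

```python
# q_lora_rank drives which q-projection layers the DeepSeek attention
# module creates, per the official modeling_deepseek.py.
from transformers import AutoConfig

for name in ('deepseek-ai/DeepSeek-V2', 'deepseek-ai/DeepSeek-V2-Lite'):
    cfg = AutoConfig.from_pretrained(name, trust_remote_code=True)
    print(name, 'q_lora_rank =', getattr(cfg, 'q_lora_rank', None))
```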

ldh127 commented 1 month ago

> We currently only support Deepseek; Deepseek-lite support is coming soon. You can try Deepseek 236B first.

Does it currently support LoRA for deepseek v2?
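For context, XTuner's SupervisedFinetune configs usually take a peft LoraConfig through a lora key; the sketch below shows that convention, though whether the deepseek v2 branch wires it up for the MoE layers is exactly the open question here:

```python
# Typical XTuner LoRA block (convention sketch, not branch-verified).
from peft import LoraConfig

lora = dict(
    type=LoraConfig,
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type='CAUSAL_LM')
```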

ldh127 commented 1 month ago

> We currently only support Deepseek; Deepseek-lite support is coming soon. You can try Deepseek 236B first.

I found the cause of my error: DeepseekV2FlashAttention2 inherits from DeepseekV2Attention, which defines the self.q_b_proj attribute, yet the attribute cannot be found when DeepseekV2FlashAttention2 is called. Strange; shouldn't the attribute be inherited?
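A toy reproduction of why the attribute is "missing" despite inheritance: in the official modeling_deepseek.py, the parent __init__ only creates the low-rank query path when q_lora_rank is set, so for V2-Lite the instance never gets q_b_proj in the first place (the dimensions below are arbitrary):

```python
import torch.nn as nn

class DemoAttention(nn.Module):
    """Toy stand-in for DeepseekV2Attention's conditional __init__."""
    def __init__(self, q_lora_rank=None, hidden_size=64):
        super().__init__()
        if q_lora_rank is None:
            # V2-Lite path: a single plain q projection
            self.q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        else:
            # V2 path: low-rank q projection (q_a -> norm -> q_b)
            self.q_a_proj = nn.Linear(hidden_size, q_lora_rank, bias=False)
            self.q_a_layernorm = nn.LayerNorm(q_lora_rank)
            self.q_b_proj = nn.Linear(q_lora_rank, hidden_size, bias=False)

class DemoFlashAttention(DemoAttention):
    pass  # inherits the code, but attributes are created per-instance in __init__

lite = DemoFlashAttention(q_lora_rank=None)
print(hasattr(lite, 'q_b_proj'))  # False: it was never created, so nothing to inherit
```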

ldh127 commented 1 month ago

> This branch was just updated with a new version and is currently under review; you can pull the latest commit and give it a try.
>
> On my side the full fine-tuning pipeline runs end to end: full fine-tuning the Deepseek V2 236B model with a 16k context requires 128 A100-80G GPUs.
>
> XTuner optimizes the Deepseek V2 MoE structure; compared with the native hf model, full fine-tuning is 5-6x faster.
>
> Here is a config for fine-tuning with a 2k context for your reference.
>
> Feel free to raise any questions!

The xtuner framework's abstractions are too hard to use. When will you support firefly, or open a branch for llama-factory, for fine-tuning deepseek v2?

HIT-cwh commented 1 month ago

> We currently only support Deepseek; Deepseek-lite support is coming soon. You can try Deepseek 236B first.

> I found the cause of my error: DeepseekV2FlashAttention2 inherits from DeepseekV2Attention, which defines the self.q_b_proj attribute, yet the attribute cannot be found when DeepseekV2FlashAttention2 is called. Strange; shouldn't the attribute be inherited?

We currently only support Deepseek V2; Deepseek V2 Lite will be supported soon!

Your error occurs because the attention structure of Deepseek V2 Lite differs from that of Deepseek V2. XTuner dispatches the attention forward method to support sequence parallelism and variable-length attention. At the moment XTuner only dispatches Deepseek V2's attention, so running Deepseek V2 Lite fails with the missing self.q_b_proj attribute.

[image]
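A minimal sketch of the guard the dispatched forward would need in order to cover both variants (attribute names follow the official modeling_deepseek.py; this is not the committed fix):

```python
# Sketch only: branch on whether the low-rank q path exists on the module.
def dispatched_q(self, hidden_states):
    if getattr(self, 'q_a_proj', None) is not None:
        # Deepseek V2: low-rank (MLA-style) q projection
        return self.q_b_proj(self.q_a_layernorm(self.q_a_proj(hidden_states)))
    # Deepseek V2 Lite: plain q projection
    return self.q_proj(hidden_states)
```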

HoboRiceone commented 1 month ago

@HIT-cwh Hello, I'm glad to have come across this issue. I've also been using xtuner to train Deepseek V2 Lite recently and ran into similar compatibility problems. Once you've finished adding support for Deepseek V2 Lite, could you reply under this issue? Thank you for your work!

ldh127 commented 1 month ago

> We currently only support Deepseek; Deepseek-lite support is coming soon. You can try Deepseek 236B first.
>
> I found the cause of my error: DeepseekV2FlashAttention2 inherits from DeepseekV2Attention, which defines the self.q_b_proj attribute, yet the attribute cannot be found when DeepseekV2FlashAttention2 is called. Strange; shouldn't the attribute be inherited?
>
> We currently only support Deepseek V2; Deepseek V2 Lite will be supported soon!
>
> Your error occurs because the attention structure of Deepseek V2 Lite differs from that of Deepseek V2. XTuner dispatches the attention forward method to support sequence parallelism and variable-length attention. At the moment XTuner only dispatches Deepseek V2's attention, so running Deepseek V2 Lite fails with the missing self.q_b_proj attribute.
>
> [image]

Thanks for the reply; looking forward to your further work. It would be even better if you could also submit PRs to llama-factory and firefly.