NLPJCL / RAG-Retrieval

Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, Cross Encoder

Error when fine-tuning bge-m3 #22

Closed by fortune-ai 3 months ago

fortune-ai commented 3 months ago

Hello, I get an error when fine-tuning bge-m3 with ColBERT. Could you tell me what causes this?

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.

NLPJCL commented 3 months ago

Hello, I haven't run into this problem before. Could you share the script you ran? I will try to reproduce it.


fortune-ai commented 3 months ago

I only added multi-GPU parallel training on top of your version and changed nothing else, so the problem seems to appear only with multi-GPU training. After adding this argument it now runs: `from accelerate import DistributedDataParallelKwargs`; `ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)`; `accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])`
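A minimal, self-contained sketch of how this workaround fits into an Accelerate training step (the toy model and loop below are illustrative assumptions, not the repository's actual code; the unused head mimics parameters that never contribute to the ColBERT loss):

```python
import torch
from accelerate import Accelerator, DistributedDataParallelKwargs

# find_unused_parameters=True lets DDP finish its gradient reduction even when
# some parameters did not take part in producing the loss, which is exactly
# what the RuntimeError above complains about.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

class ToyModel(torch.nn.Module):
    """Toy model (assumption) with one head the loss never touches."""
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(8, 8)
        self.unused = torch.nn.Linear(8, 8)  # never contributes to the loss

    def forward(self, x):
        return self.used(x).sum()

model = ToyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = accelerator.prepare(model, optimizer)

# When started with `accelerate launch` on multiple GPUs, the prepared model is
# a DDP module built with the kwargs above; without them, the unused head
# triggers the reduction error reported in this issue.
for _ in range(2):
    loss = model(torch.randn(4, 8, device=accelerator.device))
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```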

NLPJCL commented 3 months ago

Got it 👌. The code was originally written to use FSDP for multi-GPU data parallelism.
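For reference, a hedged sketch of what running through FSDP with Accelerate could look like instead of DDP (plugin settings are assumptions, not the repository's actual configuration; this would be started with `accelerate launch` across multiple GPUs):

```python
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# FSDP shards parameters across ranks and does not take DDP's
# find_unused_parameters argument; the error message in this issue is specific
# to torch.nn.parallel.DistributedDataParallel.
fsdp_plugin = FullyShardedDataParallelPlugin()  # defaults to FULL_SHARD
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

# The model and optimizer are then passed through accelerator.prepare(...)
# exactly as in the DDP case above.
```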
