Open AgentLLM opened 6 months ago
Can you try setting ddp_find_unused_parameters: true?
I added ddp_find_unused_parameters: true to lisa.yaml and still get the same bug.
are you using FSDP or deepspeed?
It seems this might be a DDP-specific issue. I've tried a few things, like setting a deterministic seed for the random layer picker and adding ddp_find_unused_parameters: true, to no avail.
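For anyone following along, the seeded-picker idea can be sketched roughly as below. This is only an illustration, not axolotl's actual LISA code: `pick_active_layers` and its parameters are hypothetical names, and it assumes every rank derives its choice from the same seed and step so that all DDP processes unfreeze the same layers.

```python
import random


def pick_active_layers(total_layers: int, n_active: int, seed: int, step: int) -> list:
    """Hypothetical helper: choose which layers to unfreeze at a given step.

    The RNG is built only from (seed, step), so every process that calls this
    with the same arguments gets the same layer indices.
    """
    rng = random.Random(seed * 1_000_003 + step)
    return sorted(rng.sample(range(total_layers), n_active))


# Simulate two ranks picking layers at the same switch step.
rank0_layers = pick_active_layers(total_layers=32, n_active=4, seed=42, step=0)
rank1_layers = pick_active_layers(total_layers=32, n_active=4, seed=42, step=0)
print(rank0_layers == rank1_layers)
```

Note that, as reported above, matching the selection across ranks did not resolve the error by itself.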
are you using FSDP or deepspeed?
I'm not sure which one I am using. This is my first time using your LLM framework, and I've only added 'ddp_find_unused_parameters: true' to the 'lisa.yaml' file without making any other changes.
btw, here's a discussion on deepspeed issues w/ LISA: https://github.com/OptimalScale/LMFlow/issues/726 and a potential workaround: https://github.com/OptimalScale/LMFlow/issues/726#issuecomment-2041335788
Please check that this issue hasn't been reported before.
Expected Behavior
LISA should run on multiple GPUs.
Current behaviour
LISA only runs on a single GPU. Switching to multi-GPU leads to the bug below.
Steps to reproduce
Below is the multi-GPU config.
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.9
axolotl branch-commit
main
Acknowledgements