
Llama 3.1 liger example is not working #1892

Closed: Stealthwriter closed this issue 2 days ago

Stealthwriter commented 2 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

It should train.

Current behaviour

examples/llama-3/fft-8b-liger-fsdp.yaml

This example is not working: "optimizer: paged_adamw_8bit" is not compatible with FSDP. I tried changing it, but I still get this error: "Value error, FSDP Offload not compatible with adamw_bnb_8bit".

When I commented out the FSDP settings and used DeepSpeed instead, it worked.
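
For reference, a minimal sketch of that workaround (the deepspeed path below points at the ZeRO-2 config that ships in the axolotl repo; the specific keys commented out are my assumption, not something stated in the report):

# swap the FSDP block for a DeepSpeed config
# (deepspeed_configs/zero2.json ships with the axolotl repo)
deepspeed: deepspeed_configs/zero2.json

# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config: ...

# the 8-bit optimizer that fails under FSDP reportedly works here
optimizer: paged_adamw_8bit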

Steps to reproduce

Run the example as-is.
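
For concreteness, one way to launch it, assuming the accelerate-based entry point from the repo's README:

accelerate launch -m axolotl.cli.train examples/llama-3/fft-8b-liger-fsdp.yaml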

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.10

axolotl branch-commit

latest

ganler commented 1 month ago

It seems not to work for Mistral either.

I got the same error as in https://github.com/linkedin/Liger-Kernel/issues/100, even after upgrading Liger.

NanoCode012 commented 2 weeks ago

Hello! I could not reproduce this issue on current main. I ran it on 2x L40 and it works, with the dataset edits below (needed due to some recent changes):

datasets:
  - path: mlabonne/FineTome-100k
    type: chat_template
    split: train[:20%]
+    field_messages: conversations
+    message_field_role: from
+    message_field_content: value

optimizer: adamw_torch
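
For clarity, here is the dataset block with that diff applied (the comments are mine: FineTome-100k stores each turn as "from"/"value" pairs under a "conversations" key, which is why the remapping is needed):

datasets:
  - path: mlabonne/FineTome-100k
    type: chat_template
    split: train[:20%]
    # map FineTome's ShareGPT-style fields onto
    # what the chat_template loader expects
    field_messages: conversations
    message_field_role: from
    message_field_content: value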

Could either of you clarify if you still see this issue?

winglian commented 2 days ago

The current version of the example should be correct now. 8-bit optimizers do not work with FSDP1, so you should use regular 32-bit optimizers with FSDP:

https://github.com/bitsandbytes-foundation/bitsandbytes/issues/89
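
For reference, a minimal sketch of a compatible pairing, reusing the fsdp settings from the shipped example (the exact fsdp_config keys can differ between axolotl versions, so treat this as an assumption rather than the canonical config):

# a regular 32-bit optimizer works with FSDP1;
# 8-bit bitsandbytes optimizers (adamw_bnb_8bit, paged_adamw_8bit) do not
optimizer: adamw_torch

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT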