epfml / llm-baselines

nanoGPT-like codebase for LLM training
MIT License

add methods #18

Open Andron00e opened 1 month ago

Andron00e commented 1 month ago

add schedules also | link

upd: has been added via this commit
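
For context, a warmup-plus-cosine schedule of the kind typically used in nanoGPT-style codebases looks roughly like the sketch below. This is only an illustration, not the exact schedule added in that commit; the function name and arguments are made up:

```python
import math
import torch

def cosine_with_warmup(optimizer, warmup_iters, max_iters, min_ratio=0.1):
    # Linear warmup for warmup_iters steps, then cosine decay
    # from the base learning rate down to min_ratio * base_lr.
    def lr_lambda(it):
        if it < warmup_iters:
            return it / max(1, warmup_iters)
        progress = (it - warmup_iters) / max(1, max_iters - warmup_iters)
        return min_ratio + (1 - min_ratio) * 0.5 * (1 + math.cos(math.pi * progress))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```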

Andron00e commented 1 month ago

some problems with installing the latest version of schedulefree, so I added it manually, see: https://github.com/epfml/llm-baselines/blob/soap/src/optim/schedulefree.py
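
For anyone trying it, usage of the upstream schedulefree package looks roughly like the sketch below; the vendored copy is presumably meant to expose the same interface. The toy model and training loop are just for illustration:

```python
import torch
import schedulefree  # or the vendored src/optim/schedulefree.py

model = torch.nn.Linear(16, 1)
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

optimizer.train()  # schedule-free optimizers require explicit train/eval switching
for _ in range(10):
    x = torch.randn(8, 16)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

optimizer.eval()  # switch back before validation or checkpointing
```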

martinjaggi commented 1 month ago

is there a pull request for this? would be nice to collaborate

Andron00e commented 1 month ago

> is there a pull request for this? would be nice to collaborate

hi, we are deploying it to the soap branch together with @mpagli

Andron00e commented 1 month ago

some useful settings:

Andron00e commented 1 month ago

Adam-mini Note

I use model.named_parameters() for Adam-mini instead of group_specs, therefore in main.py it looks like:

  elif args.opt == "adam-mini":
      opt = Adam_mini(
          device=args.device,
          world_size=args.world_size,
          named_parameters=model.named_parameters(),  # check
          lr=args.lr,
          betas=(args.beta1, args.beta2),
          weight_decay=args.weight_decay,
          model_sharding=args.model_sharding,
          dim=args.n_embd,
          n_heads=args.n_head,
          n_kv_heads=args.n_kv_head,
          verbose=args.adam_mini_verbose,
      )

TODO: update partition names

kylematoba commented 3 weeks ago

hi, I'll add sophia and adafactor.

Andron00e commented 3 weeks ago

> hi, I'll add sophia and adafactor.

Hello! Super, just develop this in your branch and then open a PR to soap. I am a bit overloaded these days, but I also wanted to try Sophia.

Note: in the official repository, they do not show SophiaH (with Hutchinson's preconditioner), only SophiaG. We want to have both methods here. SophiaH is nicely implemented in optax for now, but it's not hard to write in PyTorch, see: this link
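
For reference, the Hutchinson estimator that SophiaH needs is only a handful of lines in PyTorch; a minimal, self-contained sketch (not the repo's code, the helper name is made up):

```python
import torch

def hutchinson_diag(loss, params, n_samples=1):
    # Estimate diag(H) via E[z * (H z)] with Rademacher z,
    # using Hessian-vector products from autograd.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    est = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for e, z, hvp in zip(est, zs, hvps):
            e.add_(z * hvp / n_samples)
    return est
```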

Thx)

kylematoba commented 3 weeks ago

hi, Bristen is back early, so I'll get back to that.

I did some research on Sophia, though. Main findings:

Adafactor is simple; it's already close to being released officially in PyTorch, see https://github.com/pytorch/pytorch/pull/129905.
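
As a reminder of what makes it cheap, Adafactor keeps only row and column statistics of the squared gradients instead of the full second-moment matrix; a rough sketch of that rank-1 reconstruction (illustration only, not the code in the PyTorch PR):

```python
import torch

def factored_second_moment(grad_sq):
    # Adafactor's rank-1 approximation: V ~ R C / sum(R), where R and C are
    # the row and column sums of the (exponentially averaged) squared gradients.
    R = grad_sq.sum(dim=1, keepdim=True)  # shape (n, 1)
    C = grad_sq.sum(dim=0, keepdim=True)  # shape (1, m)
    return R @ C / R.sum()
```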

When I get some time next I'll return to this if you haven't.

martinjaggi commented 3 weeks ago

muon optimizer should also be a good one to add. i think @doikov might be interested in that one too: https://x.com/Yuchenj_UW/status/1846964136204173318

martinjaggi commented 3 weeks ago

once we have a handful, we'll have a nice benchmark collection for LLM optimizers, probably worth a small writeup soon

Andron00e commented 3 weeks ago

> muon optimizer should also be a good one to add. i think @doikov might be interested in that one too: https://x.com/Yuchenj_UW/status/1846964136204173318

yes, I am working on that and already have some test runs of Muon. But, again, it is hard to draw conclusions when the batch size is less than 0.5M tokens.

btw, an interesting exercise: try this new muon/soap/whatever on the banana (Rosenbrock) function :)
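
For context, the core of Muon is orthogonalizing the momentum matrix with a Newton-Schulz iteration; a sketch along the lines of the public reference implementation (the quintic coefficients are taken from there, the rest is simplified):

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # Approximately replace G by an orthogonal matrix with the same
    # row/column space, via a quintic Newton-Schulz iteration.
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the Muon reference code
    X = G / (G.norm() + eps)
    transpose = G.size(0) > G.size(1)
    if transpose:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transpose else X
```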

Andron00e commented 3 weeks ago

> hi, Bristen is back early, so I'll get back to that.
>
> I did some research on Sophia, though. Main findings:
>
> Adafactor is simple; it's already close to being released officially, see pytorch/pytorch#129905.
>
> When I get some time next I'll return to this if you haven't.

I mean, for the official version of SophiaG, you can just look at the paper's repo: https://github.com/Liuhong99/Sophia