foundation-model-stack / fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
https://pytorch.org/docs/stable/fsdp.html
Apache License 2.0
162 stars, 27 forks

add Aim support #55

Closed. lchu-ibm closed this 6 months ago

lchu-ibm commented 6 months ago

Similar to the wandb support added earlier, this PR adds support for Aim.

Aim usage is almost identical to wandb, with one difference: Aim does not support a user-defined run_id, so aim_run_id defaults to None in the config. To resume or continue a run, fetch the run_id from the existing runs and put it in the config.
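For illustration, the fresh-run vs. resume behavior described above could look roughly like the sketch below. This is not the code from this PR: `TrainConfig`, `init_aim_run`, the `run_factory` parameter, and every config field other than `aim_run_id` are hypothetical names. The only Aim calls assumed are `aim.Run(run_hash=..., repo=..., experiment=...)` and the `hash` attribute on the resulting run.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TrainConfig:
    # Hypothetical config; only aim_run_id is named in the PR description.
    tracker_dir: str = "/tmp/aim_logs"
    tracker_project_name: str = "demo"
    # Aim generates its own run hash, so there is nothing to set for a
    # fresh run. Set this to an existing run's hash to resume that run.
    aim_run_id: Optional[str] = None


def init_aim_run(cfg: TrainConfig, run_factory: Optional[Callable[..., Any]] = None):
    """Open a new Aim run (aim_run_id is None) or resume an existing one."""
    if run_factory is None:
        # Imported lazily so the module loads even without aim installed.
        from aim import Run
        run_factory = Run
    # Passing run_hash=None asks Aim to create a new run with a new hash.
    return run_factory(
        run_hash=cfg.aim_run_id,
        repo=cfg.tracker_dir,
        experiment=cfg.tracker_project_name,
    )
```

On a first launch, leave `aim_run_id` as None and note the value of `run.hash`. To continue that run later, put the same hash into `aim_run_id` before calling `init_aim_run` again.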

mayank31398 commented 6 months ago

might be useful to look at accelerate for this @lchu-ibm : https://github.com/huggingface/accelerate/blob/main/src/accelerate/tracking.py

It has support for a bunch of trackers with an almost unified API

lchu-ibm commented 6 months ago

@mayank31398 thanks for sharing!

This PR only adds the basics. We have a group of experienced Aim users who will take over from here and decide which trackers to add and how.

nairbv commented 6 months ago

cc @dushyantbehl fyi

lchu-ibm commented 6 months ago

@nairbv ready for another review.