wanderingweights opened 2 weeks ago
Hey, thanks for reporting this. I noticed that our callback doesn't inherit MLflowCallback, so it's not properly reading that env variable.
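For context, a minimal sketch (not axolotl's actual code) of the kind of env check transformers' built-in `MLflowCallback` performs during setup: artifact (checkpoint) upload is gated on the `HF_MLFLOW_LOG_ARTIFACTS` environment variable, so a custom callback that doesn't inherit `MLflowCallback` never runs this check and never uploads checkpoints.

```python
import os

def should_log_artifacts() -> bool:
    # Mirrors the style of env check in transformers' MLflowCallback.setup();
    # only a truthy HF_MLFLOW_LOG_ARTIFACTS enables artifact logging.
    return os.getenv("HF_MLFLOW_LOG_ARTIFACTS", "FALSE").upper() in {"TRUE", "1"}

os.environ["HF_MLFLOW_LOG_ARTIFACTS"] = "TRUE"
print(should_log_artifacts())  # True
```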
In the meantime, do you see the yaml being saved to mlflow?
I think @awhazell added support for mlflow and might be able to help
I can confirm the config yaml is saved as an artifact, and that the checkpoints/final model are not.
A fix could be as simple as inheriting the callback @NanoCode012 linked (or adding it to the callbacks list separately), but we would need to make sure it doesn't conflict with any of the setup from HFCausalTrainerBuilder.build.
Hey @awhazell, what potential conflicts were you thinking of?
My only concern may be duplicate logs due to the report_to config and MLflowCallback logging.
Regarding the change needed, I believe we can just import the callback and append it to the callbacks variable. Would you be interested in working on this PR?
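A minimal sketch of that change, with a guard for the duplicate-logging concern raised above (the `callbacks` list and the `MLflowCallback` stand-in class here are assumptions for illustration; the real callback lives in transformers.integrations):

```python
def append_unique(callbacks, callback_cls):
    """Append an instance of callback_cls unless one is already present,
    to avoid the duplicate MLflow logging that report_to could cause."""
    if not any(isinstance(cb, callback_cls) for cb in callbacks):
        callbacks.append(callback_cls())
    return callbacks

class MLflowCallback:  # stand-in for transformers.integrations.MLflowCallback
    pass

cbs = append_unique([], MLflowCallback)
cbs = append_unique(cbs, MLflowCallback)  # second call is a no-op
print(len(cbs))  # 1
```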
I was thinking about whether setting mlflow options in both the trainer kwargs and env variables could cause issues, but I think you're right and it shouldn't be a problem; they should always be consistent anyway.
Opened a PR here https://github.com/axolotl-ai-cloud/axolotl/pull/1976
Please check that this issue hasn't been reported before.
Expected Behavior
I have the config below and was hoping to have the model checkpoints saved as artifacts, but only the metrics and config are saved.
Is this expected to work or am I missing something?
Thanks for looking over!
Current behaviour
No model checkpoints.
Steps to reproduce
I use the docker image `winglian/axolotl:main-latest`:

```shell
hf login
accelerate launch -m axolotl.cli.train theconfig.yml
```
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
winglian/axolotl:main-latest
Acknowledgements