aws-samples / awsome-distributed-training

Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
MIT No Attribution
205 stars 86 forks

Automate onboarding smhp slurm #487

Closed amanshanbhag closed 2 weeks ago

amanshanbhag commented 2 weeks ago

Added login group support and an option to auto-create the cluster.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sean-smith commented 2 weeks ago

Didn't we already merge this?

amanshanbhag commented 2 weeks ago

@sean-smith I made some more changes (adding an option to create the cluster, login group support, etc.).

sean-smith commented 2 weeks ago

@amanshanbhag Can you rebase off what's currently merged?