Support GPU jobs in FedScale K8S deployment, and some quality-of-life enhancements.
Support training using GPU for FedScale k8s jobs
Support GPU time-sharing so that multiple FedScale k8s jobs can share the same GPU simultaneously (no changes to the FedScale repo itself; the changes are made solely in the k8s infrastructure)
Support checking FedScale k8s job progress interactively
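For readers unfamiliar with how a k8s workload asks for a GPU, here is a minimal sketch and not the code in this PR. It assumes the cluster runs the NVIDIA device plugin, which exposes GPUs as the `nvidia.com/gpu` extended resource. The helper name `gpu_job_manifest`, the job name, and the image are hypothetical placeholders.

```python
import json


def gpu_job_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Job manifest that requests `gpus` GPUs.

    Hypothetical helper for illustration; FedScale's actual manifests may differ.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # GPUs are requested as whole integers under `limits`.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            }
        },
    }


if __name__ == "__main__":
    # Placeholder job name and image, chosen only for this example.
    manifest = gpu_job_manifest("fedscale-executor", "example/fedscale:latest")
    print(json.dumps(manifest, indent=2))
```

With time-sharing enabled on the infrastructure side, several such jobs could be scheduled onto the same physical GPU without any change to the manifest shape shown here; how the sharing is configured depends on the cluster and is not specified in this description.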
Related issue number
Checks
[x] I've made sure the following tests are passing.
K8S GPU jobs
[x] Dry Run (45 training rounds & 3 evaluation rounds)
[x] Cifar 10 (45 training rounds & 3 evaluation rounds)
[x] Femnist (45 training rounds & 3 evaluation rounds)
Regression 1: K8S CPU jobs
[x] Dry Run (45 training rounds & 3 evaluation rounds)
[x] Cifar 10 (45 training rounds & 3 evaluation rounds)
[x] Femnist (45 training rounds & 3 evaluation rounds)
Regression 2: Original CPU jobs
[x] Dry Run (20 training rounds & 1 evaluation round)
[x] Cifar 10 (20 training rounds & 1 evaluation round)
[x] Femnist (20 training rounds & 1 evaluation round)
Why are these changes needed?
Support GPU jobs in FedScale K8S deployment, and some quality-of-life enhancements.