Mistobaan opened 2 years ago
This is something we should be able to set up in the next couple weeks. Are you familiar with setting up such a hosted runner?
I can figure out the details; it really depends on what hardware we have available: cloud, bare metal, or k8s.
k8s, building from a Dockerfile. There's info on our Dockerfile here.
@Mistobaan Based on our recent conversations, my understanding is that the code works and we now just need to allocate a dedicated GPU cluster and set up the CI. Is that correct? If so, I can set up the cluster and we can start testing the CI.
Overview
To effectively test changes to the codebase against the repository's full CUDA / MPI / Apex stack, it would be nice to dedicate some of the cluster's resources to self-hosted runners, similar to how DeepSpeed tests its own codebase.
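As a rough illustration of what "the full CUDA / MPI / Apex stack" could mean for a runner, here is a minimal Python sketch of a pre-flight check that a self-hosted GPU job might run before the real tests. It uses only the standard library. The module names (`torch`, `apex`, `mpi4py`), the binary names (`nvidia-smi`, `nvcc`, `mpirun`), and the helper names `missing_components` and `main` are all hypothetical assumptions, not something this thread specifies.

```python
"""Hypothetical pre-flight check for a self-hosted GPU CI runner.

Assumption: the runner image provides a CUDA toolchain, an MPI launcher,
and the Python packages torch, apex and mpi4py. None of these names come
from the original thread; adjust them to match the real Docker image.
"""
import importlib.util
import shutil

# Python packages the CUDA / MPI / Apex stack is assumed to expose (hypothetical).
REQUIRED_MODULES = ("torch", "apex", "mpi4py")
# Executables assumed to be on PATH inside the runner container (hypothetical).
REQUIRED_BINARIES = ("nvidia-smi", "nvcc", "mpirun")


def missing_components(modules=REQUIRED_MODULES, binaries=REQUIRED_BINARIES):
    """Return the names of modules and binaries that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not on PATH.
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing


def main():
    """Print a summary and return a process exit code (0 = all present)."""
    missing = missing_components()
    if missing:
        print("runner is missing: " + ", ".join(missing))
        return 1
    print("full CUDA / MPI / Apex stack detected")
    return 0
```

A CI step could call `main()` and fail the job on a nonzero return, so a misconfigured runner is flagged before any long GPU tests start.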