apoorvkh / torchrunx

Automatically initialize distributed PyTorch environments
https://torchrunx.readthedocs.io
MIT License

GitHub Actions CI Test #13

Closed (apoorvkh closed this issue 1 week ago)

apoorvkh commented 1 month ago

For the most part, I think we want to run our tests manually (e.g. with SLURM) on 2 machines with 2 GPUs each.

But we should also include at least one general test in a continuous integration pipeline (GitHub Actions).

This should be a CPU-only test that emulates multiple machines. GitHub Actions (Linux) runners seem to have 4 CPUs in total.

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories

I think we can use a Docker setup to emulate 2 machines with 2 CPUs each (on a single runner).
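
A rough sketch of how that split could look, not a settled design: the helper below only builds one `docker run` command per emulated machine and pins each container to its own pair of CPUs with `--cpuset-cpus`. The function name, the image name `torchrunx-ci`, the network name `ci-net`, and the placeholder `sleep infinity` command are all hypothetical, and the network is assumed to already exist (e.g. via `docker network create ci-net`).

```python
import shlex


def docker_node_commands(
    image,
    num_nodes=2,
    cpus_per_node=2,
    network="ci-net",  # hypothetical network name, assumed to exist already
    command=("sleep", "infinity"),  # placeholder; a real setup would start a test or a server
):
    """Build one `docker run` command per emulated machine.

    Each container gets its own hostname and a disjoint, contiguous CPU range,
    so 2 nodes x 2 CPUs fills a 4-CPU runner.
    """
    commands = []
    for node in range(num_nodes):
        first_cpu = node * cpus_per_node
        last_cpu = first_cpu + cpus_per_node - 1
        commands.append(
            [
                "docker", "run", "--rm", "--detach",
                "--name", f"node{node}",
                "--hostname", f"node{node}",
                "--network", network,
                "--cpuset-cpus", f"{first_cpu}-{last_cpu}",
                image,
                *command,
            ]
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without Docker.
    for cmd in docker_node_commands("torchrunx-ci"):  # hypothetical image name
        print(shlex.join(cmd))
```

Containers on the same user-defined network can reach each other by name (`node0`, `node1`), which is what would let a single runner stand in for two hosts.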

pmcurtin commented 1 month ago

I see! Yes, I think that would make sense. It would definitely be good to have some CI testing.

apoorvkh commented 1 week ago

Closing since we have basic CI tests right now.