alxmrs opened 6 days ago
Thanks for opening this issue @alxmrs! I think Ray would be a great runtime for Cubed, and should be relatively straightforward to write an executor for (maybe a bit like the Modal one?). Do you know what people generally run Ray on in production/at scale?
Hey Tom! Do you mean what does the user base look like, or do I know specific people? On the former: Ray is the engine that OpenAI uses to train its GPT models; it's really popular in the ML world. On the latter: Ray, the person (cromwellian), uses Ray, the framework, at Roblox for model training. :)
> should be relatively straightforward to write an executor for (maybe a bit like the Modal one?).
I agree, and it does look like it will be similar to Modal.
I meant usage of Anyscale vs. KubeRay vs. something else. I was wondering whether there is one setup that most people use, or whether it's a bit of everything.
> Ray, the person (cromwellian), uses Ray, the framework, at Roblox for model training. :)
Got it!
In addition to accelerator support (e.g. via #304), Cubed could benefit ML users by providing a Ray executor: https://docs.ray.io/en/latest/ray-core/walkthrough.html
Since Cubed uses a serverless model, I bet it could get away with only using Ray Tasks (remote functions).
From talking with @cromwellian a bit, my hope is that Cubed could provide memory bounds when trying to saturate GPUs during model training. I'm not totally sure exactly what a training loop with Cubed would look like. Here's how Ray integrates with PyTorch, for example: https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer
@shoyer once pointed out to me that GPU OOM errors tend to occur while taking the gradient of a function graph, not necessarily on the forward pass. I'm not totally sure right now whether Cubed is in fact a good fit for tackling this problem, only that the potential is exciting.