ncoop57 opened this issue 3 years ago.
Ben Wang's distributed training script should have something for this; see, e.g., https://github.com/kingoflolz/mesh-transformer-jax/blob/master/device_train.py
When I was fiddling with GCS buckets for my own project (running out-of-the-box inference with GPT-J on my own prompts), I had to use the blob upload/download functions from the GCS docs, and that part was straightforward: the functions worked as documented. I think the potentially harder part is interfacing with Ray, though Ben's script should help a lot with that too.
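For reference, here is a minimal sketch of blob upload/download helpers in the style of the GCS docs, using the `google-cloud-storage` Python client. It assumes the library is installed (`pip install google-cloud-storage`) and that credentials come from Application Default Credentials; the helper names (`get_bucket`, `upload_blob`, `download_blob`) and any bucket or file names are hypothetical placeholders, not anything from Ben's script.

```python
import os


def get_bucket(bucket_name):
    """Return a handle to an existing GCS bucket (no API request is made here)."""
    # Imported lazily so the helpers below can be exercised without the client library.
    from google.cloud import storage

    client = storage.Client()  # picks up Application Default Credentials
    return client.bucket(bucket_name)


def upload_blob(bucket, source_file_name, destination_blob_name):
    """Upload a local file to `destination_blob_name` inside `bucket`."""
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    return blob.name


def download_blob(bucket, source_blob_name, destination_file_name):
    """Download `source_blob_name` from `bucket` into a local file."""
    parent = os.path.dirname(destination_file_name)
    if parent:
        # Create the local target directory if it does not exist yet.
        os.makedirs(parent, exist_ok=True)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    return destination_file_name
```

Usage would look like `upload_blob(get_bucket("my-gptj-bucket"), "prompts.txt", "inputs/prompts.txt")` (hypothetical names). Passing the bucket in as a parameter, rather than creating a client inside each helper, keeps the helpers reusable across calls and testable without credentials.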
Hi @ym-han! This is exactly why we want GCS set up: to train GPTNeo-J. We also need a bucket that people will be able to access after the event, but for now I guess we can go with a temporary solution. It would be great if someone who has been through the GCS setup process before could take care of this.
Just to be clear, let's distinguish between (i) using a bucket to store things that we are going to use during training and for other purposes within this group, and (ii) using a bucket to store things that the public can download. I think (ii) is not going to be cost-effective: we would be better off asking, e.g., the person who owns https://the-eye.eu/ to host it (which is what EAI did with GPT-J's weights). He's on the EleutherAI Discord, so you can just DM him about this when the time comes; I'm pretty sure he would be interested in hosting it, given the interest he's shown in GitHub Copilot-like things.
@ym-han, thank you for explaining the two use cases. I totally agree: we should probably use a bucket only for the first case (training), and "the eye" is a good place for long-term storage. I will speak with the person on the EAI Discord and see if they'd be interested in helping us out.
To do distributed training across multiple TPUs, and to host the model once we lose access to the TPUs, we need to figure out how to set up a GCS bucket to store the model in. Any help on this task would be greatly appreciated!
https://cloud.google.com/storage/
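As a possible starting point for the setup this issue asks for, here is a minimal sketch, again assuming the `google-cloud-storage` Python client and Application Default Credentials. It creates the bucket if it does not exist and mirrors a local checkpoint directory into it under a prefix. The helper names, the default location, and the directory layout are hypothetical; it does not cover Ray integration or the checkpoint format used by Ben's script. Note that GCS bucket names are globally unique, so the chosen name must not already be taken by another project.

```python
import os


def ensure_bucket(client, bucket_name, location="us-central1"):
    """Return the bucket, creating it in `location` if it does not exist yet.

    `client` is a google.cloud.storage.Client; the default location is a
    hypothetical choice (ideally the region the TPUs run in).
    """
    bucket = client.lookup_bucket(bucket_name)  # None if the bucket is missing
    if bucket is None:
        bucket = client.create_bucket(bucket_name, location=location)
    return bucket


def upload_checkpoint_dir(bucket, local_dir, prefix):
    """Upload every file under `local_dir` to `<prefix>/<relative path>` in `bucket`.

    Returns the sorted list of blob names that were written.
    """
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for fname in sorted(files):
            path = os.path.join(root, fname)
            # GCS object names use "/" regardless of the local OS separator.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            blob_name = f"{prefix.rstrip('/')}/{rel}"
            bucket.blob(blob_name).upload_from_filename(path)
            uploaded.append(blob_name)
    return sorted(uploaded)
```

Usage would look like `upload_checkpoint_dir(ensure_bucket(storage.Client(), "hypothetical-gptj-ckpts"), "checkpoints/step_1000", "gptj/step_1000")` after `from google.cloud import storage`. Keeping one prefix per checkpoint step makes it easy for other group members to list and download a specific checkpoint later.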