CodedotAl / gpt-code-clippy

Full description can be found here: https://discuss.huggingface.co/t/pretrain-gpt-neo-for-open-source-github-copilot-model/7678?u=ncoop57
Apache License 2.0
3.29k stars, 220 forks

How to store model weights in GCS #20

Open ncoop57 opened 3 years ago

ncoop57 commented 3 years ago

To do distributed training across multiple TPUs, and to host the model once we lose access to the TPUs, we need to figure out how to set up a GCS bucket to store the model in. Any help on this task would be greatly appreciated!

https://cloud.google.com/storage/

ym-han commented 3 years ago

Ben Wang's distributed training script should have something for this --- see, e.g., https://github.com/kingoflolz/mesh-transformer-jax/blob/master/device_train.py

When I was fiddling with GCS buckets for my own project (running out-of-the-box inference with GPT-J with my own prompts) I had to use the upload/download blobs functions from the GCS docs, but that all seemed pretty straightforward --- the download/upload functions from their docs worked fine. I think the potentially more difficult part is interfacing with Ray, though Ben's script should also help a lot with that.
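For anyone who hasn't used them, here is a rough sketch of what those upload/download helpers look like with the `google-cloud-storage` Python client. It is not copied from Ben's script or verbatim from the docs. The function names, the `make_client` helper, and the bucket/blob names in the usage comment are hypothetical. It assumes the package is installed and that application default credentials are configured on the TPU VM.

```python
from pathlib import Path


def make_client():
    """Create a real GCS client.

    Assumes `pip install google-cloud-storage` and working credentials.
    Imported lazily so the helpers below can be exercised with a stand-in client.
    """
    from google.cloud import storage
    return storage.Client()


def upload_blob(client, bucket_name, source_file, destination_blob):
    """Upload a local file to gs://<bucket_name>/<destination_blob>."""
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_blob)
    blob.upload_from_filename(str(source_file))
    return f"gs://{bucket_name}/{destination_blob}"


def download_blob(client, bucket_name, source_blob, destination_file):
    """Download gs://<bucket_name>/<source_blob> to a local path."""
    destination_file = Path(destination_file)
    # Create the local directory first so the download does not fail on a fresh VM.
    destination_file.parent.mkdir(parents=True, exist_ok=True)
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(source_blob)
    blob.download_to_filename(str(destination_file))
    return destination_file


# Hypothetical usage (bucket and paths are placeholders, not real project names):
# client = make_client()
# upload_blob(client, "my-hypothetical-bucket", "ckpt/model.msgpack", "checkpoints/model.msgpack")
# download_blob(client, "my-hypothetical-bucket", "checkpoints/model.msgpack", "restored/model.msgpack")
```

Passing the client in as an argument, rather than creating it inside each function, means one client can be reused across many checkpoint files and the helpers can be tried out without touching a real bucket.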

arampacha commented 3 years ago

Hi @ym-han! This is exactly why we want the GCS stuff set up: to train GPTNeo-J. We need a bucket that people will be able to access after the event, but for now we can go with a temporary solution, I guess. It would be great if someone who has been through the GCS process before could take care of this.

ym-han commented 3 years ago

Just to be clear, let's distinguish between (i) using a bucket to store things that you are going to use during training and for other purposes within this group and (ii) using a bucket to store things that the public can download. I think (ii) is not going to be cost-effective: we would be better off asking, e.g., the person who owns https://the-eye.eu/ to host it (which is what EAI did with GPT-J's weights). He's on the EleutherAI Discord, so you can just DM him about this when the time comes; I'm pretty sure he would be interested in hosting it, based on the interest he's shown in GitHub Copilot-like things.

ncoop57 commented 3 years ago

@ym-han thank you for explaining the two use cases. I totally agree with you that we should probably just use a bucket for the first case (training), and that for long-term storage "the eye" is a good place. I will speak with the person in the EAI Discord and see if they'd be interested in helping us out.