sachit-menon opened this issue 2 years ago
I have not personally tried, but I believe you might be able to do it for inference, since the idling VRAM usage is <50GB as far as I remember. I don't think you need to make any changes to constants.py since it's 8 devices by default.
@sachit-menon @xhluca
Hi, we recently integrated OPT-175B with the Berkeley-backed system Alpa.
You can try Alpa, which allows you to train/serve big models like OPT-175B on more heterogeneous GPU setups, rather than requiring 8x 80GB A100s.
See this guide for more details!
Just as a side note, we've converted the 175B checkpoint internally and it runs very well with accelerate on 8 A100s (as fast as or faster than DeepSpeed for single-input queries).
You'll just need to follow the conversion scripts to get the opt-175B checkpoint into HF format :-)
@patrickvonplaten sorry if this is obvious, but where can I find those scripts? Also great to hear about accelerate, do you know where we can find example code?
I tried running the official tutorial with one of the smaller models and it seems that only a single GPU was used, so I'm not sure whether I did something wrong or whether there are some special steps needed for those large models.
Hey @xhluca,
Sorry to answer so late - we're working with the meta team to make the HF weights directly available with a short code snippet on how to run them :-)
I'll keep you updated here
Thanks!
We've also got model parallel 16 working now, which is useful for running across two nodes (with some perf penalty)
> Just as a side note, we've converted the 175B checkpoint internally and it runs very well with accelerate on 8 A100s (as fast as or faster than DeepSpeed for single-input queries).
8 A100s with 40GB or 80GB memory?
@samuelstevens Based on personal experiments, it takes ~50GB of VRAM per device, so it was likely done on 80GB A100s.
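As a rough sanity check on that number, here is a back-of-envelope estimate (assuming fp16 weights sharded evenly across 8 GPUs, ignoring activations and the KV cache):

```python
# Back-of-envelope: per-GPU weight memory for OPT-175B in fp16,
# sharded evenly across 8 devices (activations / KV cache not counted).
n_params = 175e9
bytes_per_param = 2  # fp16
n_gpus = 8

per_gpu_gib = n_params * bytes_per_param / n_gpus / 1024**3
print(f"~{per_gpu_gib:.0f} GiB of weights per GPU")  # ~41 GiB, roughly matching the ~50GB observed
```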
@patrickvonplaten @stephenroller any idea what kind of timeframe that might happen in? How involved is it to use the conversion script/what are the steps involved in that, if it would be faster to do it that way? Thanks for your efforts on this!
In metaseq we now have support for MP16, which lets it work on 16x 32GB GPUs across two nodes. @punitkoura also has a PR for converting the weights for use with HF accelerate, which supports much smaller machines via offloading (at a speed cost).
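For anyone wanting to try the offloading route once they have an HF-format checkpoint, a minimal sketch of what that looks like with transformers + accelerate (the checkpoint path below is a placeholder; the same pattern works with the smaller public checkpoints today):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: point this at a converted HF-format OPT checkpoint (or e.g. "facebook/opt-30b").
checkpoint = "path/to/opt-175b-hf"

tokenizer = AutoTokenizer.from_pretrained(checkpoint, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,
    device_map="auto",         # fill GPUs first, then CPU RAM
    offload_folder="offload",  # spill any remaining weights to disk (slow, but fits smaller machines)
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)
output = model.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```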
@punitkoura would it be possible to have a brief example/explanation of how to use that with the OPT 175B model? Digging into these commits, I found https://github.com/facebookresearch/metaseq/blob/main/gpu_tests/test_hf_compatibility.py with associated test setup download_and_configure_125m_with_hf_dependencies in https://github.com/facebookresearch/metaseq/blob/main/.circleci/config.yml. However, I'm not sure if 175B has the dependencies or config files that 125M has available, so I don't know if following the exact same steps will work as-is.
Perhaps @patrickvonplaten has used this updated conversion script to put together that code snippet? 😄
> Hey @xhluca,
> Sorry to answer so late - we're working with the meta team to make the HF weights directly available with a short code snippet on how to run them :-)
@patrickvonplaten Since it might take some time to make it available, in the meantime is it possible to have a code snippet for the publicly available large models (30B and 66B)? This way, when 175B is made available, the only thing needed is to change a single line of code.
@patrickvonplaten Bump on @xhluca's comment above? Sorry to ping you again; it's been about a month and I have some experiments that I'd really like to run!
Hey!
The 30B and 66B models can very easily be run if you have enough GPU memory available. You can just follow the code snippets here: https://huggingface.co/facebook/opt-30b#how-to-use and https://huggingface.co/facebook/opt-66b#how-to-use
Note that this assumes you have an 80GB A100 available. If you instead have multiple smaller GPUs available, you can replace:
model = AutoModelForCausalLM.from_pretrained("facebook/opt-66b", torch_dtype=torch.float16).cuda()
by:
model = AutoModelForCausalLM.from_pretrained("facebook/opt-66b", torch_dtype=torch.float16, device_map="auto")
which will automatically place layers on the different devices in a smart way. Also see: https://huggingface.co/docs/transformers/v4.21.2/en/main_classes/model#large-model-loading
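Putting that together, a minimal multi-GPU inference sketch for the public 66B checkpoint (following the model-card snippet, with device_map="auto" swapped in as described above) would look roughly like:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-66b", use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-66b",
    torch_dtype=torch.float16,
    device_map="auto",  # shard layers across all visible GPUs
)

prompt = "Hello, I am conscious and"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(0)

generated = model.generate(input_ids, do_sample=True, max_new_tokens=30)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```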
Is an easy integration with 175B still planned or shelved for now?
@stephenroller - could we provide you with the HF 175B checkpoint and you provide it somehow on your website (upon request)?
Yes Patrick, that would significantly help unblock this.
Awesome - send you an email :-)
❓ Questions and Help
What is your question?
How can I get the 175B model running for inference on a hardware setup as described below? Is it possible on one node with 8 A6000s with 51GB each, perhaps with DeepSpeed or similar? I know there are multiple other similar issues, but I'm wondering if the requirements can be somewhat relaxed for inference only (and my hardware setup is a bit different), so I thought I'd throw my question into the ring :).
What's your environment?
How you installed metaseq (pip, source): per instructions in https://github.com/facebookresearch/metaseq/blob/main/docs/setup.md