EleutherAI / gpt-neo

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
https://www.eleuther.ai
MIT License

ValueError when predicting with pretrained models #150

Closed. iocuydi closed this issue 3 years ago.

iocuydi commented 3 years ago

Describe the bug: When using GPT3XL to perform inference with the --predict flag, as shown in the examples, the following error is thrown:

ValueError: Argument not a list with same length as devices arg=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255] devices=['device:GPU:0']

This is with a single GTX 1070 GPU.

Commands that both produced this error were:

python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt
python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt --gpu_ids=['device:GPU:0']

StellaAthena commented 3 years ago

This code was designed for TPUs, and although it should work on GPUs, that's not something we officially support. We recommend using the GPT-NeoX repo instead for GPU training.

That said, it seems like it's having trouble identifying your GPU. Try the command nvidia-smi and check what the device ID number of your GPU is.

Also, does your machine have 256 GPUs? Otherwise I have no idea where it's getting that number from...

iocuydi commented 3 years ago

No, my machine has only 1 GPU, lol. I haven't used Mesh TensorFlow before, but I found this issue: https://github.com/google-research/text-to-text-transfer-transformer/issues/334, in which the problem seemed to be the mesh shape. I notice that the mesh shape is Shape[x=128, y=2] when running the above commands (128 × 2 = 256 logical processors, which matches the 0-255 device list in the error), so perhaps it has to do with this?

The device appears to be registered as device 0: other TensorFlow models pick it up as 0, and when first loading TensorFlow I see the typical "Adding visible gpu devices: 0 ... Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6709 MB memory) -> physical GPU (device: 0 ...)" messages.

iocuydi commented 3 years ago

I got around this error by setting params['mesh_shape'] = []. I'm not sure if this broke something else, because now I'm getting the error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Rsqrt' used by {{node gpt2/h0/norm_1/rsqrt/parallel_0/Rsqrt}} with these attrs: [T=DT_BFLOAT16]

although it appeared to build the model properly before displaying this.

iocuydi commented 3 years ago

The issue was XLA devices not being enabled. Setting the mesh shape to 1x1 and adding

os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

should make this work on GPUs. (I still couldn't run it because my GPU is too small, but the above errors no longer appeared.)
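In full, the workaround looks roughly like this (a minimal sketch; the device listing at the end is just a sanity check I'd suggest, not part of the repo):

```python
import os

# Set the flag before TensorFlow is first imported;
# setting it afterwards has no effect.
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

from tensorflow.python.client import device_lib  # noqa: E402

# Sanity check: XLA devices (e.g. XLA_GPU:0) should now be listed
# alongside the regular GPU:0 device.
print([d.name for d in device_lib.list_local_devices()])
```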

StellaAthena commented 3 years ago

> The issue was XLA devices not being enabled. Setting the mesh shape to 1x1 and adding
>
> os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'
>
> should make this work on GPUs. (I still couldn't run it because my GPU is too small, but the above errors no longer appeared.)

Great to know! Thanks for chasing this down for us.

I’m going to leave this open as a reminder to make sure this is in the next update.

soneo1127 commented 3 years ago

Thanks, that worked for me on 1 GPU.

How do I set up a mesh for a multi-GPU system? (I want to predict on 2 GPUs.)

GenTxt commented 3 years ago

Hello. I have the same error and would appreciate knowing which files to edit to apply the above solutions:

  1. Setting the mesh shape to 1x1

In model_fns.py?

mesh_shape = mtf.convert_to_shape(params["mesh_shape"])

Changing this to mesh_shape = mtf.convert_to_shape(params["1x1"]) produces an error.

In config.json (GPT3_XL_Pile)?

"mesh_shape" : "x:128,y:2", (change this value to "1x1" here?)

Or in another .py file?

  2. os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

Do I add this to main.py and/or model_fns.py?

import os
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

As a sidebar, I managed to convert both models using your convert_gpt.py repo script. Keep in mind to change the Hugging Face config.json files to use the same "n_head" values, or the models will generate gibberish.

With that change I am able to sample from transformer-based repos:

"n_head" : 16, (GPT3_XL)

"n_head" : 20, (GPT3_2.7B)

Cheers

StellaAthena commented 3 years ago

@GenTxt I haven't actually run the model on GPU, so I'll leave that question for @iocuydi or @soneo1127 to answer. I am intrigued by your sidebar though.

Did you test your converted model on long inputs? We are under the impression that that file doesn't work as-is, because our model uses both local and global attention. Specifically, we think that for short contexts (less than 256 char, if I recall correctly) it works fine, but for full contexts it does not. HF is working on (what we think is) the problem on their end, but it would be a big win if that turned out to be unnecessary.

StellaAthena commented 3 years ago

@soneo1127 I'm going to recommend you check out the Mesh TF documentation for further info.

GenTxt commented 3 years ago

I have tested both models with long inputs, and the maximum good output is around 400-500+ before the text turns to gibberish. For some reason it starts jamming fragments of words and letters together, similar to low-epoch character-based LSTM training.

The good output from both 2.7B and XL is on par with, and often better than, 1558M GPT-2.

Output doesn't go beyond the default 1024 even after editing the transformers files. I will wait for a proper HF conversion of the models, which will, hopefully, solve all those issues.

In the meantime, I would appreciate the requested info from others in this thread.

Cheers,

StellaAthena commented 3 years ago

> I have tested both models with long inputs, and the maximum good output is around 400-500+ before the text turns to gibberish. For some reason it starts jamming fragments of words and letters together, similar to low-epoch character-based LSTM training.
>
> The good output from both 2.7B and XL is on par with, and often better than, 1558M GPT-2.
>
> Output doesn't go beyond the default 1024 even after editing the transformers files. I will wait for a proper HF conversion of the models, which will, hopefully, solve all those issues.

I just double-checked, and it's actually ~512 where performance should jump off a cliff. For prompts of length 400-512, I would expect the initial tokens to be good, but the output to devolve into gibberish as the model goes on. Is that what you see?

It’s good to see that the model is often better than 1.5B GPT-2: that’s what our preliminary testing has shown too. The next update to the README will include the following table:

| Model | Pile BPB | Pile PPL | Lambada Acc. | Lambada PPL | Wikitext PPL |
|-------|----------|----------|--------------|-------------|--------------|
| GPT-Neo XL (1.3B) | 0.7527 | 6.159 | 64.73% | 5.04 | 13.10 |
| GPT-3 XL (1.3B) | ------ | ----- | 63.6% | 5.44 | ----- |
| GPT-2 (1.5B) | 1.0468 | ----- | 63.24% | 8.63 | 17.48 |
| GPT-Neo Alan (2.7B) | 0.7165 | 5.646 | 68.83% | 4.137 | 11.39 |
| GPT-3 Ada (2.7B) | 0.9631 | ----- | 67.1% | 4.60 | ----- |
| GPT-3 DaVinci (175B) | 0.7177 | ----- | 76.2% | 3.00 | ----- |

StellaAthena commented 3 years ago

@GenTxt FYI, I have created issue #174 to serve as the canonical reference for the conversion script problem. Please direct any future queries about the conversion script there.

jaehyunshinML commented 3 years ago

> Thanks, that worked for me on 1 GPU.
>
> How do I set up a mesh for a multi-GPU system? (I want to predict on 2 GPUs.)

Hi,

I think the easiest way to use multiple GPUs is to change mesh_shape: set x to 1 and set y to the number of your GPUs in the config file. For example, if you have 4 GPUs:

"mesh_shape" : "x:1,y:4",