EleutherAI / gpt-neo

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
https://www.eleuther.ai
MIT License

Argument not a list with same length as devices #193

Closed HughPH closed 3 years ago

HughPH commented 3 years ago

Describe the bug
model_fns.py line 112 makes a call to TensorFlow which results in the following error:

ValueError: Argument not a list with same length as devices arg=[0, 1, 2, ..., 255] devices=['device:GPU:0']

Obviously the error message is accurate: the argument is a 256-element list, one entry per logical processor of the config's 'x:64,y:4' mesh (num_cores: 256 in the params dump below), while the device list contains only device:GPU:0.

To Reproduce
Steps to reproduce the behavior:

  1. Install Python 3.9 (it is the earliest available Python on Ubuntu 21.04)
  2. Prepare the CUDA repo and install libcudart11, libcublas11, libcublaslt11, libcufft10, libcurand10, libcusolver11, libcusparse11, libcudnn8
  3. Download GPT3_2-7B
  4. Update the path in GPT3_2-7B/config.json
  5. Edit GPTNeo's requirements.txt to change the TensorFlow version to 2.5.0rc0 (the earliest TensorFlow release that supports Python 3.9)
  6. Install the requirements
  7. Create a prompt and save it in testprompt.txt
  8. Run the following command: python main.py --predict --prompt testprompt.txt --gpu_ids device:GPU:0 --model ~/GPT3_2-7B/config.json
  9. See error

Expected behavior
Not to get an error 🤷

Proposed solution
I haven't the faintest clue. I imagine TensorFlow 2.5.0 has a breaking change in this call: the arguments are flipped, or an argument has been removed or added.
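For reference, the check that raises this error (see the traceback below, mesh_tensorflow/ops.py in mtf.parallel) appears to be a plain length comparison rather than a flipped argument. A rough sketch of that validation, reconstructed from the error text and not copied from the library source:

def parallel(devices, fn, *argss):
    # Sketch only: the real mtf.parallel also pins each call to its device.
    # Every per-device argument must be a list with one entry per device;
    # a 256-entry list against the single device:GPU:0 fails exactly here.
    for xs in argss:
        if not isinstance(xs, list) or len(xs) != len(devices):
            raise ValueError(
                "Argument not a list with same length as devices "
                "arg=%s devices=%s" % (xs, devices))
    # Apply fn once per device with that device's slice of each argument.
    return [fn(*[xs[i] for xs in argss]) for i in range(len(devices))]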


Environment: Ubuntu 21.04, Python 3.9, TensorFlow 2.5.0rc0, CUDA 11, GeForce GTX 970M (compute capability 5.2, ~6 GB).

Additional context
Full log:

2021-04-04 16:56:40.870461: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Current step 400000
Saving config to /mnt/storage/GPT3_2-7B/the-eye.eu/public/AI/gptneo-release/GPT3_2-7B/
2021-04-04 16:56:45.135877: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-04 16:56:45.137213: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-04 16:56:45.149528: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-04 16:56:45.149857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 970M computeCapability: 5.2
coreClock: 1.038GHz coreCount: 10 deviceMemorySize: 5.94GiB deviceMemoryBandwidth: 111.98GiB/s
2021-04-04 16:56:45.149877: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-04-04 16:56:45.152182: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-04-04 16:56:45.152242: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-04-04 16:56:45.152993: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-04 16:56:45.153203: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-04 16:56:45.153838: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-04-04 16:56:45.154402: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-04-04 16:56:45.154516: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-04-04 16:56:45.154612: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-04 16:56:45.155296: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-04 16:56:45.155649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-04-04 16:56:45.155693: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-04-04 16:56:49.644340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-04 16:56:49.644376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-04-04 16:56:49.644386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-04-04 16:56:49.644603: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-04 16:56:49.645005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-04 16:56:49.645415: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-04 16:56:49.645758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3746 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970M, pci bus id: 0000:01:00.0, compute capability: 5.2)
Done!
params = defaultdict(<function fetch_model_params.<locals>.<lambda> at 0x7fbe3b4bfdc0>, {'n_head': 20, 'n_vocab': 50257, 'embed_dropout': 0, 'lr': 0.00016, 'lr_decay': 'cosine', 'warmup_steps': 3000, 'beta1': 0.9, 'beta2': 0.95, 'epsilon': 1e-08, 'ada_epsilon1': '1e-30', 'ada_epsilon2': 0.001, 'opt_name': 'adam', 'weight_decay': 0, 'train_batch_size': 512, 'attn_dropout': 0, 'train_steps': 400000, 'lr_decay_end': 300000, 'eval_steps': 10, 'predict_steps': 0, 'res_dropout': 0, 'eval_batch_size': 128, 'predict_batch_size': 1, 'iterations': 500, 'n_embd': 2560, 'datasets': [['pile', None, None, None]], 'model_path': '/home/hugh/GPT3_2-7B/', 'n_ctx': 2048, 'n_layer': 32, 'scale_by_depth': True, 'scale_by_in': False, 'attention_types': ['global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local'], 'mesh_shape': 'x:64,y:4', 'layout': 'batch:x,embd:y', 'activation_function': 'gelu', 'recompute_grad': True, 'gradient_clipping': 1.0, 'tokens_per_mb_per_replica': 4096, 'padding_id': 50257, 'eos_id': 50256, 'dataset_configs': {'pile': {'n_vocab': 50257, 'path': 'gs://neo-datasets/pile/pile_*.tfrecords', 'eval_path': 'gs://neo-datasets/pile_val.tfrecords', 'tokenizer_is_pretrained': True, 'tokenizer_path': 'gpt2', 'eos_id': 50256, 'padding_id': 50257}}, 'mlm_training': False, 'causal': True, 'num_cores': 256, 'auto_layout': False, 'auto_layout_and_mesh_shape': False, 'use_tpu': False, 'gpu_ids': ['device:GPU:0'], 'steps_per_checkpoint': 5000, 'predict': True, 'model': 'GPT', 'export': False, 'sampling_use_entmax': False, 'moe_layers': None, 'slow_sampling': False})
Using config: {'_model_dir': '/home/hugh/GPT3_2-7B/', '_tf_random_seed': None, '_save_summary_steps': 500, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=500, num_shards=256, num_cores_per_replica=1, per_host_input_for_training=4, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1, experimental_allow_per_host_v2_parallel_get_next=False, experimental_feed_hook=None), '_cluster': None}
_TPUContext: eval_on_tpu True
eval_on_tpu ignored because use_tpu is False.
Predictions generated
Calling model_fn.
Running infer on CPU/GPU
Defauling to GELU activation (see here: https://arxiv.org/abs/1606.08415)
[the line above is repeated 64 times in the full log]
prediction_loop marked as finished
Reraising captured error
Traceback (most recent call last):
  File "/mnt/e/GPTNeo/main.py", line 257, in <module>
    main(args)
  File "/mnt/e/GPTNeo/main.py", line 184, in main
    handle_pred_output_fn(predictions, logger, enc, params, out_name=f"predictions_{args.sacred_id}_{current_step}")
  File "/mnt/e/GPTNeo/inputs.py", line 165, in handle_pred_output
    for i, p in enumerate(predictions):
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3153, in predict
    rendezvous.raise_errors()
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
    raise value
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3142, in predict
    for result in super(TPUEstimator, self).predict(
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 612, in predict
    estimator_spec = self._call_model_fn(features, None, ModeKeys.PREDICT,
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2941, in _call_model_fn
    return super(TPUEstimator, self)._call_model_fn(features, labels, mode,
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1163, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3199, in _model_fn
    estimator_spec = model_fn_wrapper.call_without_tpu(
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1729, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2072, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/mnt/e/GPTNeo/model_fns.py", line 112, in model_fn
    lowering = mtf.Lowering(graph, {mesh: mesh_impl}, autostack=True)
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/mesh_tensorflow/ops.py", line 728, in __init__
    op.lower(self)
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/mesh_tensorflow/ops.py", line 4541, in lower
    slices = mesh_impl.allsplit(slices, mesh_axis, tensor_axis)
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/mesh_tensorflow/ops.py", line 1099, in allsplit
    which = self.laid_out_pcoord(mesh_axis)
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/mesh_tensorflow/ops.py", line 1209, in laid_out_pcoord
    return self.slicewise(my_fn, self.laid_out_pnum())
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/mesh_tensorflow/placement_mesh_impl.py", line 173, in slicewise
    ret = mtf.parallel(self.devices, fn, *inputs)
  File "/mnt/e/GPTNeo/lib/python3.9/site-packages/mesh_tensorflow/ops.py", line 5659, in parallel
    raise ValueError(
ValueError: Argument not a list with same length as devices arg=[0, 1, 2, ..., 255] devices=['device:GPU:0']
StellaAthena commented 3 years ago

The Jupyter Notebook is written for TPU and takes a little fiddling to work for GPU. We keep meaning to make a second GPU colab, but haven’t gotten around to it.

One of our users wrote up a guide to using it on GPU here.

HughPH commented 3 years ago

I was able to work around this issue by changing line 20 of model_fns.py to read

mesh_shape = [("all_processors", 1)]

Not being wholly familiar with Python, I imagine the 1 could be replaced with len(params["gpu_ids"]), though obviously this would only work when a gpu_ids parameter is present.
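For context, the 256-entry argument comes straight from the shipped config: mesh_shape is 'x:64,y:4' (64 × 4 = 256 logical processors, matching num_cores: 256 in the params dump above), while only one GPU device is supplied. A minimal sketch of the suggested generalisation, assuming the mesh shape is built from the parsed params near the top of model_fns.py; the helper name gpu_mesh_shape is illustrative and not part of the repo:

def gpu_mesh_shape(params):
    # Fall back to a single device when the config carries no gpu_ids,
    # mirroring the hard-coded [("all_processors", 1)] workaround above.
    gpu_ids = params.get("gpu_ids", ["device:GPU:0"])
    # One mesh dimension with one slot per device keeps every per-device
    # argument list inside mtf.parallel the same length as `devices`.
    return [("all_processors", len(gpu_ids))]

With the params dump above (gpu_ids = ['device:GPU:0']) this returns [("all_processors", 1)], the same value as the hard-coded fix.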

HughPH commented 3 years ago

Thanks for the information, Stella. I had seen that before and turned back to "vanilla" GPTNeo because DeepSpeed requires at least an RTX 2080. I have run into some further issues, so maybe I need to set this aside until I've upgraded to a more powerful machine.

StellaAthena commented 3 years ago

I’m glad you were able to work around it! I look forward to hearing about what you get up to with our models :)

HughPH commented 3 years ago

I've been experimenting with OpenAI's GPT-3, but I've immediately hit limitations and items that, for my use case, will have a very high ticket price.

What I want to do is use a GPT with additional training on my own novels, to help me discern likely traits for characters and likely extensions to situations and relationships.

JahJajaka commented 3 years ago

> I was able to work around this issue by changing line 20 of model_fns.py to read
>
> mesh_shape = [("all_processors", 1)]

Thanks for the solution, it worked for me. Just to note: the line is in the file model_fns.py.