bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

IndexError: too many indices for tensor of dimension 2 #87

Open YFeather opened 10 months ago

YFeather commented 10 months ago

I'm hitting a new error:

Traceback (most recent call last):
  File "/Megatron-LM/pretrain_gpt.py", line 148, in <module>
    pretrain(train_valid_test_datasets_provider,
  File "/Megatron-LM/megatron/training.py", line 161, in pretrain
    iteration = train(forward_step_func,
  File "/Megatron-LM/megatron/training.py", line 740, in train
    train_step(forward_step_func,
  File "/Megatron-LM/megatron/training.py", line 434, in train_step
    losses_reduced = forward_backward_func(
  File "/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 360, in forward_backward_no_pipelining
    output_tensor = forward_step(forward_step_func, data_iterator,
  File "/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 218, in forward_step
    output_tensor, loss_func = forward_step_func(data_iterator, model)
  File "/Megatron-LM/pretrain_gpt.py", line 81, in forward_step
    tokens, labels, loss_mask, attention_mask, position_ids = get_batch(
  File "/Megatron-LM/pretrain_gpt.py", line 46, in get_batch
    data_b = tensor_parallel.broadcast_data(keys, data, datatype)
  File "/Megatron-LM/megatron/core/tensor_parallel/data.py", line 76, in broadcast_data
    key_size, key_numel, total_numel = _build_key_size_numel_dictionaries(keys,
  File "/Megatron-LM/megatron/core/tensor_parallel/data.py", line 31, in _build_key_size_numel_dictionaries
    assert data[key].dim() < max_dim, 'you should increase MAX_DATA_DIM'
IndexError: too many indices for tensor of dimension 2

I checked `data`: the failure happens because `data` is a tensor, not a dictionary, so `data[key]` indexes the tensor with a string key and raises the IndexError above. I don't know at what point the keys are supposed to be added to `data`.
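For reference, here is a minimal sketch of the input contract that `_build_key_size_numel_dictionaries` appears to assume, based on the assert in the traceback. The field name `"text"` and the value `MAX_DATA_DIM = 5` are assumptions for illustration, not taken from this traceback:

```python
import torch

# Assumed constant; the real value lives in megatron/core/tensor_parallel/data.py
MAX_DATA_DIM = 5

def check_batch(data, keys):
    # Mirrors the check in _build_key_size_numel_dictionaries: `data` must be
    # a dict so that data[key] looks up a tensor by field name. If the data
    # iterator yields a bare tensor instead, data[key] indexes that tensor
    # with a string, which triggers the IndexError seen above.
    assert isinstance(data, dict), "data iterator must yield dicts, not raw tensors"
    for key in keys:
        assert data[key].dim() < MAX_DATA_DIM, "you should increase MAX_DATA_DIM"

# A well-formed batch: a dict mapping each key to a CPU tensor.
good_batch = {"text": torch.randint(0, 100, (4, 1025), dtype=torch.int64)}
check_batch(good_batch, ["text"])  # passes: 2-D tensor inside a dict
```

If a dataset's `__getitem__` (or the collate function) returns a bare tensor, wrapping it as `{"text": tensor}` at the dataset level is one way to satisfy this contract.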