Edenzzzz opened this issue 1 week ago
This is very strange, as TorchScript explicitly does not support the Sequence type annotation: https://pytorch.org/docs/stable/jit_language_reference.html#supported-type
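For reference, here is a minimal standalone check (not Megatron code; the two function names are made up for illustration) showing that torch.jit.script rejects a Sequence annotation while accepting List:

from typing import List, Sequence

import torch

def takes_sequence(xs: Sequence[torch.Tensor]) -> torch.Tensor:
    # Sequence is not on TorchScript's supported-type list, so scripting this fails
    return xs[0] + 1

def takes_list(xs: List[torch.Tensor]) -> torch.Tensor:
    # List is a supported TorchScript type, so scripting this works
    return xs[0] + 1

try:
    torch.jit.script(takes_sequence)
except Exception as e:
    print("Sequence annotation rejected:", e)

torch.jit.script(takes_list)  # compiles without error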
I also ran it with the NGC Docker container (torch 2.1.0); it didn't work either.
Can confirm I get the same error using
- PyTorch 2.3.1
- Megatron-LM e33c8f7
- CUDA 12.1
Although it isn't mentioned above, using the newest NVIDIA PyTorch container (torch 2.4) seems to work.
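A hedged guess at why: recent Megatron-LM picks its fuser based on the torch version, so if megatron.core.jit resolves jit_fuser to torch.compile instead of torch.jit.script, the Sequence-annotated functions are never TorchScript-compiled at all. A quick check, assuming that module and symbol exist in your checkout:

import torch
from megatron.core.jit import jit_fuser  # assumption: present in recent Megatron-LM

print(torch.__version__)
# If this prints torch.compile rather than torch.jit.script, the fused
# cross-entropy functions bypass TorchScript and this error cannot occur.
print(jit_fuser)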
Thanks for pointing this out. We will have a fix shortly.
When I was processing the training dataset for GPT with the tools/preprocess_data.py script, I encountered the same error ('vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'). Command used:
python tools/preprocess_data.py \
--input data/oscar-1GB.jsonl \
--output-prefix my-gpt3 \
--vocab-file gpt2-vocab.json \
--tokenizer-type GPT2BPETokenizer \
--merge-file gpt2-merges.txt \
--append-eod
Traceback (most recent call last):
File "/mnt/users/lihai/gpt3/code/Megatron-LM/tools/preprocess_data.py", line 23, in
index_f = rank * per_partition_vocab_size
index_l = index_f + per_partition_vocab_size
'vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'
File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/tensor_parallel/cross_entropy.py", line 41
# Get the partition's vocab indices
get_vocab_range = VocabUtility.vocab_range_from_per_partition_vocab_size
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
partition_vocab_size = vocab_parallel_logits.size()[-1]
rank = get_tensor_model_parallel_rank()
'calculate_predicted_logits' is being compiled since it was called from 'calculate_predicted_logits'
File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/fusions/fused_cross_entropy.py", line 31
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
(
~
target_mask,
masked_target_1d,
~~~~~~~~~~~~~~~~~
predicted_logits,
~~~~~~~~~~~~~~~~~
sum_exp_logits,
~~~~~~~~~~~~~~~
exp_logits,
~~~~~~~~~~~
) = VocabParallelCrossEntropy.calculate_predicted_logits(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vocab_parallel_logits, target, logits_max
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
)
Describe the bug
Using torch 2.1.1, running bash examples/bert/train_bert_340m_distributed.sh produces a JIT error due to the Sequence annotation in calculate_logits_max.
To Reproduce
bash examples/bert/train_bert_340m_distributed.sh
Expected behavior
Stack trace/logs
Environment (please complete the following information):
Proposed fix
Disable JIT if this occurs.
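A minimal sketch of that workaround, assuming nothing beyond stock PyTorch: setting PYTORCH_JIT=0 before torch is imported turns torch.jit.script into a no-op, so the Sequence-annotated functions are left as plain Python (e.g. run PYTORCH_JIT=0 bash examples/bert/train_bert_340m_distributed.sh). Illustrated on a stand-in function:

import os

os.environ["PYTORCH_JIT"] = "0"  # must be set before torch is imported

from typing import Sequence

import torch

def logits_max_like(xs: Sequence[torch.Tensor]) -> torch.Tensor:
    # stand-in for the Sequence-annotated Megatron function, purely illustrative
    return xs[0]

scripted = torch.jit.script(logits_max_like)
# With PYTORCH_JIT=0, scripting is disabled and the original Python function is
# returned unchanged instead of raising the compilation error above.
print(scripted is logits_max_like)

Alternatively, changing the Sequence annotations in megatron/core to List/Tuple types that TorchScript supports should also avoid the error.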