NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[BUG] @jit_fuser fails with Unknown type constructor Sequence #880

Open Edenzzzz opened 1 week ago

Edenzzzz commented 1 week ago

Describe the bug
Using torch 2.1.1, running bash examples/bert/train_bert_340m_distributed.sh produces a JIT error due to the Sequence annotation reached while scripting calculate_logits_max:

    return torch.jit.script(fn, _rcb=rcb)
  File "/root/sharedDisk/home/tanwenxuan/miniconda3/lib/python3.8/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
RuntimeError: 
Unknown type constructor Sequence:
  File "/root/sharedDisk/home/tanwenxuan/Megatron-LM/megatron/core/tensor_parallel/utils.py", line 106
    def vocab_range_from_per_partition_vocab_size(
        per_partition_vocab_size: int, rank, world_size: int
    ) -> Sequence[int]:
         ~~~~~~~~~~~~~ <--- HERE
        index_f = rank * per_partition_vocab_size
        index_l = index_f + per_partition_vocab_size
'vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'
  File "/root/sharedDisk/home/tanwenxuan/Megatron-LM/megatron/core/tensor_parallel/cross_entropy.py", line 41

        # Get the partition's vocab indices
        get_vocab_range = VocabUtility.vocab_range_from_per_partition_vocab_size
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        partition_vocab_size = vocab_parallel_logits.size()[-1]
        rank = get_tensor_model_parallel_rank()
'calculate_predicted_logits' is being compiled since it was called from 'calculate_predicted_logits'
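For reference, the offending annotation is the Sequence[int] return type underlined above. A minimal sketch of one possible patch (my guess, not the official fix), assuming the function returns the (index_f, index_l) pair computed in its body, is to switch to Tuple, which TorchScript does support, and to annotate rank as well:

from typing import Tuple

def vocab_range_from_per_partition_vocab_size(
    per_partition_vocab_size: int, rank: int, world_size: int
) -> Tuple[int, int]:
    # First and one-past-last vocab index owned by this tensor-parallel rank.
    # world_size is unused here but kept to preserve the original signature.
    index_f = rank * per_partition_vocab_size
    index_l = index_f + per_partition_vocab_size
    return index_f, index_l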

To Reproduce
bash examples/bert/train_bert_340m_distributed.sh

Expected behavior
The script trains without a JIT compilation error.

Stack trace/logs
See above.

Environment (please complete the following information):
PyTorch 2.1.1, Python 3.8.

Proposed fix
Disable JIT if this occurs.
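As a stopgap for "disable JIT if this occurs", a hypothetical wrapper (maybe_jit_fuser is not code that exists in the repo) could fall back to the eager function whenever TorchScript rejects it, trading the fused kernel for un-fused execution:

import torch

def maybe_jit_fuser(fn):
    # Try to script fn; if TorchScript rejects an annotation it cannot parse
    # (e.g. typing.Sequence), keep the plain eager function instead of crashing.
    try:
        return torch.jit.script(fn)
    except RuntimeError:
        return fn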

Edenzzzz commented 1 week ago

This is very weird, as TorchScript explicitly does not support the Sequence annotation (https://pytorch.org/docs/stable/jit_language_reference.html#supported-type). I also ran the NGC Docker image with torch 2.1.0; that didn't work either.
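For what it's worth, the limitation reproduces outside Megatron-LM; on the affected torch versions (e.g. 2.1.x), a sketch as small as this fails with the same message:

from typing import Sequence

import torch

def pair(n: int) -> Sequence[int]:
    return n, n + 1

torch.jit.script(pair)  # RuntimeError: Unknown type constructor Sequence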

bentherien commented 1 week ago

Can confirm I get the same error using:

  • PyTorch 2.3.1
  • Megatron-LM e33c8f7
  • CUDA 12.1

Edenzzzz commented 1 week ago

> Can confirm I get the same error using:
>   • PyTorch 2.3.1
>   • Megatron-LM e33c8f7
>   • CUDA 12.1

While they didn't say this, using the newest NVIDIA PyTorch container (torch 2.4) seems to work.
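A plausible explanation: @jit_fuser picks its backend by torch version. The sketch below paraphrases the selection logic in megatron/core/jit.py from memory (the exact 2.2 cutoff is an assumption, so check the file): newer torch maps jit_fuser to torch.compile, which accepts Sequence annotations, while older torch uses torch.jit.script, which does not.

import torch

# Sketch of a version gate, assuming a >= 2.2 threshold.
TORCH_MAJOR, TORCH_MINOR = (int(v) for v in torch.__version__.split(".")[:2])

if (TORCH_MAJOR, TORCH_MINOR) >= (2, 2):
    jit_fuser = torch.compile     # the torch 2.4 NGC container takes this path
else:
    jit_fuser = torch.jit.script  # strict TorchScript parsing; hits this bug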

deepakn94 commented 2 days ago

Thanks for pointing this out. We will have a fix shortly.

divisionblur commented 1 day ago

When I was processing the training dataset for GPT using the tools/preprocess_data.py script, I encountered this issue.

'vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'
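This makes sense even though preprocessing never runs the model: torch.jit.script compiles at decoration time, so the error fires as soon as megatron.core.fusions.fused_cross_entropy is imported. A toy illustration (names are made up) of the same import-time failure:

from typing import Sequence

import torch

def helper(n: int) -> Sequence[int]:
    return n, n + 1

try:
    # Scripting calls_helper recursively compiles helper, whose Sequence
    # annotation TorchScript cannot parse -- the error fires right here,
    # at decoration/import time, before anything is ever executed.
    @torch.jit.script
    def calls_helper(n: int) -> int:
        lo, hi = helper(n)
        return hi - lo
except RuntimeError as err:
    print(err)  # Unknown type constructor Sequence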

divisionblur commented 1 day ago

python tools/preprocess_data.py \
    --input data/oscar-1GB.jsonl \
    --output-prefix my-gpt3 \
    --vocab-file gpt2-vocab.json \
    --tokenizer-type GPT2BPETokenizer \
    --merge-file gpt2-merges.txt \
    --append-eod

Traceback (most recent call last):
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/tools/preprocess_data.py", line 23, in <module>
    from megatron.training.tokenizer import build_tokenizer
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/__init__.py", line 16, in <module>
    from .initialize import initialize_megatron
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/initialize.py", line 18, in <module>
    from megatron.training.arguments import parse_args, validate_args
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/arguments.py", line 14, in <module>
    from megatron.core.models.retro.utils import (
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/retro/__init__.py", line 12, in <module>
    from .decoder_spec import get_retro_decoder_block_spec
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/retro/decoder_spec.py", line 9, in <module>
    from megatron.core.models.gpt.gpt_layer_specs import (
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/gpt/__init__.py", line 1, in <module>
    from .gpt_model import GPTModel
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 13, in <module>
    from megatron.core.models.common.language_module.language_module import LanguageModule
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/common/language_module/language_module.py", line 9, in <module>
    from megatron.core.fusions.fused_cross_entropy import fused_vocab_parallel_cross_entropy
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/fusions/fused_cross_entropy.py", line 27, in <module>
    def calculate_predicted_logits(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_recursive.py", line 1010, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_recursive.py", line 1010, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
RuntimeError: Unknown type constructor Sequence:
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/tensor_parallel/utils.py", line 106
    def vocab_range_from_per_partition_vocab_size(
        per_partition_vocab_size: int, rank, world_size: int
    ) -> Sequence[int]:

        index_f = rank * per_partition_vocab_size
        index_l = index_f + per_partition_vocab_size
'vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/tensor_parallel/cross_entropy.py", line 41

        # Get the partition's vocab indices
        get_vocab_range = VocabUtility.vocab_range_from_per_partition_vocab_size
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        partition_vocab_size = vocab_parallel_logits.size()[-1]
        rank = get_tensor_model_parallel_rank()
'calculate_predicted_logits' is being compiled since it was called from 'calculate_predicted_logits'
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/fusions/fused_cross_entropy.py", line 31
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:

    (
    ~
        target_mask,
        masked_target_1d,
        ~~~~~~~~~~~~~~~~~
        predicted_logits,
        ~~~~~~~~~~~~~~~~~
        sum_exp_logits,
        ~~~~~~~~~~~~~~~
        exp_logits,
        ~~~~~~~~~~~
    ) = VocabParallelCrossEntropy.calculate_predicted_logits(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        vocab_parallel_logits, target, logits_max
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    )