NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Error encountered at Step 24: 'set_logging_level' is not defined in tutorials/nlp/Relation_Extraction-BioMegatron.ipynb #7780

Closed Marfars closed 9 months ago

Marfars commented 11 months ago

I followed the instructions provided in the documentation for setting up Colab. The instructions are as follows:

  1. Open a new Python 3 notebook.
  2. Import this notebook from GitHub (File -> Upload Notebook -> "GITHUB" tab -> copy/paste GitHub URL)
  3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select "GPU" for hardware accelerator)
  4. Run this cell to set up dependencies.

However, when I reached Step 24 (model = nemo_nlp.models.TextClassificationModel(cfg=config.model, trainer=trainer)), I encountered an error stating that name 'set_logging_level' is not defined. I double-checked the instructions and confirmed that I followed them accurately. Below is the stack trace for the error:
    
    [NeMo I 2023-10-23 11:51:10 megatron_init:294] All pipeline model parallel group ranks: [[0]]
    [NeMo I 2023-10-23 11:51:10 megatron_init:295] Rank 0 has pipeline model parallel rank 0
    [NeMo I 2023-10-23 11:51:10 megatron_init:296] All embedding group ranks: [[0]]
    [NeMo I 2023-10-23 11:51:10 megatron_init:297] Rank 0 has embedding rank: 0
    [NeMo E 2023-10-23 11:51:10 common:506] Model instantiation failed!
    Target class:   nemo.collections.nlp.models.language_modeling.megatron_bert_model.MegatronBertModel
    Error(s):   name 'set_logging_level' is not defined
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/dist-packages/nemo/core/classes/common.py", line 485, in from_config_dict
        instance = imported_cls(cfg=config, trainer=trainer)
      File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/language_modeling/megatron_bert_model.py", line 88, in __init__
        super().__init__(cfg, trainer=trainer, no_lm_init=False)
      File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/language_modeling/megatron_base_model.py", line 158, in __init__
        initialize_model_parallel_for_nemo(
      File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/modules/common/megatron/megatron_init.py", line 141, in initialize_model_parallel_for_nemo
        set_logging_level(apex_transformer_log_level)
    NameError: name 'set_logging_level' is not defined

    NameError                                 Traceback (most recent call last)
    in ()
    ----> 1 model = nemo_nlp.models.TextClassificationModel(cfg=config.model, trainer=trainer)

    12 frames
    /usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/modules/common/megatron/megatron_init.py in initialize_model_parallel_for_nemo(world_size, global_rank, local_rank, tensor_model_parallel_size, pipeline_model_parallel_size, virtual_pipeline_model_parallel_size, pipeline_model_parallel_split_rank, micro_batch_size, global_batch_size, rampup_batch_size, use_fp8, init_mpi_proc_group, seed, apex_transformer_log_level)
        139     app_state._is_megatron_initialized = True
        140
    --> 141     set_logging_level(apex_transformer_log_level)
        142
        143

Please advise on how to resolve this issue, or let me know if additional steps are needed to avoid this error. Thank you for your assistance.

Marfars commented 11 months ago

Please feel free to email me at marfars9264@gmail.com if there is any information I can provide to assist with troubleshooting this issue.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 9 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

wenqiglantz commented 8 months ago

I am running into the same error. Can this issue be reopened and investigated? Thanks!
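
A possible way to narrow this down, offered as a sketch and not a confirmed diagnosis: the traceback shows the NameError raised at the call set_logging_level(apex_transformer_log_level) in megatron_init.py, which means the name was never bound when that module was imported. One pattern that produces this is an optional import wrapped in try/except that fails silently. Assuming (this is not confirmed anywhere in this thread) that set_logging_level is expected to come from apex.transformer.log_util, the check below reports whether that module can be located in the current Colab environment. The helper name module_available is hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located in this environment.

    find_spec imports any parent packages of `name` but does not execute
    the target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError);
        # treat that the same as "module not found".
        return False


# Assumption: these are the modules the failing call would depend on.
for candidate in ("apex", "apex.transformer.log_util"):
    print(f"{candidate}: {'found' if module_available(candidate) else 'NOT found'}")
```

If either line reports NOT found, the missing package would be a plausible explanation for the undefined name, and the notebook's dependency setup cell would be the place to look next.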