Open paulaserna16 opened 6 days ago
Hi, sorry, could you pull in the megatron-core branch (maybe core_r0.9.0 or the latest one) and mount it to /opt/megatron-lm, so that you replace the entire megatron-core rather than patching it manually by adding the extensions alone?
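In case it helps, mounting a checkout over the container's copy can be sketched roughly like this (the clone path and the exact image tag are assumptions; adjust them to your setup):

```shell
# Clone the megatron-core branch you want (branch name from the discussion above)
git clone --branch core_r0.9.0 https://github.com/NVIDIA/Megatron-LM.git megatron-lm

# Mount the checkout over /opt/megatron-lm so it replaces the container's copy
docker run --gpus all -it --rm \
    -v "$PWD/megatron-lm:/opt/megatron-lm" \
    nvcr.io/nvidia/nemo:24.07
```

Because the bind mount shadows the directory inside the container, the whole megatron-core tree is replaced consistently, instead of mixing a newer `extensions/` folder with an older `utils.py`.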
I'm trying to run the Dreambooth tutorial, but I'm encountering some issues in the modules.
First, the megatron-lm version installed when launching a container with NeMo Framework 24.07 doesn't have the extensions module for the Transformer Engine.
Then I tried to work around it manually and added the extensions folder to the megatron/core path. However, if I try to execute the dreambooth.py example:
```shell
! python /opt/NeMo/examples/multimodal/text_to_image/dreambooth/dreambooth.py \
    model.unet_config.from_pretrained=/ckpts/unet.bin \
    model.unet_config.from_NeMo=False \
    model.first_stage_config.from_pretrained=/ckpts/vae.bin \
    model.data.instance_dir=/datasets/instance_dir \
    model.data.instance_prompt='a photo of a sks dog'
```
I get the following error:
```
ImportError                               Traceback (most recent call last)
Cell In[13], line 4
      2 from megatron.core.distributed import DistributedDataParallel as McoreDDP
      3 from megatron.core.distributed import DistributedDataParallelConfig
----> 4 from megatron.core.extensions.transformer_engine import (
      5     TEColumnParallelLinear,
      6     TEDotProductAttention,
      7     TELayerNormColumnParallelLinear,
      8     TENorm,
      9     TERowParallelLinear,
     10 )
     11 from megatron.core.fusions.fused_bias_dropout import get_bias_dropout_add
     12 from megatron.core.models.gpt import GPTModel as MCoreGPTModel

File /opt/megatron-lm/megatron/core/extensions/transformer_engine.py:34
     32 from megatron.core.transformer.transformer_config import TransformerConfig
     33 from megatron.core.transformer.utils import make_sharded_tensors_for_checkpoint
---> 34 from megatron.core.utils import get_te_version, is_te_min_version
     37 def _get_extra_te_kwargs(config: TransformerConfig):
     38     extra_transformer_engine_kwargs = {"params_dtype": config.params_dtype}

ImportError: cannot import name 'get_te_version' from 'megatron.core.utils' (/opt/megatron-lm/megatron/core/utils.py)
```
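For reference, the two missing helpers boil down to resolving an installed package's version and comparing it against a minimum. A minimal standalone sketch of that pattern (generic names, not the actual megatron-core implementation) looks like:

```python
import importlib.metadata


def get_pkg_version(name):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def is_min_version(installed, minimum):
    """Compare dotted version strings numerically, ignoring non-numeric suffixes."""
    def to_tuple(version):
        parts = []
        for piece in version.split("."):
            digits = ""
            for ch in piece:
                if ch.isdigit():
                    digits += ch
                else:
                    break
            parts.append(int(digits) if digits else 0)
        return tuple(parts)

    return to_tuple(installed) >= to_tuple(minimum)
```

The point of showing this is that both helpers live in the newer `megatron/core/utils.py`; copying only the `extensions/` folder without the matching `utils.py` is exactly what produces the `ImportError` above.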
After checking the files, I see that transformer_engine.py is trying to import the function from utils: `from megatron.core.utils import get_te_version, is_te_min_version`. And in fact, I checked the megatron.core utils.py file, and it queries the Transformer Engine within that function:
```python
def get_te_version():
    """Get TE version from __version__; if not available use pip's. Use caching."""
```
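A quick way to confirm whether the installed megatron-core actually exposes the helpers before launching the tutorial is a small preflight check; this sketch uses only the standard library, and the module/attribute names in the commented example are the ones from the traceback above:

```python
import importlib


def module_exposes(module_name, attr_names):
    """Return True if the module imports cleanly and defines all given attributes."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return all(hasattr(module, name) for name in attr_names)


# Example preflight inside the container before running dreambooth.py:
# module_exposes("megatron.core.utils", ["get_te_version", "is_te_min_version"])
```

If the check returns False, the container's megatron-core predates those helpers, which points back to mounting a matching branch rather than patching individual files.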
I would appreciate your help.
Thanks!