NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

AttributeError: 'MegatronGPTModel' object has no attribute 'decoder' #10034

Open lianghsun opened 1 month ago

lianghsun commented 1 month ago

Description

I am retraining a LLaMA3 model. Due to the limited size of my dataset, I attempted to use freeze_updates as referenced in the NVIDIA NeMo documentation. My configuration is as follows:

freeze_updates:
  enabled: true  # set to false if you want to disable freezing
  modules:   # list all of the modules you want to have freezing logic for
    decoder: 100

However, I encountered the following error:

AttributeError: 'MegatronGPTModel' object has no attribute 'decoder'

I also tried changing decoder to encoder or joint, but I still got errors. How should this setting be configured correctly?

Additionally, within the NeMo framework, is it possible to freeze specific layers, such as only the attention layer? If so, how can I achieve this? Thanks!

ericharper commented 3 weeks ago

I'm curious whether by retraining you mean continued training: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/allmodels/continuetraining.html?highlight=continued%2520training#configure-continual-learning

Regarding your error, I believe the decoder module will be MegatronGPTModel.model.decoder. But you can check by inspecting the MegatronGPTModel instance.
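
For example, here is a minimal sketch of how you could confirm the module path and, separately, freeze only the attention weights by hand. It assumes model is an already-instantiated MegatronGPTModel from your training script, and the "self_attention" name filter is an assumption based on Megatron-style parameter names, so adjust it after inspecting the printed names:

# Minimal sketch: `model` is assumed to be an already-built MegatronGPTModel
# instance (the object your training script constructs from the YAML config).

# 1) Inspect the top-level submodules to find the key expected by freeze_updates.
for name, module in model.named_children():
    print(name, type(module).__name__)

# Drill one level down; the transformer stack is expected (not guaranteed)
# to sit at model.model.decoder, which would make the config key "model.decoder".
for name, module in model.model.named_children():
    print("model." + name, type(module).__name__)

# 2) Freeze only attention parameters manually, as an alternative to freeze_updates.
# The "self_attention" substring is an assumption about Megatron parameter names;
# check the names printed by named_parameters() on your own checkpoint first.
frozen = 0
for name, param in model.named_parameters():
    if "self_attention" in name:
        param.requires_grad = False
        frozen += 1
print(f"Froze {frozen} attention parameter tensors")

Note that setting requires_grad = False keeps those weights frozen for the entire run, whereas freeze_updates is meant to unfreeze a module after the configured number of updates.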