Sreyan88 / GAMA

Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
https://sreyan88.github.io/gamaaudio/
Apache License 2.0

Training error is zero #7

Closed. deek2689 closed this issue 2 weeks ago.

deek2689 commented 2 months ago

I faced another issue with fine-tuning: the training loss after each epoch is zero. I tried varying the hyperparameters as well as using the ones in the script, but the result is the same. Can you suggest what the problem could be? [screenshot: training log showing a loss of 0.0 after each epoch]

Also, after training completes, the run log on the Weights & Biases platform reports a "broken pipe" error. [screenshot: W&B log showing the broken pipe error]

I am using the IEMOCAP dataset for fine-tuning.

sonalkum commented 2 months ago

Hi @deek2689, can you please share a few more details about your training, such as the batch size and learning rate? Also, are you training from scratch or initializing the weights from one of our checkpoints? If not, can you try initializing the weights with one of our provided checkpoints?

For the "Broken Pipe" issue for weights and biases, I am not sure if that is related to GAMA. This should be mostly connectivity issue while uploading the info.

Thanks!

deek2689 commented 2 months ago

I tried reducing the batch size to 64 and then varied the LoRA parameters (rank and alpha). I also tried the same hyperparameter settings that you have in your script, but the training loss is still zero.

```python
# training hyperparams
batch_size: int = 128,
micro_batch_size: int = 4,
num_epochs: int = 3,
learning_rate: float = 3e-4,
cutoff_len: int = 256,
val_set_size: int = 2000,
# lora hyperparams
lora_r: int = 8,
lora_alpha: int = 16,
lora_dropout: float = 0.05,
```
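For context, assuming the script follows the common alpaca-lora pattern (my assumption, not verified against GAMA's exact trainer), `batch_size` and `micro_batch_size` relate through gradient accumulation:

```python
# Assuming the standard alpaca-lora-style setup: the effective batch of
# 128 is reached by accumulating gradients over micro-batches of 4.
batch_size = 128
micro_batch_size = 4
gradient_accumulation_steps = batch_size // micro_batch_size
print(gradient_accumulation_steps)  # 32 micro-batches per optimizer step
```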

The data file follows the required format; here is one sample entry:

```json
{
  "instruction": "Classify the emotion expressed in the provided audio into one of the four labels such as neutral, happy, angry, sad",
  "input": "",
  "audio_id": "/content/drive/MyDrive/SER_LLAMA/IEMOCAP_full_release/Session1/sentences/wav/Ses01F_impro01/Ses01F_impro01_F005.wav",
  "dataset": "IEMOCAP",
  "task": "emotion_classification",
  "output": "neutral"
}
```
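To rule out data issues, I also ran a quick sanity check over the file (a sketch I wrote myself, not part of the GAMA repo; the file name `iemocap_train.json` is mine):

```python
# Verify every entry in the fine-tuning JSON has the expected keys and
# that each `audio_id` points to a file that actually exists on disk.
import json
import os

REQUIRED = {"instruction", "input", "audio_id", "dataset", "task", "output"}

with open("iemocap_train.json") as f:  # hypothetical data file name
    data = json.load(f)

for i, entry in enumerate(data):
    missing = REQUIRED - entry.keys()
    if missing:
        print(f"entry {i}: missing keys {missing}")
    if not os.path.isfile(entry["audio_id"]):
        print(f"entry {i}: audio file not found: {entry['audio_id']}")
```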

Sreyan88 commented 2 months ago

A loss of 0 generally happens due to unstable training, which causes the audio representations to explode to infinity (gradient explosion). Another possible cause is the floating-point precision (fp) setting you are using. Can you confirm whether you are fine-tuning from one of our existing checkpoints or from scratch?
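A quick way to catch this symptom early is to assert the loss is finite before stepping and to clip gradients. Below is a minimal sketch assuming a standard PyTorch loop (not GAMA's exact trainer); `model`, `batch`, and `labels` are placeholders for your own objects:

```python
# Minimal sketch, assuming a plain PyTorch training step: detect a
# non-finite (NaN/Inf) loss before it silently collapses to 0, and clip
# gradients to guard against the explosion described above.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # placeholder for the LoRA-wrapped model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
batch, labels = torch.randn(8, 16), torch.randint(0, 4, (8,))

logits = model(batch)
loss = nn.functional.cross_entropy(logits, labels)
if not torch.isfinite(loss):
    raise RuntimeError(f"non-finite loss detected: {loss.item()}")

loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # tame spikes
optimizer.step()
```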

deek2689 commented 2 months ago

I am currently using the provided Llama-2-7b-chat-hf-qformer model for fine-tuning, so the checkpoints used for fine-tuning belong to that model.