JiamingZhou777 opened 1 month ago
Hi @JiamingZhou777,
Thanks for raising this issue.
Could you clarify whether you were attempting to load an OpenAI model directly from the HF Hub or resuming the fine-tuning process of one of our models? There's a check in `modeling_utils.py` that might be relevant: it deletes the `gradient_checkpointing` attribute during fine-tuning, potentially requiring you to set it again manually (one way to do so is sketched below). On newer versions of transformers there is also an alternate check that determines whether the model was originally created with transformers > 4.35.
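If that deletion is what's biting you, one workaround is to re-enable checkpointing right after loading. A minimal sketch, assuming the standard `from_pretrained` entry point (the class and checkpoint name below are illustrative, not necessarily what your script uses):

```python
# Hedged sketch: re-enable gradient checkpointing after loading, in case the
# attribute was deleted by the check in modeling_utils.py.
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.config.use_cache = False          # use_cache is incompatible with checkpointing
model.gradient_checkpointing_enable()   # re-set the flag after loading
```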
It's worth double-checking that the Whisper class loaded during fine-tuning is indeed the one from `src/models/modeling_whisper.py`. While this might not be the primary cause, it could help us narrow down the issue.
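A quick way to check (hedged; `model` here is whatever object your training script ends up with):

```python
# If this prints "transformers.models.whisper.modeling_whisper" rather than a
# module under src/models, the stock class was loaded instead of the custom one.
print(type(model).__module__)
```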
I've also tested the code without `IterableDataset` loading on `transformers==4.36`, and it appears to work correctly. If your data size allows, you could try this version, since `transformers==4.35` introduced some changes to the gradient checkpointing functionality.
Let me know your findings on these points, and we'll continue troubleshooting from there.
Thank you for your response. I have tried the suggested methods, but unfortunately they didn't work. Could you provide a `requirements.txt` file? While fine-tuning HuBERT, I frequently encounter a "CUDA out of memory" error during the calculation of the dev loss. Do you have any suggestions for resolving this? Although I haven't fully run your code, I have cited your paper in mine and submitted it to a conference. I appreciate your assistance!
PS: the error occurs even with the batch size set to 1 for both the training and development stages.
Hi @JiamingZhou777, thanks for using our code for ASR system development.
For the gradient checkpointing issue, if you look into the error:
`AttributeError: 'WhisperForConditionalGeneration' object has no attribute '_set_gradient_checkpointing'. Did you mean: 'is_gradient_checkpointing'?`
It is basically a mismatch between `modeling_utils.py` and the model definition in how the gradient checkpointing feature is enabled, most likely caused by a transformers version mismatch. You can change either file to make them consistent. Check the errors in the function call history (the traceback) to get an idea of where to make such a change; a sketch of the hook in question follows.
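For reference, here is a hedged sketch of the hook that older transformers versions (roughly pre-4.35) expect on the model class; the `isinstance` check should be adapted to whatever submodules your custom class actually defines:

```python
# Hedged sketch of the pre-4.35 hook that modeling_utils.py calls; adding a
# compatible method to the custom Whisper class is one way to resolve the
# AttributeError above.
from transformers.models.whisper.modeling_whisper import (
    WhisperDecoder,
    WhisperEncoder,
)

def _set_gradient_checkpointing(self, module, value=False):
    if isinstance(module, (WhisperEncoder, WhisperDecoder)):
        module.gradient_checkpointing = value
```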
In terms of the OOM issue, please make sure you have enough GPU memory for running the HuBERT-large model. Otherwise, please try the HuBERT-base model first; see the sketches below.
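For example, a hedged sketch of swapping in the smaller checkpoint (the names below are the public HF Hub ones, not necessarily what this repo's configs expect):

```python
# Illustrative only: start from the smaller public HuBERT checkpoint.
from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960")    # ~95M params
# model = HubertModel.from_pretrained("facebook/hubert-large-ll60k") # ~317M params
```

Also, since the OOM happens while computing the dev loss, double-check that evaluation runs under `torch.no_grad()` so no activations are kept for a backward pass. A minimal sketch (names like `dev_loader` are placeholders, not this repo's actual code):

```python
import torch

model.eval()
total_loss, n_batches = 0.0, 0
with torch.no_grad():                      # no activations kept for backward
    for batch in dev_loader:
        batch = {k: v.to("cuda") for k, v in batch.items()}
        loss = model(**batch).loss
        total_loss += loss.item()          # accumulate a Python float, not a tensor
        n_batches += 1
torch.cuda.empty_cache()                   # release cached blocks after eval
print(f"dev loss: {total_loss / max(n_batches, 1):.4f}")
```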
Thanks, Ruchao
I encountered errors while trying to fine-tune the full parameters of Whisper-small. I have installed `transformers==4.32.1`. My environment details are listed below. Do you have any suggestions? Thanks!