cristinalunaj / MMEmotionRecognition

Repository with the code of the paper: A proposal for Multimodal Emotion Recognition using aural transformers and Action Units on the RAVDESS dataset
MIT License
96 stars 36 forks

Audio training step freezes without any error #7

Open zoeyazimi opened 2 years ago

zoeyazimi commented 2 years ago

Hi, I am trying to use your audio model. Preprocessing runs without any error; however, training (using main_FineTuneWav2Vec_CV.py) doesn't continue. I don't see any error in the terminal, and the training seems to be stuck on the first epoch. Do you have any tips for me? The output looks like this:

(MME) root@faec7be028f7:~/MME# python3 src/Audio/FineTuningWav2Vec/main_FineTuneWav2Vec_CV.py  --audios_dir RAVDESS_dir/audios_16kHz --cache_dir data/Audio/cache_dir --out_dir RAVDESS_dir/FineTuningWav2Vec2_out --model_id jonatasgrosman/wav2vec2-large-xlsr-53-english
SAVING DATA IN:  RAVDESS_dir/FineTuningWav2Vec2_out/data/20220904_211030/fold0
2880it [00:00, 163549.41it/s]
Using custom data configuration default-9f01d26c2ae78436
Downloading and preparing dataset csv/default to /root/.cache/huggingface/datasets/csv/default-9f01d26c2ae78436/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 9776.93it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1683.11it/s]
Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-9f01d26c2ae78436/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e. Subsequent calls will reuse this data.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1615.37it/s]
Processing fold:  0  - actors in Train fold:  {1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 20, 21, 22, 23, 24}
Processing fold:  0  - actors in Eval fold:  {2, 5, 14, 15, 16}
A classification problem with 8 classes: ['Angry', 'Calm', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']
The target sampling rate: 16000
Generating training...
 #0:   0%|                                                                                                                              | 0/6 [00:00<?, ?ba/s]
/opt/conda/envs/MME/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:161: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  tensor = as_tensor(value)
/opt/conda/envs/MME/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:161: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  tensor = as_tensor(value)
/opt/conda/envs/MME/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:161: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  tensor = as_tensor(value)
/opt/conda/envs/MME/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:161: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  tensor = as_tensor(value)
 #3: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  8.11ba/s]
 #0: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  7.70ba/s]
 #1: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  7.71ba/s]
 #2: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  7.67ba/s]
Generating test...
 #0:   0%|                                                                                                                              | 0/2 [00:00<?, ?ba/s]
/opt/conda/envs/MME/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:161: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  tensor = as_tensor(value)
/opt/conda/envs/MME/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:161: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  tensor = as_tensor(value)
/opt/conda/envs/MME/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:161: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  tensor = as_tensor(value)
/opt/conda/envs/MME/lib/python3.8/site-packages/transformers/feature_extraction_utils.py:161: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  tensor = as_tensor(value)
 #0: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.68ba/s]
 #2: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.51ba/s]
 #1: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.07ba/s]
 #3: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.15ba/s]
Training model...
Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-english were not used when initializing Wav2Vec2ForSpeechClassification: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2ForSpeechClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForSpeechClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForSpeechClassification were not initialized from the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-english and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.dense.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are adding a <class 'transformers.integrations.TensorBoardCallback'> to the callbacks of this Trainer, but there is already one. The current list of callbacks is:
DefaultFlowCallback
TensorBoardCallback
Using amp half precision backend
The following columns in the training set  don't have a corresponding argument in `Wav2Vec2ForSpeechClassification.forward` and have been ignored: name, path, actor, emotion.
***** Running training *****
  Num examples = 2280
  Num Epochs = 10
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 2
  Total optimization steps = 1420
  0%|                                                                                                                                | 0/1420 [00:00<?, ?it/s]
woogie-s commented 2 years ago

Hi, I'm having the same problem. Have you managed to solve it?

zoeyazimi commented 2 years ago

Hi. Unfortunately, no; I didn't spend more time on it and replaced the audio model with another speech classifier.
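For anyone else hitting this kind of silent hang: Python's built-in faulthandler module can dump every thread's stack trace while the process appears frozen, which usually reveals where it is stuck (e.g. a deadlocked dataloader worker or a wait on the GPU). A minimal sketch, not part of this repo; the idea would be to add these lines near the top of main_FineTuneWav2Vec_CV.py, and the 60-second interval is an arbitrary choice:

```python
import faulthandler

# Print a traceback of every thread to stderr on a fatal signal,
# so a crash with no Python exception still leaves a trace.
faulthandler.enable()

# While the training loop runs, also dump all thread stacks every
# 60 seconds (hypothetical interval). If the progress bar sits at
# 0% forever, the repeated dumps show which call never returns.
faulthandler.dump_traceback_later(60, repeat=True)

# ...then start training exactly as before.
```

The periodic dump can be stopped with `faulthandler.cancel_dump_traceback_later()` once training progresses normally.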