NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Getting Key Error 21 while finetuning Citrinet-1024-8x-Stride #4562

Closed iddqd2d closed 2 years ago

iddqd2d commented 2 years ago

Hi, I'm trying to run the code below and get a KeyError. Please help.

import torch
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr
from ruamel.yaml import YAML
from omegaconf import DictConfig, OmegaConf
from nemo.utils.exp_manager import exp_manager

model_conf_path = '/home/denis/tim/yaml/citrinet_1024.yaml'
params = OmegaConf.load(model_conf_path)
trainer = pl.Trainer(**params.trainer)
model = nemo_asr.models.EncDecCTCModel(cfg=params.model, trainer=trainer)
exp_manager(trainer=trainer, cfg=params.exp_manager)
trainer.fit(model)

My YAML config: https://github.com/iddqd2d/files/blob/main/citrinet_1024.yaml

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
[NeMo I 2022-07-18 14:14:16 audio_to_text_dataset:41] Model level config does not contain `labels`, please explicitly provide `labels` to the dataloaders.
[NeMo W 2022-07-18 14:14:16 audio_to_text_dataset:84] dataset does not have explicitly defined labels
[NeMo I 2022-07-18 14:14:16 collections:193] Dataset loaded with 2 files totalling 0.01 hours
[NeMo I 2022-07-18 14:14:16 collections:194] 0 files were filtered totalling 0.00 hours
[NeMo I 2022-07-18 14:14:16 audio_to_text_dataset:41] Model level config does not contain `labels`, please explicitly provide `labels` to the dataloaders.
[NeMo W 2022-07-18 14:14:16 audio_to_text_dataset:84] dataset does not have explicitly defined labels
[NeMo I 2022-07-18 14:14:16 collections:193] Dataset loaded with 2 files totalling 0.01 hours
[NeMo I 2022-07-18 14:14:16 collections:194] 0 files were filtered totalling 0.00 hours
[NeMo I 2022-07-18 14:14:16 audio_to_text_dataset:41] Model level config does not contain `labels`, please explicitly provide `labels` to the dataloaders.
[NeMo W 2022-07-18 14:14:16 audio_to_text_dataset:84] dataset does not have explicitly defined labels
[NeMo I 2022-07-18 14:14:16 collections:193] Dataset loaded with 2 files totalling 0.01 hours
[NeMo I 2022-07-18 14:14:16 collections:194] 0 files were filtered totalling 0.00 hours
[NeMo I 2022-07-18 14:14:16 features:200] PADDING: 16
[NeMo I 2022-07-18 14:14:18 ctc_models:64] 
    Replacing placeholder number of classes (-1) with actual number of classes - 0
[NeMo I 2022-07-18 14:14:18 conv_asr:427] num_classes of ConvASRDecoder is set to the size of the vocabulary: 0.
[NeMo I 2022-07-18 14:14:18 exp_manager:286] Experiments will be logged at /home/denis/tim/save_tr_nemo/Citrinet-1024-8x-Stride/2022-07-18_14-14-18
[NeMo I 2022-07-18 14:14:18 exp_manager:660] TensorboardLogger has been set up
[NeMo W 2022-07-18 14:14:18 nemo_logging:349] /home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:2267: LightningDeprecationWarning: `Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.
      rank_zero_deprecation("`Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.")

[NeMo W 2022-07-18 14:14:18 exp_manager:899] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to -1. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo I 2022-07-18 14:14:21 modelPT:587] Optimizer config = Novograd (
    Parameter Group 0
        amsgrad: False
        betas: [0.8, 0.25]
        eps: 1e-08
        grad_averaging: False
        lr: 0.05
        weight_decay: 0.001
    )
[NeMo I 2022-07-18 14:14:21 lr_scheduler:837] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fd6d02abd90>" 
    will be used during training (effective maximum steps = 100) - 
    Parameters : 
    (warmup_steps: 5000
    warmup_ratio: null
    min_lr: 1.0e-05
    last_epoch: -1
    max_steps: 100
    )

  | Name              | Type                              | Params
------------------------------------------------------------------------
0 | preprocessor      | AudioToMelSpectrogramPreprocessor | 0     
1 | encoder           | ConvASREncoder                    | 140 M 
2 | decoder           | ConvASRDecoder                    | 1.0 K 
3 | loss              | CTCLoss                           | 0     
4 | spec_augmentation | SpectrogramAugmentation           | 0     
5 | _wer              | WER                               | 0     
------------------------------------------------------------------------
140 M     Trainable params
0         Non-trainable params
140 M     Total params
560.699   Total estimated model params size (MB)
Sanity Checking DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/denis/tim/new.py", line 20, in <module>
    trainer.fit(model)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 699, in fit
    self._call_and_handle_interrupt(
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 651, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1165, in _run
    results = self._run_stage()
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1251, in _run_stage
    return self._run_train()
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1273, in _run_train
    self._run_sanity_check()
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1341, in _run_sanity_check
    val_loop.run()
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 143, in advance
    output = self._evaluation_step(**kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 239, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1702, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 355, in validation_step
    return self.model(*args, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/denis/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 90, in forward
    return self.module.validation_step(*inputs, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/nemo/collections/asr/models/ctc_models.py", line 626, in validation_step
    self._wer.update(
  File "/home/denis/anaconda3/lib/python3.9/site-packages/torchmetrics/metric.py", line 383, in wrapped_func
    update(*args, **kwargs)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/nemo/collections/asr/metrics/wer.py", line 753, in update
    reference = self.decoding.decode_tokens_to_str(target)
  File "/home/denis/anaconda3/lib/python3.9/site-packages/nemo/collections/asr/metrics/wer.py", line 652, in decode_tokens_to_str
    hypothesis = ''.join(self.decode_ids_to_tokens(tokens))
  File "/home/denis/anaconda3/lib/python3.9/site-packages/nemo/collections/asr/metrics/wer.py", line 666, in decode_ids_to_tokens
    token_list = [self.labels_map[c] for c in tokens if c != self.blank_id]
  File "/home/denis/anaconda3/lib/python3.9/site-packages/nemo/collections/asr/metrics/wer.py", line 666, in <listcomp>
    token_list = [self.labels_map[c] for c in tokens if c != self.blank_id]
KeyError: 21
titu1994 commented 2 years ago

Citrinet is a subword-based model, not a character-based one. You need to use EncDecCTCModelBPE and pass in a tokenizer config. We have not added character-based configs for Citrinet, as it cannot be trained with character encoding for English.
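
To illustrate why the traceback ends in `KeyError: 21`, here is a minimal, NeMo-free sketch of the lookup shown in `decode_ids_to_tokens`. It assumes the label map is empty, which the log line "num_classes of ConvASRDecoder is set to the size of the vocabulary: 0" suggests; the function below is a simplified stand-in, not the real NeMo class, and the target ids are hypothetical.

```python
# Simplified stand-in for the list comprehension in
# nemo/collections/asr/metrics/wer.py shown in the traceback.
def decode_ids_to_tokens(tokens, labels_map, blank_id):
    """Map token ids to label strings, skipping the CTC blank id."""
    return [labels_map[c] for c in tokens if c != blank_id]

# Assumption: the config defines no `labels`, so the vocabulary is empty.
labels = []
labels_map = {i: label for i, label in enumerate(labels)}
blank_id = len(labels)  # assumption: the blank id follows the last label

# Hypothetical target ids; any id missing from labels_map triggers the error.
target_ids = [21, 5, 7]

try:
    decode_ids_to_tokens(target_ids, labels_map, blank_id)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```

With an empty `labels_map`, the first non-blank id in the targets has no entry, so the lookup fails on it. A model class that builds its vocabulary from a tokenizer avoids this mismatch.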

iddqd2d commented 2 years ago

Thanks, I changed my code:

import torch
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr
from ruamel.yaml import YAML
from omegaconf import DictConfig, OmegaConf
from nemo.utils.exp_manager import exp_manager

model_conf_path = '/home/denis/tim/yaml/citrinet_1024.yaml'
params = OmegaConf.load(model_conf_path)
trainer = pl.Trainer(**params.trainer)
model = nemo_asr.models.EncDecCTCModelBPE(cfg=params.model, trainer=trainer)
exp_manager(trainer=trainer, cfg=params.exp_manager)
trainer.fit(model)

I created a tokenizer from the an4 data (WAV format) with this command:

python /home/denis/NeMo/scripts/tokenizers/process_asr_text_tokenizer.py \
  --manifest="/home/denis/tim/new_prob/an4/train_manifest.json" \
  --data_root="/home/denis/tim/new_prob/tokenizer/" \
  --vocab_size=1024 \
  --tokenizer="spe" \
  --spe_type="unigram" \
  --spe_character_coverage=1.0 \
  --no_lower_case \
  --log
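
For reference, a sketch of how the tokenizer produced by this command might be referenced from the model config. Two assumptions to verify locally: that the script writes its output to a subdirectory named `tokenizer_spe_<spe_type>_v<vocab_size>` under `--data_root`, and that the model config reads the `tokenizer.dir` and `tokenizer.type` keys. The plain dict below only stands in for that config section.

```python
from pathlib import PurePosixPath

# Values copied from the command above.
data_root = PurePosixPath("/home/denis/tim/new_prob/tokenizer/")
spe_type = "unigram"
vocab_size = 1024

# Assumed output directory naming; check the actual folder on disk.
tokenizer_dir = data_root / f"tokenizer_spe_{spe_type}_v{vocab_size}"

# Stand-in for the `model.tokenizer` section of the YAML config.
tokenizer_cfg = {
    "dir": str(tokenizer_dir),
    "type": "bpe",  # assumption: SentencePiece tokenizers are declared as "bpe"
}

print(tokenizer_cfg)
```

If the directory name on disk differs, use the real path; the point is only that the config must point at the folder containing the trained tokenizer files.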

I got the following list of files:

checkpoints
events.out.tfevents.1658399215.vosk.33707.0  
hparams.yaml       
nemo_error_log.txt
cmd-args.log  
.out.tfevents.1658399471.vosk.33707.1 
lightning_logs.txt  
nemo_log_globalrank-0_localrank-0.txt

in checkpoints:

'Citrinet-1024-8x-Stride--val_wer=1.0000-epoch=0.ckpt'  
'Citrinet-1024-8x-Stride--val_wer=1.0000-epoch=99-last.ckpt'
'Citrinet-1024-8x-Stride--val_wer=1.0000-epoch=1.ckpt'   
Citrinet-1024-8x-Stride.nemo
'Citrinet-1024-8x-Stride--val_wer=1.0000-epoch=2.ckpt'

I'm trying to transcribe a file with this code:

restored_model = nemo_asr.models.EncDecCTCModelBPE.load_from_checkpoint(my_path_three)  # any *.ckpt file
print(restored_model.transcribe(paths2audio_files=[aud_2]))

or

restored_model = nemo_asr.models.ASRModel.restore_from(restore_path=my_model_three)  # the *.nemo file
print(restored_model.transcribe(paths2audio_files=[aud_2]))

Result:

Transcribing: 100%|██████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.40s/it]
['ftettethreette ⁇ te ⁇ tettetthreetetteertethreete ctetteftteteftettette fiveteftthreetefteftethreeteerteftethreete ⁇ teftethreetetteftettethreetef etfttethreetethreeteerte']

Expected Result:

looks good can i help with something else that's all with that thank you for a call sir have a nice day be safe

My yaml file - https://github.com/iddqd2d/files/blob/main/citrinet_1024.yaml

titu1994 commented 2 years ago

Your model has not trained at all. Look at the checkpoint directory: it's at 100% WER.

What dataset are you trying to train on, and how many hours is it? Possibly the LR is too high, or something else is causing the model to completely forget its original training.

If the dataset is small (less than a hundred hours), you can try adapters with 1-2 epochs of training to avoid forgetting what the full model learned.

iddqd2d commented 2 years ago

Training dataset: 2 sentences (46 words). Training takes 5-10 minutes. LR and other settings:

lr: 0.05
betas: [0.8, 0.25]
weight_decay: 0.001
sched:
  name: CosineAnnealing
  warmup_steps: 5000
  warmup_ratio: null
  min_lr: 1e-5
  last_epoch: -1

My dataset: https://github.com/iddqd2d/files/blob/main/train-dataset.json

Training ran for 50 epochs.
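
One detail worth checking in these settings: the log reports `max_steps: 100` while the scheduler has `warmup_steps: 5000`. The sketch below assumes a plain linear warmup (the exact formula used by the CosineAnnealing scheduler may differ) and estimates the learning rate reached by the last training step. The function name `warmup_lr` is hypothetical.

```python
def warmup_lr(step, base_lr=0.05, warmup_steps=5000):
    """Linear warmup: scale base_lr by the fraction of warmup completed."""
    return base_lr * min(1.0, (step + 1) / (warmup_steps + 1))

max_steps = 100  # "effective maximum steps = 100" in the training log

# Learning rate at the final training step (steps are 0-indexed).
final_lr = warmup_lr(max_steps - 1)

print(f"LR at last step: {final_lr:.6f} (configured peak: 0.05)")
```

Under that assumption the run ends while still in warmup, at roughly 2% of the configured peak LR, so the cosine decay phase is never reached. This does not by itself explain the 100% WER, but it means the posted `lr: 0.05` is not the rate the model actually trained with.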

titu1994 commented 2 years ago

... I'm pretty sure no E2E ASR model will train on that little data. I guess for debugging purposes you can try to overfit to it, but yeah that's just not going to work for anything general.

iddqd2d commented 2 years ago

Training ran for 50 epochs.

iddqd2d commented 2 years ago

What is the minimum amount of data required?

titu1994 commented 2 years ago

These models are trained on roughly 7,000 hours of speech (about 2M audio clips). The current ASRSet 3 dataset has 24,000 hours of speech across around 6M files. There are ways to fine-tune models with less data, say 100 or so hours, but with 2 sample files I don't think there is any way.

iddqd2d commented 2 years ago

Thanks for everything.

joyyang1215 commented 1 year ago

I ran into the same issue. Following the solution above, I switched from speech_to_text_ctc.py to speech_to_text_ctc_bpe.py, and it works. Thanks.