iddqd2d closed this issue 2 years ago
Citrinet is a subword-based model, not a character-based one. You need to use EncDecCTCBPEModel and pass in a Tokenizer config. We have not added character-based configs for Citrinet, as it cannot be trained with character encoding for English.
Thanks, I changed my code:
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf
from nemo.utils.exp_manager import exp_manager

# Load the model/trainer/exp_manager sections from the Citrinet YAML config.
model_conf_path = '/home/denis/tim/yaml/citrinet_1024.yaml'
params = OmegaConf.load(model_conf_path)

trainer = pl.Trainer(**params.trainer)

# The constructor's return value must be assigned, otherwise trainer.fit(model) fails.
model = nemo_asr.models.EncDecCTCModelBPE(cfg=params.model, trainer=trainer)
exp_manager(trainer=trainer, cfg=params.exp_manager)
trainer.fit(model)
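One detail worth checking (a sketch, not from the original post): the loaded config must point at the tokenizer before the model is constructed. The directory name below is an assumption based on the tokenizer script's usual output layout.
params.model.tokenizer.dir = '/home/denis/tim/new_prob/tokenizer/tokenizer_spe_unigram_v1024'  # assumed output dir
params.model.tokenizer.type = 'bpe'  # SentencePiece tokenizers use type 'bpe' in NeMo configs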
I created the tokenizer with AN4 ("wav" format) data (command):
python /home/denis/NeMo/scripts/tokenizers/process_asr_text_tokenizer.py \
--manifest="/home/denis/tim/new_prob/an4/train_manifest.json" \
--data_root="/home/denis/tim/new_prob/tokenizer/" \
--vocab_size=1024 \
--tokenizer="spe" \
--spe_type="unigram" \
--spe_character_coverage=1.0 \
--no_lower_case \
--log
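For reference, the script typically writes the tokenizer into a subdirectory of --data_root named after the settings; with the flags above that would be something like (layout hedged from typical NeMo output, file names may differ by version):
tokenizer_spe_unigram_v1024/
    tokenizer.model
    tokenizer.vocab
    vocab.txt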
I got this file list:
checkpoints
events.out.tfevents.1658399215.vosk.33707.0
hparams.yaml
nemo_error_log.txt
cmd-args.log
events.out.tfevents.1658399471.vosk.33707.1
lightning_logs.txt
nemo_log_globalrank-0_localrank-0.txt
in checkpoints:
'Citrinet-1024-8x-Stride--val_wer=1.0000-epoch=0.ckpt'
'Citrinet-1024-8x-Stride--val_wer=1.0000-epoch=99-last.ckpt'
'Citrinet-1024-8x-Stride--val_wer=1.0000-epoch=1.ckpt'
Citrinet-1024-8x-Stride.nemo
'Citrinet-1024-8x-Stride--val_wer=1.0000-epoch=2.ckpt'
I'm trying to recognize the file (code):
restored_model = nemo_asr.models.EncDecCTCModelBPE.load_from_checkpoint(my_path_three)  # any *.ckpt file
print(restored_model.transcribe(paths2audio_files=[aud_2]))
or
restored_model = nemo_asr.models.ASRModel.restore_from(restore_path=my_model_three)  # *.nemo file
print(restored_model.transcribe(paths2audio_files=[aud_2]))
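A quick sanity check after restoring, whichever path is used (a minimal sketch; decoder.vocabulary is the standard attribute on NeMo CTC models):
restored_model.eval()  # switch to inference mode before transcribing
print(len(restored_model.decoder.vocabulary))  # should match the tokenizer's vocab size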
Result:
Transcribing: 100%|██████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.40s/it]
['ftettethreette ⁇ te ⁇ tettetthreetetteertethreete ctetteftteteftettette fiveteftthreetefteftethreeteerteftethreete ⁇ teftethreetetteftettethreetef etfttethreetethreeteerte']
Expected Result:
looks good can i help with something else that's all with that thank you for a call sir have a nice day be safe
My yaml file - https://github.com/iddqd2d/files/blob/main/citrinet_1024.yaml
Your model has not trained at all - look at the checkpoint directory - it's at 100% WER.
What dataset are you trying to train on, and how many hours is it? Possibly you are using too high a LR, or other things are causing the model to completely forget its original training.
If the dataset is small (less than a hundred hours), you can try adapters with 1-2 epochs of training instead of full model training, to avoid forgetting.
Training dataset: 2 sentences (46 words). Training takes 5-10 minutes. LR and other settings:
lr: 0.05
betas: [0.8, 0.25]
weight_decay: 0.001
sched:
  name: CosineAnnealing
  warmup_steps: 5000
  warmup_ratio: null
  min_lr: 1e-5
  last_epoch: -1
My dataset: https://github.com/iddqd2d/files/blob/main/train-dataset.json
Training was carried out for 50 epochs.
... I'm pretty sure no E2E ASR model will train on that little data. I guess for debugging purposes you can try to overfit to it, but yeah that's just not going to work for anything general.
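For that kind of debugging-only overfit run, a minimal sketch using Lightning's built-in switch (values are illustrative, not from the thread):
trainer = pl.Trainer(max_epochs=500, overfit_batches=1)  # train repeatedly on one batch
model = nemo_asr.models.EncDecCTCModelBPE(cfg=params.model, trainer=trainer)
trainer.fit(model)  # loss should approach zero if the pipeline is wired correctly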
What is the minimum amount of data needed?
These models are trained on roughly 7,000 hours of speech, ~2M audio clips. The current ASRSet 3 dataset has 24,000 hours of speech, around 6M files. There are ways to fine-tune models with less data, say 100 or so hours, but with 2 sample files I don't think there is any way.
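For the fine-tuning route, a rough sketch of the usual NeMo pattern (the pretrained model name and paths are assumptions; adjust to your data):
# Start from a pretrained English Citrinet instead of training from scratch.
model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained('stt_en_citrinet_1024')
# Swap in the tokenizer built earlier, then attach the training data and fine-tune.
model.change_vocabulary(new_tokenizer_dir='/path/to/tokenizer_dir', new_tokenizer_type='bpe')
model.setup_training_data(train_data_config=params.model.train_ds)
trainer.fit(model)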
Thanks for all the help.
I met the same issue. I followed the solution and changed speech_to_text_ctc.py to speech_to_text_ctc_bpe.py, and it works. Thanks!
Hi, I tried this code:
Help, please...
My YAML config: https://github.com/iddqd2d/files/blob/main/citrinet_1024.yaml