NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
12.28k stars · 2.54k forks

How can I get the stt_en_fastconformer_ctc_small pretrained model? #11204

Open PhamDangNguyen opened 2 weeks ago

PhamDangNguyen commented 2 weeks ago

I set `init_from_pretrained_model` to "stt_en_fastconformer_ctc_small", but I get a "not found" error for "stt_en_fastconformer_ctc_small".

```yaml
name: "FastConformer-CTC-BPE"
# name: "model STT N test"

init_from_pretrained_model: "stt_en_fastconformer_ctc_small"

model:
  sample_rate: 16000
  log_prediction: true # enables logging sample predictions in the output during training
  ctc_reduction: 'mean_volume'
  skip_nan_grad: false
  model_defaults:
    pred_hidden: 320
    joint_hidden: 320

  train_ds:
    manifest_filepath: /home/team_voice/STT_pdnguyen/finetune-fast-conformer_14m/metadata_train/fubon_add_all_data_train_10_10_2024.json
    sample_rate: ${model.sample_rate}
    batch_size: 1 # you may increase batch_size if your memory allows
    shuffle: true
    num_workers: 32
    pin_memory: true
    max_duration: 20 # it is set for LibriSpeech, you may need to update it for your dataset
    min_duration: 0.3
    # tarred datasets
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    # bucketing params
    bucketing_strategy: "fully_randomized"
    bucketing_batch_size: null
```
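One way to debug a "not found" error like this is to compare the requested name against the checkpoint names that actually exist before putting one in `init_from_pretrained_model`. In a NeMo environment the real list comes from the model class itself, e.g. `[m.pretrained_model_name for m in EncDecCTCModelBPE.list_available_models()]`; the sketch below uses a hard-coded sample list purely for illustration, so the names in `available` are placeholders, not an authoritative catalog.

```python
# Sketch: check a checkpoint name against the available pretrained models.
# In a real environment, fetch the list from NeMo instead, e.g.:
#   from nemo.collections.asr.models import EncDecCTCModelBPE
#   available = [m.pretrained_model_name
#                for m in EncDecCTCModelBPE.list_available_models()]
# The names below are illustrative placeholders only.
available = [
    "stt_en_conformer_ctc_small",
    "stt_en_conformer_ctc_large",
    "stt_en_fastconformer_ctc_large",
]

requested = "stt_en_fastconformer_ctc_small"

if requested in available:
    print(f"{requested!r} exists")
else:
    # Suggest close matches by shared substring instead of just failing.
    suggestions = [n for n in available
                   if "fastconformer" in n or "ctc_small" in n]
    print(f"{requested!r} not found; similar names: {suggestions}")
```

Printing the suggestions rather than silently failing makes it obvious when a model family exists only in some sizes (e.g. a large variant but no small one).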
MedAymenF commented 2 weeks ago

As far as I can tell, this model doesn't exist. The only `stt_*_conformer_ctc_small` models I've found were Conformer models, not FastConformer ones. Do you have a link to this model?