Closed: hanchi-gao closed this issue 2 years ago
You're attempting to train a model with features from the main branch of NeMo, but using a container released in May, so it won't work. You'll need to build your own container from the Dockerfile in the main branch.
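The failure itself is generic Python behavior: the main-branch config carries a `causal_downsampling` key that the older `ConformerEncoder` shipped in the 22.05 container does not accept, so instantiation raises a `TypeError`. A minimal sketch of the mismatch (the class and config below are illustrative stand-ins, not NeMo's actual code):

```python
# Stand-in for the 22.05-era encoder: its constructor predates the
# 'causal_downsampling' option added on the main branch.
class OldConformerEncoder:
    def __init__(self, d_model, n_layers):
        self.d_model = d_model
        self.n_layers = n_layers

# A config written against a newer release carries the extra key.
cfg = {"d_model": 256, "n_layers": 4, "causal_downsampling": False}

try:
    OldConformerEncoder(**cfg)
except TypeError as e:
    # e.g. "__init__() got an unexpected keyword argument 'causal_downsampling'"
    print(e)
```

Matching the container to the branch the config came from (or vice versa) removes the mismatch at the source.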
I also strongly suggest using SentencePiece as the default tokenizer for everything.
This suggestion was very helpful; I can now train successfully.
Describe the bug Hi everyone, I want to train an ASR model (Conformer_CTC_bpe) in Mandarin. I generated vocab.txt using "process_asr_text_tokenizer.py", modified "/conf/conformer/conformer_ctc_bpe.yaml", and executed "examples/asr/asr_ctc/speech_to_text_ctc_bpe.py", but got a TypeError:
TypeError("__init__() got an unexpected keyword argument 'causal_downsampling'")
Steps/Code to reproduce bug We only modified conf/conformer/conformer_ctc_bpe.yaml as follows:
where "dir" is generated by the script "/scripts/tokenizers/process_asr_text_tokenizer.py". The directory contains a vocab.txt with the following content:
and I executed the following command:
CUDA_VISIBLE_DEVICES=2,3 python examples/asr/asr_ctc/speech_to_text_ctc_bpe.py --config-path=../conf/conformer --config-name=conformer_ctc_bpe model.train_ds.manifest_filepath="data_tw_v2/Stage1/Quartznet/train.json" model.validation_ds.manifest_filepath="data_tw_v2/Stage1/Quartznet/valid.json" model.tokenizer.dir=data_tw_v2/Stage2/tokenizer_wpe_v1024 model.tokenizer.type=wpe trainer.devices=2 trainer.accelerator='gpu' trainer.max_epochs=150 trainer.strategy="ddp" model.optim.name="adamw" model.optim.lr=0.001 model.optim.betas=[0.9,0.999] model.optim.weight_decay=0.0001 model.optim.sched.warmup_steps=2000 exp_manager.create_wandb_logger=True exp_manager.wandb_logger_kwargs.name="wpe" exp_manager.wandb_logger_kwargs.project="A"
But got the following error:
Error executing job with overrides: ['model.train_ds.manifest_filepath=data_tw_v2/Stage1/Quartznet/train.json', 'model.validation_ds.manifest_filepath=data_tw_v2/Stage1/Quartznet/valid.json', 'model.tokenizer.dir=data_tw_v2/Stage2/tokenizer_wpe_v1024', 'model.tokenizer.type=wpe', 'trainer.devices=2', 'trainer.accelerator=gpu', 'trainer.max_epochs=150', 'trainer.strategy=ddp', 'model.optim.name=adamw', 'model.optim.lr=0.001', 'model.optim.betas=[0.9,0.999]', 'model.optim.weight_decay=0.0001', 'model.optim.sched.warmup_steps=2000', 'exp_manager.create_wandb_logger=True', 'exp_manager.wandb_logger_kwargs.name=wpe', 'exp_manager.wandb_logger_kwargs.project=Conformer_norm_type'] Error in call to target 'nemo.collections.asr.modules.conformer_encoder.ConformerEncoder': TypeError("__init__() got an unexpected keyword argument 'causal_downsampling'")
Expected behavior The ASR model trains successfully.
Environment overview (please complete the following information) Environment location: Docker Method of NeMo install: docker pull nvcr.io/nvidia/nemo:22.05 docker run --gpus all -it --name Nemo_container -v:/workspace --shm-size=8g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 --device=/dev/snd nvcr.io/nvidia/nemo:22.05
Environment details
OS version: Ubuntu 20.04 PyTorch version: 1.12.0a0+8a1a93a Python version: 3.8.13
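For anyone hitting the same mismatch who cannot rebuild the container immediately, one quick local workaround is to drop config keys that the installed class does not accept before instantiating it. This is a generic sketch using `inspect.signature`, not a NeMo API, and it silently discards options the installed version cannot honor, so rebuilding against the matching branch remains the proper fix:

```python
import inspect

def filter_supported_kwargs(cls, cfg):
    """Return only the entries of cfg that cls.__init__ actually accepts."""
    accepted = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in cfg.items() if k in accepted}

# Illustrative stand-in for an encoder class from an older release.
class OldEncoder:
    def __init__(self, d_model, n_layers):
        self.d_model = d_model
        self.n_layers = n_layers

cfg = {"d_model": 256, "n_layers": 4, "causal_downsampling": False}

# The unknown key is filtered out, so instantiation succeeds.
enc = OldEncoder(**filter_supported_kwargs(OldEncoder, cfg))
print(enc.d_model)  # 256
```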