NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

CTC Language Finetuning convergence #9436

Open Jeevi10 opened 2 weeks ago

Jeevi10 commented 2 weeks ago

Describe the bug

CTC fine-tuning is not converging. I tried changing the hyperparameters, but still had no luck. During training, the model started to output empty strings.
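For context on the symptom: with greedy CTC decoding, an empty transcript means the blank token was the most likely class at every frame. This is often seen when CTC training collapses onto the blank. The minimal NumPy sketch below is not NeMo's own decoder. It shows greedy decoding (per-frame argmax, collapse repeats, drop blanks) plus a simple "blank fraction" check. The toy vocabulary and probabilities are made up. The sketch assumes the blank is the last output class, so check this against your model's decoder.

```python
import numpy as np


def ctc_greedy_decode(log_probs: np.ndarray, vocab: list[str]) -> str:
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks.

    log_probs has shape (T, len(vocab) + 1); the blank is assumed to be the
    last class (index len(vocab)).
    """
    blank_id = len(vocab)
    best = log_probs.argmax(axis=-1)  # most likely class for each frame
    out = []
    prev = None
    for idx in best:
        # Emit a symbol only if it is not blank and differs from the previous frame.
        if idx != blank_id and idx != prev:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


def blank_fraction(log_probs: np.ndarray) -> float:
    """Share of frames whose argmax is the blank (assumed to be the last class)."""
    blank_id = log_probs.shape[-1] - 1
    return float((log_probs.argmax(axis=-1) == blank_id).mean())


# Toy example: hypothetical 3-symbol vocabulary, blank at index 3.
vocab = ["a", "b", "c"]
healthy = np.log(np.array([
    [0.7, 0.1, 0.1, 0.1],  # "a"
    [0.7, 0.1, 0.1, 0.1],  # "a" again, collapsed as a repeat
    [0.1, 0.1, 0.1, 0.7],  # blank
    [0.1, 0.7, 0.1, 0.1],  # "b"
]))
collapsed = np.log(np.tile([0.1, 0.1, 0.1, 0.7], (4, 1)))  # blank wins every frame

print(ctc_greedy_decode(healthy, vocab), blank_fraction(healthy))  # ab 0.25
print(repr(ctc_greedy_decode(collapsed, vocab)), blank_fraction(collapsed))  # '' 1.0
```

Logging the blank fraction on a few validation batches is one cheap way to see when a model starts drifting toward all-blank output.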

Expected behavior

I expected the model to converge on the new language (I was simply following the provided tutorial): https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_CTC_Language_Finetuning.ipynb

Environment overview

Environment details

Package Version
absl-py 2.1.0
accelerated-scan 0.2.0
addict 2.4.0
aiohttp 3.9.5
aiosignal 1.3.1
alabaster 0.7.16
alembic 1.13.1
aniso8601 9.0.1
antlr4-python3-runtime 4.9.3
anyio 4.4.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asciitree 0.3.3
asteroid-filterbanks 0.4.0
asttokens 2.4.1
async-lru 2.0.4
async-timeout 4.0.3
attrdict 2.0.1
attrs 23.2.0
audioread 3.0.1
Babel 2.15.0
beautifulsoup4 4.12.3
black 24.4.2
bleach 6.1.0
boto3 1.34.113
botocore 1.34.113
braceexpand 0.1.7
causal-conv1d 1.2.2.post1
cdifflib 1.2.6
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.0.2
clip 0.2.0
cloudpickle 3.0.0
colorama 0.4.6
colorlog 6.8.2
comm 0.2.2
contourpy 1.2.1
cycler 0.12.1
Cython 3.0.10
cytoolz 0.12.3
datasets 2.19.1
debugpy 1.8.1
decorator 5.1.1
decord 0.6.0
defusedxml 0.7.1
diart 0.9.0
diffusers 0.28.0
dill 0.3.8
Distance 0.1.3
docker-pycreds 0.4.0
docopt 0.6.2
docutils 0.21.2
dtw-python 1.4.2
editdistance 0.8.1
einops 0.8.0
einops-exts 0.0.4
exceptiongroup 1.2.0
executing 2.0.1
faiss-cpu 1.8.0
fasteners 0.19
fastjsonschema 2.19.1
fasttext 0.9.2
fiddle 0.3.0
filelock 3.14.0
Flask 2.2.5
Flask-RESTful 0.3.10
fonttools 4.51.0
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2024.3.1
ftfy 6.2.0
future 1.0.0
g2p-en 2.1.0
gdown 5.2.0
gitdb 4.0.11
GitPython 3.1.43
graphviz 0.20.3
greenlet 3.0.3
grpcio 1.64.0
h11 0.14.0
h5py 3.11.0
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.23.3
hydra-core 1.3.2
HyperPyYAML 1.2.2
idna 3.7
ijson 3.2.3
imageio 2.34.1
imagesize 1.4.1
importlib_metadata 7.1.0
inflect 7.2.1
iniconfig 2.0.0
inquirerpy 0.3.4
intervaltree 3.1.0
ipykernel 6.29.3
ipython 8.24.0
ipython-genutils 0.2.0
ipywidgets 8.1.3
isoduration 20.11.0
isort 5.13.2
itsdangerous 2.2.0
jedi 0.19.1
jieba 0.42.1
Jinja2 3.1.3
jiwer 2.5.2
jmespath 1.0.1
joblib 1.4.2
json5 0.9.25
jsonpointer 2.4
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
julius 0.2.7
jupyter_client 8.6.2
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.0
jupyter_server_terminals 0.5.3
jupyterlab 4.2.1
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.2
jupyterlab_widgets 3.0.11
kaldi-python-io 1.2.2
kaldiio 2.18.0
kiwisolver 1.4.5
kornia 0.7.2
kornia_rs 0.1.3
latexcodec 3.0.0
lazy_loader 0.4
Levenshtein 0.22.0
lhotse 1.23.0
libcst 1.4.0
librosa 0.10.2
lightning 2.2.4
lightning-utilities 0.11.2
lilcom 1.7
llvmlite 0.42.0
loguru 0.7.2
lxml 5.2.2
Mako 1.3.3
Markdown 3.6
markdown-it-py 3.0.0
markdown2 2.4.13
MarkupSafe 2.1.5
marshmallow 3.21.2
matplotlib 3.8.4
matplotlib-inline 0.1.7
mdurl 0.1.2
mistune 3.0.2
more-itertools 10.2.0
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nemo_text_processing 1.0.2
nemo_toolkit 2.0.0rc1
nerfacc 0.5.3
nest_asyncio 1.6.0
networkx 3.3
ninja 1.11.1.1
nltk 3.8.1
notebook 6.4.12
notebook_shim 0.2.4
numba 0.59.1
numcodecs 0.12.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
omegaconf 2.3.0
onnx 1.16.1
open-clip-torch 2.24.0
openai-whisper 20231117
OpenCC 1.1.6
optuna 3.6.1
overrides 7.7.0
packaging 24.0
pandas 2.2.2
pandocfilters 1.5.1
pangu 4.0.6.1
parameterized 0.9.0
parso 0.8.4
pathspec 0.12.1
pexpect 4.9.0
pfzy 0.3.4
pickleshare 0.7.5
pillow 10.3.0
pip 24.0
plac 1.4.3
platformdirs 4.2.1
pluggy 1.5.0
pooch 1.8.1
portalocker 2.8.2
primePy 1.3
progress 1.6
prometheus_client 0.20.0
prompt-toolkit 3.0.42
protobuf 4.25.3
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyannote.audio 3.1.1
pyannote.core 5.0.0
pyannote.database 5.1.0
pyannote.metrics 3.2.1
pyannote.pipeline 3.0.1
pyarrow 16.1.0
pyarrow-hotfix 0.6
pybind11 2.12.0
pybtex 0.24.0
pybtex-docutils 1.0.3
pycparser 2.22
pydub 0.25.1
Pygments 2.17.2
pyloudnorm 0.1.1
PyMCubes 0.1.4
pynini 2.1.5
pyparsing 3.1.2
pypinyin 0.51.0
pypinyin-dict 0.8.0
PySocks 1.7.1
pytest 8.2.1
pytest-mock 3.14.0
pytest-runner 6.0.1
python-dateutil 2.9.0
python-json-logger 2.0.7
pytorch-lightning 2.2.4
pytorch-metric-learning 2.5.0
pytz 2024.1
PyYAML 6.0.1
pyzmq 26.0.3
rapidfuzz 2.13.7
referencing 0.35.1
regex 2024.4.28
requests 2.31.0
resampy 0.4.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
rouge_score 0.1.2
rpds-py 0.18.1
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.8
Rx 3.2.0
s3transfer 0.10.1
sacrebleu 2.4.2
sacremoses 0.1.1
safetensors 0.4.3
scikit-learn 1.4.2
scipy 1.13.0
semver 3.0.2
Send2Trash 1.8.3
sentence-transformers 3.0.0
sentencepiece 0.2.0
sentry-sdk 2.3.1
setproctitle 1.3.3
setuptools 69.5.1
shellingham 1.5.4
six 1.16.0
smmap 5.0.1
sniffio 1.3.1
snowballstemmer 2.2.0
sortedcontainers 2.4.0
sounddevice 0.4.6
soundfile 0.12.1
soupsieve 2.5
sox 1.5.0
soxr 0.3.7
speechbrain 1.0.0
Sphinx 7.3.7
sphinxcontrib-applehelp 1.0.8
sphinxcontrib-bibtex 2.6.2
sphinxcontrib-devhelp 1.0.6
sphinxcontrib-htmlhelp 2.0.5
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.7
sphinxcontrib-serializinghtml 1.1.10
SQLAlchemy 2.0.29
stack-data 0.6.2
sympy 1.12
tabulate 0.9.0
taming-transformers 0.0.1
tensorboard 2.16.2
tensorboard-data-server 0.7.2
tensorboardX 2.6.2.2
tensorstore 0.1.45
termcolor 2.4.0
terminado 0.18.1
text-unidecode 1.3
textdistance 4.6.2
texterrors 0.4.4
threadpoolctl 3.5.0
tiktoken 0.6.0
timm 1.0.3
tinycss2 1.3.0
tokenizers 0.19.1
tomli 2.0.1
toolz 0.12.1
torch 2.3.0
torch-audiomentations 0.11.1
torch-pitch-shift 1.2.4
torchaudio 2.3.0
torchdiffeq 0.2.3
torchmetrics 1.3.2
torchsde 0.2.6
torchvision 0.18.0
tornado 6.4
tqdm 4.66.2
traitlets 5.14.3
trampoline 0.1.2
transformers 4.40.2
trimesh 4.4.0
triton 2.3.0
typeguard 4.3.0
typer 0.12.3
types-python-dateutil 2.9.0.20240316
typing_extensions 4.11.0
tzdata 2024.1
uri-template 1.3.0
urllib3 2.2.1
wandb 0.17.0
wcwidth 0.2.13
webcolors 1.13
webdataset 0.2.86
webencodings 0.5.1
websocket-client 1.8.0
websocket-server 0.6.4
Werkzeug 3.0.3
wget 3.2
wheel 0.43.0
whisper-timestamped 1.15.4
widgetsnbextension 4.0.11
wrapt 1.16.0
xxhash 3.4.1
yarl 1.9.4
zarr 2.18.2
zipp 3.17.0

GPU: Tesla V100

nithinraok commented 1 week ago

It's a very old model; I would recommend trying Fast Conformer instead. @titu1994 A few notebooks are based on the QuartzNet architecture; we need to update them to use FastConformer!