A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Apache License 2.0

Unable to convert any nemo file using nemo2riva #8569

Closed LL-AI-dev closed 4 months ago

LL-AI-dev commented 6 months ago

Hardware - GPU (T4)
Hardware - CPU
Operating System - Ubuntu 20.04 running on an AWS EC2 g4dn.2xlarge instance

I am currently trying to convert a model (several of different types, but for now not even a FastPitch model is working). In the past I had deployed several NeMo pipelines to Riva, but that development environment was lost during some updates, and I have not been able to convert and deploy any NeMo models since. I believe the lost environment used NeMo 1.20.0 with Riva and nemo2riva 2.13.1; however, those versions no longer work for me.

Recently I have been testing several versions of NeMo, nemo2riva, and Riva using the Dockerfile below in order to deploy models. (I will update this with test results as I continue to try combinations.)

FROM nvcr.io/nvidia/nemo:24.01.framework

#get a simple fastpitch model to convert
RUN wget --content-disposition \
    'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/nemo/tts_en_fastpitch/IPA_1.13.0/files?redirect=true&path=tts_en_fastpitch_align_ipa.nemo' \
    -O tts_en_fastpitch_align_ipa.nemo

#without nemo_text_processing it errors with "TypeError: Can't instantiate abstract class ModelPT with abstract methods list_available_models, setup_training_data, setup_validation_data"
#without lhotse it also gets the same TypeError
RUN pip install nemo_text_processing
RUN pip install lhotse

#install nemo2riva
RUN pip install nvidia-pyindex
RUN pip install nemo2riva==2.14.0

#run the test
CMD nemo2riva --out FP_ipa.riva --key tlt_encode tts_en_fastpitch_align_ipa.nemo

As you can see in the Dockerfile, it uses a pretrained model from NGC; however, I get the same error even with a .nemo model that was trained using the latest NeMo version.

The error relates to nvidia-eff being unable to encrypt the model. It is consistent regardless of the NeMo image used. I have also tried a PyTorch base image, but this results in the same error. I tried a riva-servicemaker image too, but that failed because the image uses Python 3.8, whereas NeMo has required Python 3.10 for a long time.
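The Python mismatch mentioned here can be checked up front, before any packages are installed in an image. A minimal sketch, assuming the 3.10 minimum stated in this post (the name `meets_minimum` is illustrative and not part of NeMo or Riva):

```python
import sys

# Minimum taken from the post above (assumption); adjust for your NeMo release.
NEMO_MIN_PYTHON = (3, 10)


def meets_minimum(version=None, minimum=NEMO_MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    needed = ".".join(str(part) for part in NEMO_MIN_PYTHON)
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {found}: {status} (need >= {needed})")
```

Running this inside a candidate base image shows immediately whether its interpreter is new enough, rather than discovering it through import errors later.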

I really need some help resolving this, as development is being delayed: I currently cannot update any ASR, NLP, or TTS models. How can I resolve this?

INFO: generated new fontManager
INFO: NumExpr defaulting to 8 threads.
[NeMo I 2024-03-04 01:59:20 nemo2riva:38] Logging level set to 20
[NeMo I 2024-03-04 01:59:20 convert:36] Restoring NeMo model from 'tts_en_fastpitch_align_ipa.nemo'
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(limit_predict_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
 NeMo-text-processing :: INFO     :: Creating ClassifyFst grammars.
WARNING: Logging before flag parsing goes to stderr.
I0304 01:59:23.133829 139807560075072 tokenize_and_classify.py:86] Creating ClassifyFst grammars.
[NeMo W 2024-03-04 01:59:51 deprecated:65] Function ``g2p_backward_compatible_support`` is deprecated. But it will not be removed until a further notice. G2P object root directory `nemo_text_processing.g2p` has been replaced with `nemo.collections.tts.g2p`. Please use the latter instead as of NeMo 1.18.0.
[NeMo W 2024-03-04 01:59:52 experimental:26] `<class 'nemo.collections.tts.g2p.models.i18n_ipa.IpaG2p'>` is experimental and not ready for production yet. Use at your own risk.
[NeMo W 2024-03-04 01:59:53 i18n_ipa:124] apply_to_oov_word=None, This means that some of words will remain unchanged if they are not handled by any of the rules in self.parse_one_word(). This may be intended if phonemes and chars are both valid inputs, otherwise, you may see unexpected deletions in your input.
[NeMo W 2024-03-04 01:59:53 experimental:26] `<class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'>` is experimental and not ready for production yet. Use at your own risk.
[NeMo W 2024-03-04 01:59:53 modelPT:165] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
      _target_: nemo.collections.tts.torch.data.TTSDataset
      manifest_filepath: /data3/LJSpeech/nvidia_ljspeech_train.json
      sample_rate: 22050
      sup_data_path: /data3/LJSpeech/tmp_ignoreamb/
      - align_prior_matrix
      - pitch
      n_fft: 1024
      win_length: 1024
      hop_length: 256
      window: hann
      n_mels: 80
      lowfreq: 0
      highfreq: 8000
      max_duration: null
      min_duration: 0.1
      ignore_file: null
      trim: false
      pitch_fmin: 65.40639132514966
      pitch_fmax: 2093.004522404789
      pitch_norm: true
      pitch_mean: 212.35873413085938
      pitch_std: 68.52806091308594
      use_beta_binomial_interpolator: true
      drop_last: false
      shuffle: true
      batch_size: 32
      num_workers: 12

[NeMo W 2024-03-04 01:59:53 modelPT:172] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
      _target_: nemo.collections.tts.torch.data.TTSDataset
      manifest_filepath: /data3/LJSpeech/nvidia_ljspeech_val.json
      sample_rate: 22050
      sup_data_path: /data3/LJSpeech/tmp_ignoreamb/
      - align_prior_matrix
      - pitch
      n_fft: 1024
      win_length: 1024
      hop_length: 256
      window: hann
      n_mels: 80
      lowfreq: 0
      highfreq: 8000
      max_duration: null
      min_duration: null
      ignore_file: null
      trim: false
      pitch_fmin: 65.40639132514966
      pitch_fmax: 2093.004522404789
      pitch_norm: true
      pitch_mean: 212.35873413085938
      pitch_std: 68.52806091308594
      use_beta_binomial_interpolator: true
      drop_last: false
      shuffle: false
      batch_size: 32
      num_workers: 8

[NeMo I 2024-03-04 01:59:53 features:289] PADDING: 1
[NeMo I 2024-03-04 01:59:54 save_restore_connector:249] Model FastPitchModel was successfully restored from /workspace/tts_en_fastpitch_align_ipa.nemo.
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/nlp-isc-exported-bert.yaml for nemo.collections.nlp.models.IntentSlotClassificationModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/tts-exported-fastpitchmodel.yaml for nemo.collections.tts.models.FastPitchModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/tts-exported-hifiganmodel.yaml for nemo.collections.tts.models.HifiGanModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/tts-exported-radttsmodel.yaml for nemo.collections.tts.models.RadTTSModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/asr-stt-exported-encdecctcmodel.yaml for nemo.collections.asr.models.EncDecCTCModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/nlp-tc-exported-bert.yaml for nemo.collections.nlp.models.TextClassificationModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/nlp-mt-exported-encdecmtmodel.yaml for nemo.collections.nlp.models.MTEncDecModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/nlp-tkc-exported-bert.yaml for nemo.collections.nlp.models.TokenClassificationModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/asr-stt-exported-encdectcmodelbpe.yaml for nemo.collections.asr.models.EncDecCTCModelBPE
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/asr-scr-exported-encdecclsmodel.yaml for nemo.collections.asr.models.classification_models.EncDecClassificationModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/nlp-qa-exported-bert.yaml for nemo.collections.nlp.models.QAModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/nlp-mt-exported-megatronnmtmodel.yaml for nemo.collections.nlp.models.MegatronNMTModel
[NeMo I 2024-03-04 01:59:54 schema:161] Loaded schema file /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/nlp-pc-exported-bert.yaml for nemo.collections.nlp.models.PunctuationCapitalizationModel
[NeMo I 2024-03-04 01:59:54 schema:200] Found validation schema for nemo.collections.tts.models.FastPitchModel at /usr/local/lib/python3.10/dist-packages/nemo2riva/validation_schemas/tts-exported-fastpitchmodel.yaml
[NeMo I 2024-03-04 01:59:54 schema:229] Checking installed NeMo version ... 1.23.0 OK (>=1.1)
[NeMo I 2024-03-04 01:59:54 artifacts:59] Found model at ./model_weights.ckpt
[NeMo I 2024-03-04 01:59:54 artifacts:136] Retrieved artifacts: dict_keys(['36d6b09d4dbc45dcb02222e1931e4c7c_lj_speech.tsv', '446fe5373191447190c14cdb8e967e58_ipa_cmudict-0.7b_nv22.08.txt', 'e2327d2f57dd41b88601774804001221_heteronyms-052722', 'model_config.yaml', 'mapping.txt'])
[NeMo I 2024-03-04 01:59:54 cookbook:78] Exporting model FastPitchModel with config=ExportConfig(export_subnet=None, export_format='ONNX', export_file='model_graph.onnx', encryption=True, autocast=True, max_dim=None, export_args={})
[NeMo I 2024-03-04 02:00:01 exportable:131] Successfully exported FastPitchModel to /tmp/tmp7gtd0omy/model_graph.onnx
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] Attribute of type TYPE_PROTO is currently unsupported. Skipping attribute.
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] Could not convert: UNDEFINED to a corresponding NumPy type. The original ONNX type will be preserved. 
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] Attribute of type TYPE_PROTO is currently unsupported. Skipping attribute.
2024-03-04 02:00:01.862915635 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_980
2024-03-04 02:00:01.862971956 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_975
2024-03-04 02:00:01.862988821 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_960
2024-03-04 02:00:01.863000639 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_955
2024-03-04 02:00:01.863011349 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_951
2024-03-04 02:00:01.863026569 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_945
2024-03-04 02:00:01.863059506 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_911
2024-03-04 02:00:01.863083302 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_906
2024-03-04 02:00:01.863103596 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_891
2024-03-04 02:00:01.863122579 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_886
2024-03-04 02:00:01.863142365 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_882
2024-03-04 02:00:01.863161463 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_876
2024-03-04 02:00:01.863180990 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_842
2024-03-04 02:00:01.863202010 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_837
2024-03-04 02:00:01.863224208 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_822
2024-03-04 02:00:01.863235010 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_817
2024-03-04 02:00:01.863244066 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_813
2024-03-04 02:00:01.863256423 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_807
2024-03-04 02:00:01.877913305 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_1055
2024-03-04 02:00:01.877951197 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_1050
2024-03-04 02:00:01.877967245 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_1035
2024-03-04 02:00:01.877974987 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_1030
2024-03-04 02:00:01.877984501 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_1026
2024-03-04 02:00:01.877994518 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_1020
Traceback (most recent call last):
  File "/usr/local/bin/nemo2riva", line 8, in <module>
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/cli/nemo2riva.py", line 49, in nemo2riva
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/convert.py", line 87, in Nemo2Riva
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/cookbook.py", line 141, in export_model
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/artifacts.py", line 92, in create_artifact
  File "<frozen eff.core.file>", line 128, in encrypt
PermissionError: Cannot encrypt the artifact without encryption
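
In a log this long, the failing call is easy to miss among the warnings. A minimal sketch of a triage helper that pulls the last exception and its innermost frame out of captured output (the name `summarize_traceback` and both regular expressions are assumptions for illustration, not part of nemo2riva):

```python
import re

# Frame lines look like:   File "/path/convert.py", line 87, in Nemo2Riva
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
# The final line looks like: PermissionError: Cannot encrypt ...
ERROR_RE = re.compile(r'^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<message>.*)$')


def summarize_traceback(log_text):
    """Return (exception type, message, innermost frame) for the last traceback, or None."""
    lines = log_text.splitlines()
    starts = [
        index
        for index, line in enumerate(lines)
        if line.startswith("Traceback (most recent call last):")
    ]
    if not starts:
        return None
    frames = []
    # Walk the lines after the last traceback header until the exception line.
    for line in lines[starts[-1] + 1:]:
        frame = FRAME_RE.match(line)
        if frame:
            frames.append((frame["path"], int(frame["line"]), frame["func"]))
            continue
        error = ERROR_RE.match(line)
        if error:
            innermost = frames[-1] if frames else None
            return error["type"], error["message"], innermost
    return None
```

Applied to the output above, the helper would report a `PermissionError` raised in `encrypt` at line 128 of `<frozen eff.core.file>`, which places the failure in the encryption step after the ONNX export had already succeeded.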

LL-AI-dev commented 6 months ago

After some fiddling, I was able to get the nemo2riva part of the deployment working using a Riva Docker image. The Dockerfile below creates an environment that can convert .nemo files to .riva files for both pretrained and fine-tuned (regular and adapter) variants of FastPitch and HifiGan.

The step that creates the .rmir file also succeeds, in that the riva-build command completes without error. However, when this model is deployed with bash riva_start.sh, a new error appears:

failed to load 'riva-onnx-fastpitch_encoder-Jaz_v1' version 1: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-Jaz_v1/1/model.onnx failed:/workspace/onnxruntime/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 9, max supported IR version: 8

(The full docker log is listed at the bottom.)
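
The message says that the exported ONNX graph declares a newer IR version than the ONNX Runtime inside the Riva server accepts. This typically happens when the `onnx` package used at export time is newer than the serving runtime. A minimal sketch for confirming the mismatch (function names are illustrative; `read_ir_version` assumes the `onnx` package is installed and that the exported graph file is accessible):

```python
import re

# Pattern for the ONNX Runtime message quoted in this comment.
IR_ERROR_RE = re.compile(
    r"Unsupported model IR version: (?P<model>\d+), "
    r"max supported IR version: (?P<supported>\d+)"
)


def parse_ir_mismatch(log_text):
    """Return (model_ir, max_supported_ir) from a load error, or None if absent."""
    match = IR_ERROR_RE.search(log_text)
    if match is None:
        return None
    return int(match["model"]), int(match["supported"])


def read_ir_version(onnx_path):
    """Read the IR version stored in an ONNX file (needs the `onnx` package)."""
    import onnx  # imported lazily so the log parser works without it

    return onnx.load(onnx_path).ir_version
```

If the two numbers disagree as in the log (9 versus 8), the usual options are to export with an `onnx` release that writes IR version 8 or to serve with a newer ONNX Runtime. Which of these applies here is not confirmed by this thread.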


FROM nvcr.io/nvidia/riva/riva-speech:2.14.0-servicemaker

#make sure pip is installed
RUN apt update && apt install python3-pip -y

#get a simple fastpitch model to convert
RUN wget --content-disposition \
'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/nemo/tts_en_fastpitch/IPA_1.13.0/files?redirect=true&path=tts_en_fastpitch_align_ipa.nemo' \
-O tts_en_fastpitch_align_ipa.nemo

#install NeMo dependencies
RUN apt-get update && apt-get install -y libsndfile1 ffmpeg
RUN pip install Cython

#install nemo2riva dependencies
RUN pip install nvidia-pyindex

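#install NeMo
#the clone, branch switch, and install are chained because each RUN starts in the image's working directory, not inside the cloned repository
RUN git clone https://github.com/NVIDIA/NeMo && \
    cd NeMo && \
    git switch 'r1.23.0' && \
    pip install -e .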

#install some other required packages
RUN pip install matplotlib
RUN pip install einops
RUN pip install transformers
RUN pip install pandas
RUN pip install inflect
RUN pip install typing_extensions==4.7.1
RUN pip install wandb
RUN pip install youtokentome
RUN pip install editdistance
RUN pip install nemo_text_processing
RUN pip install lhotse
RUN pip install pyannote.audio
RUN pip install webdataset
RUN pip install datasets
RUN pip install jiwer

#install nemo2riva
RUN pip install nemo2riva==2.14.0

#fix the errors arising due to nemo requiring python 3.10
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/nemo/collections/common/tokenizers/canary_tokenizer.py
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/data/audio_to_text_lhotse.py
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/data/audio_to_text_lhotse_prompted.py
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/nemo/collections/common/data/lhotse/nemo_adapters.py
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/nemo/collections/common/data/lhotse/dataloader.py
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/models/aed_multitask_models.py
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/data/huggingface/hf_audio_to_text.py
RUN sed -i '1i from __future__ import annotations' /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py
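
The `sed` commands above put `from __future__ import annotations` on the first line of each file. That import postpones evaluation of type hints, which is presumably why it lets annotation syntax written for newer Python versions import cleanly on Python 3.8. A minimal sketch of the same patch in Python, written so that repeating it on a file adds the line only once (the function name is illustrative):

```python
from pathlib import Path

FUTURE_IMPORT = "from __future__ import annotations"


def add_future_annotations(path):
    """Prepend the future import to `path` unless the file already starts with it.

    Returns True if the file was modified, False if it was already patched.
    """
    file_path = Path(path)
    source = file_path.read_text(encoding="utf-8")
    first_line = source.split("\n", 1)[0].strip()
    if first_line == FUTURE_IMPORT:
        return False  # already patched; leave the file untouched
    # Same effect as `sed -i '1i ...'`: the import becomes line 1.
    file_path.write_text(FUTURE_IMPORT + "\n" + source, encoding="utf-8")
    return True
```

Looping this over a list of file paths would replace the individual `sed` lines and would make the patch step safe to re-run.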
Below is the full docker log:
```
==========================
=== Riva Speech Skills ===
==========================
NVIDIA Release 23.12 (build 77214108)
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
> Riva waiting for Triton server to load all models...retrying in 1 second
I0304 06:27:43.963750 102 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f0fc4000000' with size 268435456
I0304 06:27:43.966148 102 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 1000000000
I0304 06:27:43.971749 102 model_lifecycle.cc:459] loading: riva-onnx-fastpitch_encoder-Jaz_v1:1
I0304 06:27:43.971796 102 model_lifecycle.cc:459] loading: riva-trt-hifigan-Jaz_v1:1
I0304 06:27:43.971839 102 model_lifecycle.cc:459] loading: spectrogram_chunker-Jaz_v1:1
I0304 06:27:43.971880 102 model_lifecycle.cc:459] loading: tts_postprocessor-Jaz_v1:1
I0304 06:27:43.971934 102 model_lifecycle.cc:459] loading: tts_preprocessor-Jaz_v1:1
I0304 06:27:43.973206 102 onnxruntime.cc:2459] TRITONBACKEND_Initialize: onnxruntime
I0304 06:27:43.973231 102 onnxruntime.cc:2469] Triton TRITONBACKEND API version: 1.10
I0304 06:27:43.973236 102 onnxruntime.cc:2475] 'onnxruntime' TRITONBACKEND API version: 1.10
I0304 06:27:43.973241 102 onnxruntime.cc:2505] backend configuration: {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0304 06:27:44.037319 102 tensorrt.cc:5444] TRITONBACKEND_Initialize: tensorrt
I0304 06:27:44.037343 102 tensorrt.cc:5454] Triton TRITONBACKEND API version: 1.10
I0304 06:27:44.037351 102 tensorrt.cc:5460] 'tensorrt' TRITONBACKEND API version: 1.10
I0304 06:27:44.037356 102 tensorrt.cc:5488] backend configuration: {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0304 06:27:44.037540 102 onnxruntime.cc:2563] TRITONBACKEND_ModelInitialize: riva-onnx-fastpitch_encoder-Jaz_v1 (version 1)
I0304 06:27:44.466490 102 onnxruntime.cc:2606] TRITONBACKEND_ModelInstanceInitialize: riva-onnx-fastpitch_encoder-Jaz_v1_0 (GPU device 0)
I0304 06:27:44.583420 102 onnxruntime.cc:2640] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0304 06:27:44.583435 102 tensorrt.cc:5578] TRITONBACKEND_ModelInitialize: riva-trt-hifigan-Jaz_v1 (version 1)
I0304 06:27:44.583459 102 onnxruntime.cc:2586] TRITONBACKEND_ModelFinalize: delete model state
E0304 06:27:44.583484 102 model_lifecycle.cc:596] failed to load 'riva-onnx-fastpitch_encoder-Jaz_v1' version 1: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-Jaz_v1/1/model.onnx failed:/workspace/onnxruntime/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 9, max supported IR version: 8
I0304 06:27:44.583984 102 backend_model.cc:188] Overriding execution policy to "TRITONBACKEND_EXECUTION_BLOCKING" for sequence model "riva-trt-hifigan-Jaz_v1"
I0304 06:27:44.584806 102 spectrogram-chunker.cc:270] TRITONBACKEND_ModelInitialize: spectrogram_chunker-Jaz_v1 (version 1)
I0304 06:27:44.585550 102 backend_model.cc:303] model configuration:
{ "name": "spectrogram_chunker-Jaz_v1", "platform": "", "backend": "riva_tts_chunker", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "SPECTROGRAM", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 80, -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "IS_LAST_SENTENCE", "data_type": "TYPE_INT32", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "NUM_VALID_FRAMES_IN", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "SENTENCE_NUM", "data_type": "TYPE_INT32", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "DURATIONS", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "PROCESSED_TEXT", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "VOLUME", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "SPECTROGRAM_CHUNK", "data_type": "TYPE_FP32", "dims": [ 80, -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "END_FLAG", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "NUM_VALID_SAMPLES_OUT", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "SENTENCE_NUM", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "DURATIONS", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "PROCESSED_TEXT", "data_type": "TYPE_STRING", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "VOLUME", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "oldest": { "max_candidate_sequences": 8, "preferred_batch_size": [ 8 ], "max_queue_delay_microseconds": 1000 }, "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "END", "control": [ { "kind": "CONTROL_SEQUENCE_END", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "CORRID", "control": [ { "kind": "CONTROL_SEQUENCE_CORRID", "int32_false_true": [], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_UINT64" } ] } ], "state": [] }, "instance_group": [ { "name": "spectrogram_chunker-Jaz_v1_0", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "num_mels": { "string_value": "80" }, "num_samples_per_frame": { "string_value": "512" }, "supports_volume": { "string_value": "True" }, "chunk_length": { "string_value": "80" }, "max_execution_batch_size": { "string_value": "8" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": true } }
I0304 06:27:44.585599 102 tensorrt.cc:5627] TRITONBACKEND_ModelInstanceInitialize: riva-trt-hifigan-Jaz_v1_0 (GPU device 0)
I0304 06:27:44.640208 102 logging.cc:49] Loaded engine size: 28 MiB
> Riva waiting for Triton server to load all models...retrying in 1 second
I0304 06:27:44.787852 102 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +31, now: CPU 0, GPU 31 (MiB)
I0304 06:27:44.796150 102 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +186, now: CPU 0, GPU 217 (MiB)
I0304 06:27:44.796536 102 tensorrt.cc:1547] Created instance riva-trt-hifigan-Jaz_v1_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0304 06:27:44.796915 102 model_lifecycle.cc:693] successfully loaded 'riva-trt-hifigan-Jaz_v1' version 1
I0304 06:27:44.802776 102 spectrogram-chunker.cc:272] TRITONBACKEND_ModelInstanceInitialize: spectrogram_chunker-Jaz_v1_0 (device 0)
I0304 06:27:44.802834 102 tts-postprocessor.cc:305] TRITONBACKEND_ModelInitialize: tts_postprocessor-Jaz_v1 (version 1)
I0304 06:27:44.803194 102 model_lifecycle.cc:693] successfully loaded 'spectrogram_chunker-Jaz_v1' version 1
I0304 06:27:44.803487 102 backend_model.cc:303] model configuration:
{ "name": "tts_postprocessor-Jaz_v1", "platform": "", "backend": "riva_tts_postprocessor", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "INPUT", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 1, -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "NUM_VALID_SAMPLES", "data_type": "TYPE_INT32", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "Prosody_volume", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "OUTPUT", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "oldest": { "max_candidate_sequences": 8, "preferred_batch_size": [ 8 ], "max_queue_delay_microseconds": 100 }, "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "END", "control": [ { "kind": "CONTROL_SEQUENCE_END", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "CORRID", "control": [ { "kind": "CONTROL_SEQUENCE_CORRID", "int32_false_true": [], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_UINT64" } ] } ], "state": [] }, "instance_group": [ { "name": "tts_postprocessor-Jaz_v1_0", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "use_denoiser": { "string_value": "False" }, "supports_volume": { "string_value": "True" }, "filter_length": { "string_value": "1024" }, "fade_length": { "string_value": "256" }, "num_samples_per_frame": { "string_value": "512" }, "chunk_num_samples": { "string_value": "40960" }, "max_execution_batch_size": { "string_value": "8" }, "max_chunk_size": { "string_value": "131072" }, "hop_length": { "string_value": "256" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": false } }
I0304 06:27:44.803568 102 tts-postprocessor.cc:307] TRITONBACKEND_ModelInstanceInitialize: tts_postprocessor-Jaz_v1_0 (device 0)
I0304 06:27:44.824235 102 tts-preprocessor.cc:337] TRITONBACKEND_ModelInitialize: tts_preprocessor-Jaz_v1 (version 1)
I0304 06:27:44.824489 102 model_lifecycle.cc:693] successfully loaded 'tts_postprocessor-Jaz_v1' version 1
W0304 06:27:44.824928 102 tts-preprocessor.cc:284] Parameter abbreviation_path is deprecated
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0304 06:27:44.824993 112 preprocessor.cc:231] TTS character mapping loaded from /data/models/tts_preprocessor-Jaz_v1/1/mapping.txt
I0304 06:27:44.921231 112 preprocessor.cc:269] TTS phonetic mapping loaded from /data/models/tts_preprocessor-Jaz_v1/1/ipa_cmudict-0.7b_nv22.08.txt
I0304 06:27:44.921326 112 preprocessor.cc:282] Abbreviation mapping loaded from /data/models/tts_preprocessor-Jaz_v1/1/abbr.txt
I0304 06:27:44.921344 112 normalize.cc:52] Speech Class far file missing:/data/models/tts_preprocessor-Jaz_v1/1/speech_class.far
I0304 06:27:45.010165 112 preprocessor.cc:292] TTS normalizer loaded from /data/models/tts_preprocessor-Jaz_v1/1/
I0304 06:27:45.010266 102 backend_model.cc:303] model configuration:
{ "name": "tts_preprocessor-Jaz_v1", "platform": "", "backend": "riva_tts_preprocessor", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 8, "input": [ { "name": "input_string", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ 1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "output", "data_type": "TYPE_INT64", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "output_mask", "data_type": "TYPE_FP32", "dims": [ 1, 400, 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "output_length", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "is_last_sentence", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "output_string", "data_type": "TYPE_STRING", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "sentence_num", "data_type": "TYPE_INT32", "dims": [ 1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "pitch", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "duration", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "volume", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "graph": { "level": 0 }, "priority": "PRIORITY_DEFAULT", "cuda": { "graphs": false, "busy_wait_events": false, "graph_spec": [], "output_copy_stream": true }, "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "sequence_batching": { "oldest": { "max_candidate_sequences": 8, "preferred_batch_size": [ 8 ], "max_queue_delay_microseconds": 100 }, "max_sequence_idle_microseconds": 60000000, "control_input": [ { "name": "START", "control": [ { "kind": "CONTROL_SEQUENCE_START", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "READY", "control": [ { "kind": "CONTROL_SEQUENCE_READY", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "END", "control": [ { "kind": "CONTROL_SEQUENCE_END", "int32_false_true": [ 0, 1 ], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_INVALID" } ] }, { "name": "CORRID", "control": [ { "kind": "CONTROL_SEQUENCE_CORRID", "int32_false_true": [], "fp32_false_true": [], "bool_false_true": [], "data_type": "TYPE_UINT64" } ] } ], "state": [] }, "instance_group": [ { "name": "tts_preprocessor-Jaz_v1_0", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "max_sequence_length": { "string_value": "400" }, "supports_speaker_mixing": { "string_value": "False" }, "upper_case_chars": { "string_value": "True" }, "g2p_ignore_ambiguous": { "string_value": "False" }, "phone_set": { "string_value": "ipa" }, "dictionary_path": { "string_value": "/data/models/tts_preprocessor-Jaz_v1/1/ipa_cmudict-0.7b_nv22.08.txt" }, "abbreviations_path": { "string_value": "/data/models/tts_preprocessor-Jaz_v1/1/abbr.txt" }, "supports_ragged_batches": { "string_value": "True" }, "norm_proto_path": { "string_value": "/data/models/tts_preprocessor-Jaz_v1/1/" }, "mapping_path": { "string_value": "/data/models/tts_preprocessor-Jaz_v1/1/mapping.txt" }, "normalize_pitch": { "string_value": "True" }, "upper_case_g2p": { "string_value": "True" }, "pitch_std": { "string_value": "50.46181106567383" }, "max_input_length": { "string_value": "2000" }, "language": { "string_value": "en-US" }, "pad_with_space": { "string_value": "True" }, "subvoices": { "string_value": "0:0" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": true } }
I0304 06:27:45.010360 102 tts-preprocessor.cc:339] TRITONBACKEND_ModelInstanceInitialize: tts_preprocessor-Jaz_v1_0 (device 0)
I0304 06:27:45.010680 102 model_lifecycle.cc:693] successfully loaded 'tts_preprocessor-Jaz_v1' version 1
E0304 06:27:45.010726 102 model_repository_manager.cc:481] Invalid argument: ensemble 'fastpitch_hifigan_ensemble-Jaz_v1' depends on 'riva-onnx-fastpitch_encoder-Jaz_v1' which has no loaded version
I0304 06:27:45.010785 102 server.cc:563]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0304 06:27:45.010857 102 server.cc:590]
+------------------------+---------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Backend | Path | Config | +------------------------+---------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ | onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} | | riva_tts_preprocessor | /opt/tritonserver/backends/riva_tts_preprocessor/libtriton_riva_tts_preprocessor.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} | | tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} | | riva_tts_chunker | /opt/tritonserver/backends/riva_tts_chunker/libtriton_riva_tts_chunker.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} | | riva_tts_postprocessor | /opt/tritonserver/backends/riva_tts_postprocessor/libtriton_riva_tts_postprocessor.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} | 
+------------------------+---------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ I0304 06:27:45.010956 102 server.cc:633] +------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Model | Version | Status | +------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | riva-onnx-fastpitch_encoder-Jaz_v1 | 1 | UNAVAILABLE: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-Jaz_v1/1/model.onnx failed:/workspace/onnxruntime/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 9, max supported IR version: 8 | | riva-trt-hifigan-Jaz_v1 | 1 | READY | | spectrogram_chunker-Jaz_v1 | 1 | READY | | tts_postprocessor-Jaz_v1 | 1 | READY | | tts_preprocessor-Jaz_v1 | 1 | READY | 
+------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ I0304 06:27:45.084929 102 metrics.cc:864] Collecting metrics for GPU 0: Tesla T4 I0304 06:27:45.085172 102 metrics.cc:757] Collecting CPU metrics I0304 06:27:45.085354 102 tritonserver.cc:2264] +----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Option | Value | +----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | server_id | triton | | server_version | 2.27.0 | | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace logging | | model_repository_path[0] | /data/models | | model_control_mode | MODE_NONE | | strict_model_config | 1 | | rate_limit | OFF | | pinned_memory_pool_byte_size | 268435456 | | cuda_memory_pool_byte_size{0} | 1000000000 | | response_cache_byte_size | 0 | | min_supported_compute_capability | 6.0 | | strict_readiness | 1 | | exit_timeout | 30 | 
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ I0304 06:27:45.085368 102 server.cc:264] Waiting for in-flight requests to complete. I0304 06:27:45.085383 102 server.cc:280] Timeout 30: Found 0 model versions that have in-flight inferences I0304 06:27:45.085634 102 server.cc:295] All models are stopped, unloading models I0304 06:27:45.085638 102 tts-postprocessor.cc:310] TRITONBACKEND_ModelInstanceFinalize: delete instance state I0304 06:27:45.085652 102 server.cc:302] Timeout 30: Found 4 live models and 0 in-flight non-inference requests I0304 06:27:45.085697 102 tensorrt.cc:5665] TRITONBACKEND_ModelInstanceFinalize: delete instance state I0304 06:27:45.085746 102 spectrogram-chunker.cc:275] TRITONBACKEND_ModelInstanceFinalize: delete instance state I0304 06:27:45.085788 102 tts-preprocessor.cc:342] TRITONBACKEND_ModelInstanceFinalize: delete instance state I0304 06:27:45.085799 102 spectrogram-chunker.cc:271] TRITONBACKEND_ModelFinalize: delete model state I0304 06:27:45.085826 102 tts-preprocessor.cc:338] TRITONBACKEND_ModelFinalize: delete model state I0304 06:27:45.085946 102 model_lifecycle.cc:578] successfully unloaded 'spectrogram_chunker-Jaz_v1' version 1 I0304 06:27:45.092669 102 tts-postprocessor.cc:306] TRITONBACKEND_ModelFinalize: delete model state I0304 06:27:45.092836 102 model_lifecycle.cc:578] successfully unloaded 'tts_postprocessor-Jaz_v1' version 1 I0304 06:27:45.102135 102 tensorrt.cc:5604] TRITONBACKEND_ModelFinalize: delete model state I0304 06:27:45.102301 102 model_lifecycle.cc:578] successfully unloaded 'riva-trt-hifigan-Jaz_v1' version 1 I0304 06:27:45.106678 102 model_lifecycle.cc:578] successfully unloaded 'tts_preprocessor-Jaz_v1' version 1 > Riva waiting for Triton server to load all models...retrying in 1 second I0304 06:27:46.085741 102 
server.cc:302] Timeout 29: Found 0 live models and 0 in-flight non-inference requests error: creating server: Internal - failed to load all models > Riva waiting for Triton server to load all models...retrying in 1 second > Riva waiting for Triton server to load all models...retrying in 1 second > Triton server died before reaching ready state. Terminating Riva startup. Check Triton logs with: docker logs ```
itzsimpl commented 6 months ago

@LL-AI-dev the failure on deploy is very similar to what I saw deploying ASR models. It turned out to be caused by a discrepancy in the version of onnx that NeMo uses and the one that is used by Riva/Triton (older); see this issue https://github.com/nvidia-riva/nemo2riva/issues/36. For ASR the workaround was to either downgrade the onnx library in NeMo or avoid using the onnx runtime in Riva.
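A quick way to confirm this kind of mismatch is to read the IR version stamped into the exported model.onnx and compare it with the maximum the serving runtime reports. With the onnx package installed, `onnx.load(path).ir_version` returns it directly. The sketch below does the same using only the standard library, by scanning the top-level protobuf fields for field 1 (`ir_version` in ONNX's `ModelProto`). The helper names and the synthetic sample bytes are illustrative assumptions, not part of NeMo, nemo2riva, or Riva.

```python
from typing import Optional, Tuple


def _read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one protobuf varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def read_ir_version(data: bytes) -> Optional[int]:
    """Return ModelProto.ir_version (field 1) from serialized ONNX bytes."""
    pos = 0
    while pos < len(data):
        tag, pos = _read_varint(data, pos)
        field_no, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:  # varint
            value, pos = _read_varint(data, pos)
            if field_no == 1:
                return value
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited (strings, sub-messages)
            length, pos = _read_varint(data, pos)
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return None


# Synthetic bytes standing in for a real model.onnx (hypothetical sample):
# field 1 (ir_version) = 9, then field 2 (producer_name) = "pytorch".
sample = b"\x08\x09" + b"\x12\x07pytorch"
print(read_ir_version(sample))
```

For a real file, the bytes would come from `open(path, "rb").read()`. If the printed value is higher than the "max supported IR version" in the Triton log, the runtime will refuse to load the model.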

LL-AI-dev commented 6 months ago

@itzsimpl this is my script for testing different versions of onnx. Before the script is run, I do a simple pip install onnx==1.13.0, as you mentioned in https://github.com/nvidia-riva/nemo2riva/issues/36, but this has seemingly not solved the issue. I also tried different versions of onnxruntime, to no effect.

import nemo
from nemo.collections.tts.models import FastPitchModel, HifiGanModel
import onnx

nemo_ver = nemo.__version__
onnx_ver = onnx.__version__
print("onnx: ",onnx_ver)
print("nemo: ",nemo_ver)

fp_model_name = f"FastPitch_nemo{nemo_ver}_onnx{onnx_ver}"
hg_model_name = f"HifiGan_nemo{nemo_ver}_onnx{onnx_ver}"

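#re-save each model through the installed nemo so nemo2riva has a .nemo file to convert
model = FastPitchModel.from_pretrained(model_name="tts_en_fastpitch_ipa")
model.save_to(f'/models/{fp_model_name}.nemo')

model = HifiGanModel.from_pretrained(model_name="tts_en_hifigan")
model.save_to(f'/models/{hg_model_name}.nemo')

print("models saved")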
import os

os.system(f"nemo2riva --out /models/riva/{fp_model_name}.riva --key tlt_encode /models/{fp_model_name}.nemo")
os.system(f"nemo2riva --out /models/riva/{hg_model_name}.riva --key tlt_encode /models/{hg_model_name}.nemo")

cmd = f"""
riva-build speech_synthesis --force \
/models/rmir/tts_pipeline_model__nemo{nemo_ver}_onnx{onnx_ver}.rmir:tlt_encode \
/models/riva/{fp_model_name}.riva:tlt_encode \
/models/riva/{hg_model_name}.riva:tlt_encode \
--language_code=en-US \
--num_speakers=1 \
--voice_name test_{onnx_ver} \
--sample_rate 22050 \
--abbreviations_file=/riva_aux/abbr.txt \
--wfst_tokenizer_model=/riva_tn/tokenize_and_classify.far \
--wfst_verbalizer_model=/riva_tn/verbalize.far \
--phone_set=ipa \
--phone_dictionary_file=/riva_aux/ipa_cmudict-0.7b_nv22.08.txt \
--upper_case_chars=True
"""
os.system(cmd)

My next thought is to test TensorRT FP32 or TorchScript via the riva-build arguments for the encoder (as that is where the error seems to be), or for both the encoder and decoder if needed. [https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html#riva-build-optional-parameters]

LL-AI-dev commented 6 months ago

Using the flag "use_trt_fp32" during riva-build results in a model that, when deployed, gets the same IR version error as before: UNAVAILABLE: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-Jaz_v1/1/model.onnx failed:/workspace/onnxruntime/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 9, max supported IR version: 8

When using the flag "use_torchscript", riva_init.sh fails because of a missing file. This is not resolved by using --FORMAT TS during the nemo2riva call. In fact, setting the format to TS leaves the main folder in the output location empty.

Despite the lack of success so far, I will go back to trying combinations of downgraded onnx and onnxruntime libraries, as I do not seem to be able to avoid the use of onnx as @itzsimpl suggested.

LL-AI-dev commented 6 months ago

All runs used nemo 1.23.0. Versions of onnx and onnxruntime were set using pip install X commands in the Dockerfile, before the script above is run to download the models and save them (I believe saving them in this way is enough to update the model IR version used by onnxruntime). Across onnx 1.13.0 and 1.14.0, and onnxruntime 1.16.3, 1.15.1, 1.14.0, and 1.13.1, I was not able to find a combination that would allow the creation of a .rmir file that could be successfully deployed.

Tomorrow I will try the following approaches to resolve this:

The script I was using (still the same as the one 2 posts up):

import nemo
from nemo.collections.tts.models import FastPitchModel, HifiGanModel
import onnx
import onnxruntime
import os

#remove old runs
os.system("rm -rf /models/models/*")
os.system("rm -rf /models/riva/*")
os.system("rm -rf /models/rmir/*")

#getting current versions
nemo_ver = nemo.__version__
onnx_ver = onnx.__version__
onnx_rt_ver = onnxruntime.__version__
print("onnx: ",onnx_ver)
print("onnxruntime: ",onnx_rt_ver)
print("nemo: ",nemo_ver)

fp_model_name = f"FastPitch_nemo{nemo_ver}_onnx{onnx_ver}_onnxrt{onnx_rt_ver}"
hg_model_name = f"HifiGan_nemo{nemo_ver}_onnx{onnx_ver}_onnxrt{onnx_rt_ver}"

model = FastPitchModel.from_pretrained(model_name="tts_en_fastpitch_ipa")
model.save_to(f'/models/{fp_model_name}.nemo')

model = HifiGanModel.from_pretrained(model_name="tts_en_hifigan")
model.save_to(f'/models/{hg_model_name}.nemo')

print("models saved")

os.system(f"nemo2riva --out /models/riva/{fp_model_name}.riva --key tlt_encode /models/{fp_model_name}.nemo")
os.system(f"nemo2riva --out /models/riva/{hg_model_name}.riva --key tlt_encode /models/{hg_model_name}.nemo")

cmd = f"""
riva-build speech_synthesis --force \
/models/rmir/tts_pipeline_model_nemo{nemo_ver}_onnx{onnx_ver}_onnxrt{onnx_rt_ver}.rmir:tlt_encode \
/models/riva/{fp_model_name}.riva:tlt_encode \
/models/riva/{hg_model_name}.riva:tlt_encode \
--language_code=en-US \
--num_speakers=1 \
--voice_name test__onnx{onnx_ver}_onnxrt{onnx_rt_ver} \
--sample_rate 22050 \
--abbreviations_file=/riva_aux/abbr.txt \
--wfst_tokenizer_model=/riva_tn/tokenize_and_classify.far \
--wfst_verbalizer_model=/riva_tn/verbalize.far \
--phone_set=ipa \
--phone_dictionary_file=/riva_aux/ipa_cmudict-0.7b_nv22.08.txt \
--upper_case_chars=True
"""
os.system(cmd)

onnx 1.14.0, onnxruntime 1.16.3 (defaults from dockerfile)

onnx 1.13.0, onnxruntime 1.16.3

onnx 1.13.0, onnxruntime 1.15.1 -UNAVAILABLE: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-test__onnx1.13.0_onnxrt1.15.1/1/model.onnx failed:Invalid tensor data type 0. (accidentally ran twice, same result)

onnx 1.14.0, onnxruntime 1.15.1 -UNAVAILABLE: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-test__onnx1.14.0_onnxrt1.15.1/1/model.onnx failed:/workspace/onnxruntime/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 9, max supported IR version: 8

onnx 1.14.0, onnxruntime 1.14.0 -UNAVAILABLE: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-test__onnx1.14.0_onnxrt1.14.0/1/model.onnx failed:/workspace/onnxruntime/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 9, max supported IR version: 8

onnx 1.14.0, onnxruntime 1.13.1 -UNAVAILABLE: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-test__onnx1.14.0_onnxrt1.13.1/1/model.onnx failed:/workspace/onnxruntime/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 9, max supported IR version: 8
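The results above fall into two distinct failure modes: an IR version mismatch and an "Invalid tensor data type" load error. When sweeping many package combinations, it can help to tag each Triton status line automatically. The following is a minimal sketch that matches only the two message fragments quoted in this thread; the function name and the returned dictionary layout are assumptions made for illustration.

```python
import re
from typing import Dict

# Patterns copied from the error text quoted in this thread.
IR_MISMATCH = re.compile(
    r"Unsupported model IR version: (\d+), max supported IR version: (\d+)"
)
BAD_DTYPE = re.compile(r"Invalid tensor data type (\d+)")


def classify_load_error(status: str) -> Dict[str, object]:
    """Tag a Triton model status string with the failure mode it shows."""
    match = IR_MISMATCH.search(status)
    if match:
        return {
            "kind": "ir_version_mismatch",
            "model_ir": int(match.group(1)),
            "max_ir": int(match.group(2)),
        }
    match = BAD_DTYPE.search(status)
    if match:
        return {"kind": "invalid_tensor_dtype", "dtype": int(match.group(1))}
    return {"kind": "unknown"}


statuses = [
    "UNAVAILABLE: Internal: onnx runtime error 1: ... "
    "Unsupported model IR version: 9, max supported IR version: 8",
    "UNAVAILABLE: Internal: onnx runtime error 1: ... "
    "failed:Invalid tensor data type 0.",
]
for line in statuses:
    print(classify_load_error(line))
```

Separating the two modes makes it easier to see that pinning onnx to 1.13.0 removed the IR version error in the onnxruntime 1.15.1 run but exposed a different failure.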

LL-AI-dev commented 6 months ago

Using pre-trained models DIRECTLY from ngc is ok with: onnx 1.13.0, onnxruntime 1.15.1, nemo_toolkit 1.20.0 (pip install nemo_toolkit[all]==1.20.0)

wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/nemo/tts_hifigan/1.0.0rc1/files?redirect=true&path=tts_hifigan.nemo' -O tts_hifigan.nemo
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/nemo/tts_en_fastpitch/IPA_1.13.0/files?redirect=true&path=tts_en_fastpitch_align_ipa.nemo' -O tts_en_fastpitch_align_ipa.nemo

nemo2riva --out ../nemo_models/riva/tts_en_fastpitch_align_ipa.riva --key tlt_encode tts_en_fastpitch_align_ipa.nemo
nemo2riva --out ../nemo_models/riva/tts_hifigan.riva --key tlt_encode tts_hifigan.nemo

#below is run within the docker container after it is launched
riva-build speech_synthesis --force \
/models/rmir/tts_pipeline_simple.rmir:tlt_encode \
/models/riva/tts_en_fastpitch_align_ipa.riva:tlt_encode \
/models/riva/tts_hifigan.riva:tlt_encode \
--language_code=en-US \
--num_speakers=1 \
--voice_name simple \
--sample_rate 22050 \
--abbreviations_file=/riva_aux/abbr.txt \
--wfst_tokenizer_model=/riva_tn/tokenize_and_classify.far \
--wfst_verbalizer_model=/riva_tn/verbalize.far \
--phone_set=ipa \
--phone_dictionary_file=/riva_aux/ipa_cmudict-0.7b_nv22.08.txt \

When I ran this combo previously with a higher nemo version, 1.23 (installed via git clone + pip install -e), and saved the models through nemo before doing the conversion, I was getting an error related to the tensor data type. This suggests the data type error results from the interaction with the newer nemo rather than from onnx or onnxruntime alone.

Rerunning the above code:

Will be doing some testing but it seems that a working combination to deploy nemo models requires:

When I have narrowed down the specific requirements, I will post a Dockerfile that can be used for training and converting.

Overall, it would be great if the nemo, nemo2riva, and riva packages can be made more compatible

LL-AI-dev commented 6 months ago

After testing many combinations of packages, I have discovered the current compatibility for training and converting .nemo files is limited to:

Training Environment:

Converting Environment:

Until the conflicts are resolved, it will not be possible to convert models or functions introduced in nemo after version 1.20 to a .riva file.

rmittal-github commented 6 months ago

@LL-AI-dev could you please try "pip install nvidia-eff>=0.6.4 nvidia-eff-tao-encryption>=0.1.8" in your python3.10 environment and see whether that helps with this error? Thanks.
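Note that in most shells the version specifiers need quoting (for example `pip install "nvidia-eff>=0.6.4"`), otherwise `>=0.6.4` is parsed as an output redirection and pip installs whatever version it resolves. To verify what actually ended up in the environment, a small standard-library check along these lines may help. It is a sketch: the version parsing is deliberately naive (leading digits of each dotted component only), and the helper names are assumptions.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions taken from the suggestion in this thread.
MINIMUMS = {"nvidia-eff": "0.6.4", "nvidia-eff-tao-encryption": "0.1.8"}


def as_tuple(version: str) -> Tuple[int, ...]:
    """Naive parse: keep the leading digits of each dotted component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted versions numerically, component by component."""
    return as_tuple(installed) >= as_tuple(minimum)


def check(minimums: Dict[str, str]) -> Dict[str, str]:
    """Report, per package, the installed version and whether it is new enough."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = f"{installed} ({'ok' if ok else 'below ' + minimum})"
    return report


for name, status in check(MINIMUMS).items():
    print(f"{name}: {status}")
```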

LL-AI-dev commented 6 months ago

@rmittal-github thanks for the suggestion. That does allow the nemo2riva (version 2.14) conversion to run, but when deploying the downstream .rmir file, it gets the error: UNAVAILABLE: Internal: onnx runtime error 1: Load model from /data/models/riva-onnx-fastpitch_encoder-test_py310_onnx13/1/model.onnx failed:Invalid tensor data type 0.

I suspect this might be related to a torch version issue (from riva release logs, 2.14 known issues)

When generating .riva models from .nemo using nemo2riva, the nemo:23.08 image is not compatible with Riva due to updated torch version. To avoid any Riva deployment issues, the recommendation is to continue using the last working nemo image

these are the pytorch versions in question:

Do you have any suggestions to resolve this?

LL-AI-dev commented 6 months ago

I tried downgrading nemo2riva to version 2.13 as that had resolved the data type error previously

  • Package version for nemo2riva: 2.13.0 is required.
    • Any higher fails with the error related to “Invalid tensor data type 0”. Even 2.13.1 fails.

downgrading results in the error:

Traceback (most recent call last):
  File "/usr/local/bin/nemo2riva", line 8, in <module>
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/cli/nemo2riva.py", line 49, in nemo2riva
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/convert.py", line 42, in Nemo2Riva
    trainer = Trainer(cfg_trainer)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 399, in __init__
    self._accelerator_connector = _AcceleratorConnector(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 140, in __init__
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 222, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You selected an invalid accelerator name: `accelerator=TrainerConfig(logger=False, callbacks=None, default_root_dir=None, gradient_clip_val=0, num_nodes=1, enable_progress_bar=True, overfit_batches=0.0, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=1, max_epochs=1000, min_epochs=1, max_steps=-1, min_steps=None, limit_train_batches=1.0, limit_val_batches=1.0, limit_test_batches=1.0, val_check_interval=1.0, log_every_n_steps=50, accelerator='cpu', sync_batchnorm=False, precision=32, num_sanity_val_steps=2, profiler=None, benchmark=False, deterministic=False, use_distributed_sampler=True, detect_anomaly=False, plugins=None, limit_predict_batches=1.0, gradient_clip_algorithm='norm', max_time=None, reload_dataloaders_every_n_epochs=0, devices='auto', strategy='auto', enable_checkpointing=False, enable_model_summary=True, inference_mode=True, barebones=False)`. Available names are: auto, cpu, cuda, hpu, ipu, mps, tpu.

I attempted to change the line where the accelerator is set in /usr/local/lib/python3.10/dist-packages/nemo2riva/convert.py, but regardless of which value I tried, the conversion failed.
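The traceback shows `Trainer(cfg_trainer)` being called with the whole config object as a single positional argument, and the error message shows that object ending up as the value of `accelerator`. The mechanism can be reproduced without Lightning installed. In the sketch below, `DemoTrainer` and `DemoTrainerConfig` are hypothetical stand-ins, not the real pytorch_lightning or NeMo classes, and this is not presented as a confirmed fix for nemo2riva 2.13.

```python
from dataclasses import asdict, dataclass

VALID_ACCELERATORS = {"auto", "cpu", "cuda", "hpu", "ipu", "mps", "tpu"}


@dataclass
class DemoTrainerConfig:
    """Hypothetical stand-in for the TrainerConfig shown in the error."""
    accelerator: str = "cpu"
    devices: str = "auto"


class DemoTrainer:
    """Hypothetical trainer whose first parameter is `accelerator`."""

    def __init__(self, accelerator="auto", devices="auto"):
        # Check the type first: a config object is not a valid name.
        if not isinstance(accelerator, str) or accelerator not in VALID_ACCELERATORS:
            raise ValueError(
                f"You selected an invalid accelerator name: `accelerator={accelerator!r}`"
            )
        self.accelerator = accelerator
        self.devices = devices


cfg = DemoTrainerConfig()

try:
    DemoTrainer(cfg)  # the whole config object is bound to `accelerator`
except ValueError as err:
    print("positional:", err)

trainer = DemoTrainer(**asdict(cfg))  # each field goes to its own parameter
print("keyword:", trainer.accelerator)
```

This would explain why changing the accelerator value inside the config has no effect: the value is already 'cpu' in the message above, and it is the way the config is passed, not its contents, that triggers the error.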

rmittal-github commented 5 months ago

@LL-AI-dev I think downgrading to 2.13 is not required. Could you please try with nemo:23.06 image and 2.14.0, along with the pip install packages I mentioned in my last comment, and see if that resolves the issue?

As per the known issues in the 2.14.0 release notes, nemo:23.08 is not compatible, so you need to use the previous nemo image (23.06).

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.