NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

nemo2riva fails on stt_en_conformer_ctc_large_1.6.0 #3966

itzsimpl closed this issue 2 years ago

itzsimpl commented 2 years ago

Start with the nemo:22.01 container, install riva_quickstart_v2.0.0 (riva_api and nemo2riva), download stt_en_conformer_ctc_large_1.6.0 from NGC, and unzip it.

Running nemo2riva --out stt_en_conformer_ctc_large.riva stt_en_conformer_ctc_large.nemo gives the following output:

INFO: Logging level set to 20                                                                                                                                                                                                                          
INFO: Restoring NeMo model from 'stt_en_conformer_ctc_large.nemo'                                                                                                                                                                                      
################################################################################                                                                                                                                                                       
### WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk                                                                                                                                                                  
###          (please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)                                                                                                                                                                       
###          (or run as: KALDI_ROOT=<your_path> python <your_script>.py)                                                                                                                                                                               
################################################################################                                                                                                                                                                       

[NeMo I 2022-04-11 20:32:55 mixins:165] Tokenizer SentencePieceTokenizer initialized with 128 tokens                                                                                                                                                   
[NeMo W 2022-04-11 20:32:56 modelPT:148] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.                                      
    Train config :                                                                                                                                                                                                                                     
    manifest_filepath: /data/NeMo_ASR_SET/English/v2.0/train/tarred_audio_manifest.json                                                                                                                                                                
    sample_rate: 16000                                                                                                                                                                                                                                 
    batch_size: 32                                                                                                                                                                                                                                     
    shuffle: true                                                                                                                                                                                                                                      
    num_workers: 8                                                                                                                                                                                                                                     
    pin_memory: true                                                                                                                                                                                                                                   
    use_start_end_token: false                                                                                                                                                                                                                         
    trim_silence: false                                                                                                                                                                                                                                
    max_duration: 20.0                                                                                                                                                                                                                                 
    min_duration: 0.1                                                                                                                                                                                                                                  
    shuffle_n: 2048                                                                                                                                                                                                                                    
    is_tarred: true                                                                                                                                                                                                                                    
    tarred_audio_filepaths: /data/NeMo_ASR_SET/English/v2.0/train/audio__OP_0..4095_CL_.tar                                                                                                                                                            

[NeMo W 2022-04-11 20:32:56 modelPT:155] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
    Validation config :                                                                                                                                                                                                                                
    manifest_filepath:                                                                                                                                                                                                                                 
    - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-dev-other.json                                                                                                                                                                      
    - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-dev-clean.json                                                                                                                                                                      
    - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-test-other.json                                                                                                                                                                     
    - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-test-clean.json                                                                                                                                                                     
    sample_rate: 16000                                                                                                                                                                                                                                 
    batch_size: 16                                                                                                                                                                                                                                     
    shuffle: false                                                                                                                                                                                                                                     
    num_workers: 8                                                                                                                                                                                                                                     
    pin_memory: true                                                                                                                                                                                                                                   
    use_start_end_token: false                                                                                                                                                                                                                         
    is_tarred: false                                                                                                                                                                                                                                   
    tarred_audio_filepaths: na                                                                                                                                                                                                                         

[NeMo W 2022-04-11 20:32:56 modelPT:161] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).                                               
    Test config :                                                                                                                                                                                                                                      
    manifest_filepath:                                                                                                                                                                                                                                 
    - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-test-other.json                                                                                                                                                                     
    - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-dev-clean.json
    - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-dev-other.json
    - /data/ASR/LibriSpeech/librispeech_withsp2/manifests/librivox-test-clean.json
    sample_rate: 16000
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    is_tarred: false
    tarred_audio_filepaths: na

[NeMo I 2022-04-11 20:32:56 features:259] PADDING: 0
[NeMo I 2022-04-11 20:32:56 features:276] STFT using torch
[NeMo I 2022-04-11 20:32:59 save_restore_connector:158] Model EncDecCTCModelBPE was successfully restored from /data/stt_en_conformer_ctc_large.nemo.
[NeMo I 2022-04-11 20:33:03 export_utils:261] Swapped 108 modules
[NeMo W 2022-04-11 20:33:04 nemo_logging:349] /workspace/nemo/nemo/collections/asr/modules/conformer_encoder.py:240: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if max_audio_length > self.max_audio_length:

Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
...
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
[W] 'Shape tensor cast elision' routine failed with: None

Running nemo2riva --validate stt_en_conformer_ctc_large.nemo gives the same output, but appends the following lines at the end:

WARNING: Logging before flag parsing goes to stderr.
E0411 20:34:31.553215 140467938121536 <frozen eff.validator.validator>:59] Condition for key 'min_nemo_version' (1.3  <built-in function eq> 1.1) is not fulfilled
E0411 20:34:31.553421 140467938121536 schema.py:198] Exported model at stt_en_conformer_ctc_large.riva failed Riva compliance, using schema at /opt/conda/lib/python3.8/site-packages/nemo2riva/validation_schemas/asr-stt-exported-encdectcmodelbpe.yaml !

Using reinstall.sh to install nemo:1.9.0rc0 makes no difference.

titu1994 commented 2 years ago

The validation step isn't strictly necessary; try to see whether the export works without it.

itzsimpl commented 2 years ago

I first tried just exporting. Since that didn't work, I also tried the validation step in the hope of getting a little more information.

In both cases the conversion fails with [W] 'Shape tensor cast elision' routine failed with: None. In addition, validation fails with Condition for key 'min_nemo_version' (1.3 <built-in function eq> 1.1) is not fulfilled, regardless of whether I use the nemo:22.01 container (hence nemo:1.7.1) or an upgraded version from GitHub (nemo:1.9.0rc0). Checking the manifest of stt_en_conformer_ctc_large_1.6.0 shows that nemo_version is not present. However, even if I create a new model starting from stt_en_conformer_ctc_large_1.6.0, which adds nemo_version: 1.7.1 to the manifest, the errors are always the same.
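For anyone who wants to repeat the nemo_version check, here is a minimal sketch. It assumes that a .nemo file is a tar archive (possibly gzip-compressed) and that the "manifest" above refers to a YAML config packed inside it; the helper name find_key_in_nemo is hypothetical and not part of NeMo or nemo2riva.

```python
import tarfile


def find_key_in_nemo(nemo_path, key="nemo_version"):
    """Scan YAML members of a .nemo archive for a top-level-looking key.

    Assumption: the .nemo file is a tar archive. Mode "r:*" lets tarfile
    detect the compression (none or gzip) on its own.
    Returns {member_name: [matching lines]}; empty dict if the key is absent.
    """
    hits = {}
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            # Only regular YAML files are of interest here.
            if not member.isfile() or not member.name.endswith((".yaml", ".yml")):
                continue
            raw = archive.extractfile(member).read()
            text = raw.decode("utf-8", errors="replace")
            # Plain text match, so no YAML parser is required.
            lines = [
                line.strip()
                for line in text.splitlines()
                if line.strip().startswith(key + ":")
            ]
            if lines:
                hits[member.name] = lines
    return hits


# Example call (path taken from the commands in this thread):
# print(find_key_in_nemo("stt_en_conformer_ctc_large.nemo"))
```

An empty result would correspond to the "not present" observation above. The match is a plain line-prefix test, so it may also report nested keys with the same name.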

FWIW, searching for Warning: Constant folding - Only steps=1 can be constant leads me to https://github.com/pytorch/pytorch/issues/73843, but I have no idea how to test whether this is related at all.

titu1994 commented 2 years ago

The constant folding message is a warning, not an error. It should have no impact on the final model export.

titu1994 commented 2 years ago

Even Riva devs mostly don't use the validation check, since it is stricter than necessary.

itzsimpl commented 2 years ago

@titu1994 my bad, sorry for that. It seems that everything works as it should. The conversion is successful, and I can load the model on Riva 2.0.0. It is just that the initial conversion ends with a somewhat misleading warning, "... routine failed with: None", which is not present under Riva 1.10.0-beta.
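
Since the confusion came from reading a warning as a failure, a rough way to sort such output by severity is sketched below. The prefixes are heuristics inferred only from the log lines pasted in this thread ([NeMo W ...], [W] ..., glog-style E0411 ..., and INFO:/Warning:/WARNING: prefixes); the function name classify is hypothetical, and none of this is an official nemo2riva feature.

```python
import re

# Single-letter severity codes as they appear in the logs in this thread.
_LEVELS = {"I": "info", "W": "warning", "E": "error", "F": "error"}

_LETTER_PATTERNS = (
    re.compile(r"^\[NeMo ([IWE]) "),                         # [NeMo W 2022-04-11 ...]
    re.compile(r"^\[([IWE])\] "),                            # [W] 'Shape tensor cast elision' ...
    re.compile(r"^([IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d+ "),   # E0411 20:34:31.553215 ...
)
_WORD_PATTERN = re.compile(r"^(INFO|WARNING|ERROR):", re.IGNORECASE)


def classify(line):
    """Return 'info', 'warning', 'error', or 'other' for one log line."""
    line = line.strip()
    for pattern in _LETTER_PATTERNS:
        match = pattern.match(line)
        if match:
            return _LEVELS[match.group(1)]
    match = _WORD_PATTERN.match(line)
    if match:
        return match.group(1).lower()
    return "other"


# Example: keep only the lines marked as errors from a saved log.
# with open("nemo2riva.log") as fh:   # hypothetical file name
#     errors = [ln for ln in fh if classify(ln) == "error"]
```

With these heuristics, the "routine failed with: None" line is classified as a warning, and only the two lines emitted by --validate count as errors.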