NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

ERROR: module export failed for JasperEncoder with exception number of output names provided (2) exceeded number of outputs (1) #252

Closed WeiLi233 closed 3 years ago

WeiLi233 commented 4 years ago

Hello, I tried to train my own Mandarin ASR model on the open corpus aishell_1. Everything seemed right; the config file I used is examples/asr/configs/quartznet10x5.yaml. But when I attempted to convert the intermediate checkpoints JasperEncoder-STEP-30000.pt and JasperDecoderForCTC-STEP-30000.pt to ONNX format with the scripts/export_jasper_to_onnx.py script, an error occurred while converting the encoder checkpoint. Some logs:

Loading config file...
Determining model shape...
Num encoder input features: 64
Num decoder input features: 1024
Initializing models...
Loading checkpoints...
Exporting encoder...
2020-01-07 16:07:16,987 - WARNING - Turned off 115 masked convolutions
Module is JasperEncoder. We are removing input and output length ports since they are not needed for deployment
/xxx/anaconda3/lib/python3.7/site-packages/torch/jit/__init__.py:1007: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error: Not within tolerance rtol=1e-05 atol=1e-05 at input[0, 305, 3] (0.005420095752924681 vs. 0.005409650504589081) and 1 other locations (0.00%)
  check_tolerance, _force_outplace, True, _module_class)
2020-01-07 16:07:24,303 - ERROR - ERROR: module export failed for JasperEncoder with exception number of output names provided (2) exceeded number of outputs (1)

After checking and tracing this myself, I think there may be a bug in nemo/backends/pytorch/actions.py: https://github.com/NVIDIA/NeMo/blob/146a51cb685d463c98b0eae4de4d4aefd32ebfb5/nemo/nemo/backends/pytorch/actions.py#L1135-L1136

After I removed "length" from the input_names list and "encoded_lengths" from the output_names list before calling torch.onnx.export, the conversion worked fine.
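To illustrate the workaround in isolation, here is a minimal sketch of pruning the two length-related port names before they would be handed to torch.onnx.export. The helper name prune_port_names and the example port lists are hypothetical; the real name lists are built inside nemo/backends/pytorch/actions.py.

```python
# Hypothetical sketch of the workaround: drop the length ports that the
# deployment export no longer produces, so that the name lists passed to
# torch.onnx.export match the actual number of traced inputs/outputs.

def prune_port_names(input_names, output_names,
                     inputs_to_drop=("length",),
                     outputs_to_drop=("encoded_lengths",)):
    """Return copies of the name lists without the deployment-only ports."""
    pruned_in = [n for n in input_names if n not in inputs_to_drop]
    pruned_out = [n for n in output_names if n not in outputs_to_drop]
    return pruned_in, pruned_out

# Example port lists (stand-ins for the ones actions.py derives from the module):
ins, outs = prune_port_names(["audio_signal", "length"],
                             ["outputs", "encoded_lengths"])
print(ins, outs)  # -> ['audio_signal'] ['outputs']
```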

The NeMo version I used is 0.9.0.

okuchaiev commented 4 years ago

Were you able to use the ONNX files with the workaround you've described? If not, could you please try the latest master or https://github.com/NVIDIA/NeMo/pull/232 ?

Thesane commented 4 years ago

I didn't try the workaround, but the master branch still has the same issue as above.

Thesane commented 4 years ago

Using inputs_to_drop/outputs_to_drop, I managed to get it to work by putting this code after this line: https://github.com/NVIDIA/NeMo/blob/146a51cb685d463c98b0eae4de4d4aefd32ebfb5/nemo/nemo/backends/pytorch/actions.py#L1121

# drop the length-related ports before the names reach torch.onnx.export
for input_to_drop in inputs_to_drop:
    input_names.remove(input_to_drop)
for output_to_drop in outputs_to_drop:
    output_names.remove(output_to_drop)
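One caveat with the snippet above: list.remove raises ValueError if a name is not actually present in the list. A slightly more defensive variant filters instead of removing; the example name lists below are stand-ins for the real port names built in actions.py, not values taken from the NeMo source.

```python
# Defensive variant of the patch above: filtering with a comprehension
# silently skips names that are not present, whereas list.remove would
# raise ValueError. The lists here are illustrative stand-ins.
input_names = ["audio_signal", "length"]
output_names = ["outputs", "encoded_lengths"]
inputs_to_drop = {"length"}
outputs_to_drop = {"encoded_lengths"}

input_names = [n for n in input_names if n not in inputs_to_drop]
output_names = [n for n in output_names if n not in outputs_to_drop]
print(input_names, output_names)  # -> ['audio_signal'] ['outputs']
```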
okuchaiev commented 3 years ago

Closing, as this is related to the old version.