NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
12.14k stars, 2.53k forks

Unable to export MSDD model to pt or ONNX #10999

Open jingzhaoo opened 3 weeks ago

jingzhaoo commented 3 weeks ago

Describe the bug

EncDecDiarLabelModel inherits from ExportableEncDecModel, which in turn inherits from Exportable, so it should be exportable to a .pt or ONNX file. When I ran the following code to export it,

msdd_model = EncDecDiarLabelModel(cfg=modelConfig)
msdd_model.export(output="msdd_model.pt", input_example=input_example)

I ran into errors:

AttributeError: 'EncDecDiarLabelModel' object has no attribute 'input_names'

The input_names attribute is defined in Exportable here. Is this an issue related to Python MRO (method resolution order)?
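One possibility other than MRO, sketched below in plain Python, is that the attribute is a property whose getter fails internally. When a property getter raises AttributeError on a class that defines `__getattr__` (as torch.nn.Module does), Python falls back to `__getattr__` and the final message names the property, not the attribute that was actually missing. The classes `Module`, `Exportable`, and `DiarModel` and the attribute `input_module` below are hypothetical stand-ins for illustration; this is not NeMo's code, and I have not confirmed that this is what happens in EncDecDiarLabelModel.

```python
class Module:
    """Stand-in for torch.nn.Module: __getattr__ runs only after normal lookup fails."""

    def __getattr__(self, name):
        raise AttributeError(
            f"'{type(self).__name__}' object has no attribute '{name}'"
        )


class Exportable:
    """Hypothetical mixin (an assumption for illustration, not NeMo's class)."""

    @property
    def input_names(self):
        # The getter depends on an attribute the subclass may never define.
        return list(self.input_module.input_types)


class DiarModel(Module, Exportable):
    """Hypothetical model that never sets `input_module`."""


def lookup_error(obj, name):
    """Return the AttributeError message raised by getattr, or None if lookup succeeds."""
    try:
        getattr(obj, name)
    except AttributeError as err:
        return str(err)
    return None


model = DiarModel()

# The MRO does include Exportable, so the property is found on the class.
print([cls.__name__ for cls in DiarModel.__mro__])
print(hasattr(DiarModel, "input_names"))

# The reported error still names `input_names`, hiding the missing `input_module`.
print(lookup_error(model, "input_names"))
```

If something like this is the cause, calling the property getter directly, for example `type(msdd_model).input_names.fget(msdd_model)`, should surface the attribute that is really missing.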

Steps/Code to reproduce bug

I added some lines after here.

            input_example = (input_signal, input_signal_length, emb_vectors, targets)
            msdd_model.msdd._speaker_model.export(output="speaker_model.onnx")
            msdd_model.export(output="msdd_model.pt", input_example=input_example)

Expected behavior

Both speaker_model.onnx and msdd_model.pt are generated.

Additional context

GPU model: NVIDIA L4

tango4j commented 3 weeks ago

Hi. MSDD is not an end-to-end model that performs speaker diarization from audio to labels, and it does not support ONNX export. We are less than a month away from releasing an end-to-end speaker diarizer, so please try the new model once it is released.