[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
I've been working with the emotion2vec model and trying to convert it to ONNX format for deployment purposes. The current implementation is great for PyTorch users, but having ONNX support would enable broader deployment options.
I tried converting the model using torch.onnx.export with various approaches:
Direct conversion of the AutoModel
Creating a wrapper around the model components
Implementing custom forward passes
Main challenges encountered:
Dimension mismatches in the conv1d layers
Issues with the masking mechanism
Difficulties preserving the complete model architecture
Problems with tensor handling between components
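As a rough illustration of the conv1d dimension issue, the sketch below traces how the time axis shrinks through a stack of strided Conv1d layers. The layer spec is an assumption: it is the wav2vec 2.0 / data2vec-style default of (channels, kernel_size, stride) tuples, not something read from the emotion2vec checkpoint, and the helper names `conv1d_out_len` and `trace_lengths` are hypothetical. The length formula itself is the one given in the `torch.nn.Conv1d` documentation.

```python
# Assumed feature-extractor spec: (channels, kernel_size, stride) per layer.
# This mirrors the wav2vec 2.0 / data2vec default and is NOT taken from
# the emotion2vec checkpoint; substitute the real values from the model config.
ASSUMED_CONV_LAYERS = [(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512, 2, 2)] * 2


def conv1d_out_len(length, kernel_size, stride, padding=0, dilation=1):
    """Output length of a single Conv1d layer along the time axis."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def trace_lengths(num_samples, layers=ASSUMED_CONV_LAYERS):
    """Return the time-axis length before the first layer and after each layer."""
    lengths = [num_samples]
    for _channels, kernel_size, stride in layers:
        lengths.append(conv1d_out_len(lengths[-1], kernel_size, stride))
    return lengths


if __name__ == "__main__":
    # One second of 16 kHz audio under the assumed spec.
    lengths = trace_lengths(16000)
    for i, n in enumerate(lengths):
        print(f"after layer {i}: time length = {n}")
```

Under this assumed spec, a waveform of shape (batch, samples) would be reshaped to (batch, 1, samples) before the first convolution and would leave the stack as (batch, 512, frames). Because the frame count depends on the input length, a graph exported with a single fixed-length dummy input may fail on other lengths unless the time axis is declared dynamic, for example through the `dynamic_axes` argument of `torch.onnx.export`.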
Could you please provide guidance on the correct architecture for ONNX conversion, including an example of the expected tensor dimensions at each stage of the model? I have converted torchvision models to ONNX before, but the audio models seem more complicated to me :/
Thank you very much for your work, it works really well!
Also see: https://github.com/modelscope/FunASR/issues/1690