espnet / espnet_onnx

ONNX wrapper for ESPnet inference models
MIT License

WIP: Add export wrapper classes #73

Closed kenzheng99 closed 1 year ago

kenzheng99 commented 1 year ago

Once https://github.com/espnet/espnet/pull/4928 is merged, we will be able to export the ESPnet versions of encoders directly, without requiring the re-implemented espnet_onnx versions.

This PR contains my design for a simple EncoderWrapper class that uses an ESPnet encoder directly as self.model. get_encoder should then return a generic EncoderWrapper for model types that can be exported directly (currently just Conformer). @Masao-Someki let me know what you think about this design
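For reference, the pattern described above might look roughly like this. This is a minimal sketch, not the actual PR code: the method names (`get_input_names`, `get_dummy_inputs`, etc.) and shapes are assumptions based on the discussion, and plain lists stand in for torch tensors.

```python
# Sketch of a generic EncoderWrapper: the unmodified ESPnet encoder is
# held as self.model, and the wrapper only supplies export metadata
# (input/output names, dynamic axes, dummy inputs). Illustrative names.

class EncoderWrapper:
    def __init__(self, model, feats_dim=80):
        self.model = model          # the ESPnet encoder, used directly
        self.feats_dim = feats_dim  # feature dimension for dummy inputs

    def forward(self, feats, feats_length):
        # Delegate straight to the ESPnet encoder's own forward.
        return self.model(feats, feats_length)

    def get_input_names(self):
        return ["feats", "feats_length"]

    def get_output_names(self):
        return ["encoder_out", "encoder_out_lens"]

    def get_dynamic_axes(self):
        # Batch and time axes vary at inference time.
        return {
            "feats": {0: "batch", 1: "time"},
            "feats_length": {0: "batch"},
            "encoder_out": {0: "batch", 1: "time"},
        }

    def get_dummy_inputs(self, time=100):
        # Nested lists stand in for torch tensors in this sketch.
        feats = [[[0.0] * self.feats_dim for _ in range(time)]]
        return {"feats": feats, "feats_length": [time]}


def get_encoder(model):
    # For directly exportable encoder types (currently just Conformer),
    # return the generic wrapper; other types keep custom wrappers.
    return EncoderWrapper(model)
```

The key design choice is that `self.model` is the ESPnet encoder itself, so nothing is re-implemented; the wrapper exists only to feed `torch.onnx.export` the metadata it needs.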

To test:

```shell
# install the espnet branch that supports Conformer export natively
pip install -e git+https://github.com/kenzheng99/espnet.git@onnx-export-2#egg=espnet
# run a test script we made
python example.py
```

Masao-Someki commented 1 year ago

@kenzheng99 Thank you, this system design looks very nice! One thing we need to consider is that converting to ONNX is not just about inference speed but also about portability, so it would be better if we could reduce the number of inputs and outputs the user has to handle. For example, in the current implementation we estimate feature_length inside the model, so the user does not have to pass it to the ONNX model.
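The idea of estimating feature_length inside the model can be sketched as follows. This is illustrative only (class name is hypothetical, and a real implementation would read torch tensor shapes): the exported graph takes a single `feats` input and derives the lengths itself.

```python
# Sketch of folding the length calculation into the exported model,
# so the ONNX graph needs only one input. Illustrative names; plain
# lists stand in for torch tensors.

class EncoderWithLengths:
    def __init__(self, model):
        self.model = model  # the ESPnet encoder

    def forward(self, feats):
        # Derive feats_length from the input shape instead of asking
        # the user to pass it as a second ONNX input.
        batch = len(feats)
        time = len(feats[0])
        feats_length = [time] * batch
        return self.model(feats, feats_length)
```

This trades away support for padded batches with differing lengths in exchange for a simpler ONNX interface, which is the tension discussed in the next comment.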

How about the streaming models, such as the contextual transformer/conformer block model? Do we need to create a wrapper class for them...?

kenzheng99 commented 1 year ago

@Masao-Someki Good point about the feature_length - a counter-point could be that we might want to keep it, to fully match ESPnet's model format, so that any encoder exported directly from ESPnet is compatible with this repo's inference pipeline. Not sure which priority is more important?

I imagine we would at least need wrappers for the Encoder, Decoder, and any other modules like CTC (actually ctc.py looks fine already, and is what I based this design on). I haven't looked into the details of the streaming encoders yet - if they take the same input/output shapes as regular encoders, I think they can use the same wrapper, but if necessary it's also fine to have a separate wrapper class. I think the main job of these wrapper classes should just be to provide the ONNX input/output names, dummy inputs, etc., so self.model can be an ESPnet model directly.

Masao-Someki commented 1 year ago

@kenzheng99 Sorry for the late reply,

> Not sure which priority is more important?

Would it be possible to set a configuration for the export? I think we can add an export option like calculate_seq_length to include the length calculation in the model, so that the user can decide on the inputs.
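A sketch of what such an export option might look like: the flag name calculate_seq_length comes from the comment above, but everything else here (class names, defaults) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExportConfig:
    # If True, the length calculation is baked into the model, and
    # the exported graph needs only a `feats` input.
    calculate_seq_length: bool = True


class ConfigurableEncoderWrapper:
    def __init__(self, model, config=None):
        self.model = model
        self.config = config or ExportConfig()

    def get_input_names(self):
        if self.config.calculate_seq_length:
            return ["feats"]
        return ["feats", "feats_length"]

    def forward(self, feats, feats_length=None):
        if self.config.calculate_seq_length:
            # Derive lengths from the input shape inside the graph.
            feats_length = [len(feats[0])] * len(feats)
        return self.model(feats, feats_length)
```

This lets one wrapper serve both camps: users who want the smallest ONNX interface, and users who want the exported model to match ESPnet's native input format exactly.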

Though the inputs and outputs are different in the streaming models, we can use your wrapper class. I think all we need to do is change the dummy inputs and input names depending on the model type.

Masao-Someki commented 1 year ago

I will merge this GenericEncoder with #89 and close this PR.