audeering / audonnx

Deploy models in ONNX format
https://audeering.github.io/audonnx/
Other
7 stars 0 forks source link

Allow multiple inputs #74

Open schruefer opened 10 months ago

schruefer commented 10 months ago

Currently audonnx.Model only takes a signal as input. https://github.com/audeering/audonnx/blob/main/audonnx/core/model.py#L198-L209

For some models, however, it might be necessary to provide additional information. E.g. in the case of an LSTM model, we would like to provide the last hidden state and the memory state as additional input.

This is currently impossible as all additional inputs are derived directly from the signal.

It would be great if we could provide an option to enable additional model inputs along side the signal.

hagenw commented 10 months ago

It's also an interesting question from the interface design perspective. E.g. we could add additional keyword args for each input node that does not take the signal as input:

model(signal, sampling_rate, input_1='high')

The problem might be how to map the actual name of the input to the variable name as we cannot use stuff like input-with-fancy-name' as variable name.

Another solution would be that you provide a dictionary instead of the signal, then we can directly use the input names, e.g.

model({'signal': signal, 'state': 0.34}, sampling_rate)

Other ideas?

hagenw commented 10 months ago

Or a combination of the two solutions and we provide a dict for all inputs that don't get the signal:

model(signal, sampling_rate, inputs={'state': 0.34})
schruefer commented 10 months ago

I would be in favor of the second suggestion, with the dictionary as input. If one would use onnxruntime directly, without audonnx one would use:

model_path = audeer.path(model_root, 'model.onnx')
onnx_model = ort.InferenceSession(model_path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

output = self.onnx_model.run(['output_1', 'output_2'], {'signal': signal, 'additional_input': additional_input})

Which is already rather close to the dict approach. I also think that this approach is the most future-proof, in the case at some point we don't even necessarily need a signal.

hagenw commented 10 months ago

I'm slightly more in favor of:

model(signal, sampling_rate, inputs={'state': 0.34})

but I see your point that it does not cover the case, where we don't have an input signal.

So maybe we go indeed with

model({'signal': signal, 'state': 0.34}, sampling_rate)

In the implementation we would provide a default for sampling rate, e.g. sampling_rate=None so a user don't have to provide it if no input nodes needs it. I would set to None instead of a default 16000 as then we can raise an error if sampling rate should have been provided but wasn't. For signal we could then support an array as input as we do now, or when a dict is provided it must contain the inputs for each input node. Also this approach has a slight downside: we cannot support different sampling rates for different input nodes, but maybe this is ok?

schruefer commented 10 months ago

I am also not fixed on my suggestion.

I'm slightly more in favor of: model(signal, sampling_rate, inputs={'state': 0.34}) but I see your point that it does not cover the case, where we don't have an input signal.

I guess it's closer to our current solution and as we are focused on audio I also can't think of a case, where we would not start from a signal or something like a spectrogram derived from a signal.

So maybe we go indeed with model({'signal': signal, 'state': 0.34}, sampling_rate) In the implementation we would provide a default for sampling rate, e.g. sampling_rate=None so a user don't have to provide it if no input nodes needs it. I would set to None instead of a default 16000 as then we can raise an error if sampling rate should have been provided but wasn't. For signal we could then support an array as input as we do now, or when a dict is provided it must contain the inputs for each input node.

Sounds great!

Also this approach has a slight downside: we cannot support different sampling rates for different input nodes, but maybe this is ok?

I also can't think of a case, where 2 different signals with 2 different sampling rates would be processed, by the same model.