PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
MIT License

[Music Source Separation] Is it possible to use onnx2tf lite to convert audio based onnx models to tflite models #443

Closed: ajithvcoder closed this issue 1 year ago

ajithvcoder commented 1 year ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.15.7

onnx version number

1.14.0

onnxruntime version number

Not installed

onnxsim (onnx_simplifier) version number

Not installed

tensorflow version number

2.13.0

Download URL for ONNX

https://drive.google.com/file/d/1qK3blM6Z9WFZTU8hRUgotgsRMbFDT5vP/view?usp=sharing

Parameter Replacement JSON

Don't have one.

Description

1. Personal Development
2. I am facing the error below:

```
ValueError: Exception encountered when calling layer "tf.nn.convolution" (type TFOpLambda).

Negative dimension size caused by subtracting 64 from 1 for '{{node tf.nn.convolution/convolution}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 1, 4, 1], use_cudnn_on_gpu=true](tf.nn.convolution/convolution/ExpandDims, tf.nn.convolution/convolution/ExpandDims_1)' with input shapes: [125,1,1,2], [1,64,1,4].
```

3. I think it is because the model I am using is not NHWC.
4. I converted the pretrained MobileNet subband model from this repo https://github.com/bytedance/music_source_separation#2-parameters-number--speed to an ONNX model. Now I want to convert from ONNX to TFLite.
5. The code I used to convert the PyTorch model to ONNX:

```python
import numpy as np
import os
import pathlib
import time
import onnx
# import tensorflow as tf
# import onnx_tf
import torch

from bytesep.models.lightning_modules import get_model_class
from bytesep.separator import Separator

def user_defined_build_separator() -> Separator:
    r"""Users could modify this file to load different models.

    Returns:
        separator: Separator
    """

    input_channels = 2
    output_channels = 2
    target_sources_num = 1
    segment_samples = int(44100 * 30.)
    batch_size = 1
    #device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    device = torch.device('cpu')

    model_type = "MobileNet_Subbandtime"

    if model_type == "ResUNet143_Subbandtime":
        checkpoint_path = os.path.join(pathlib.Path.home(), "bytesep_data", 
            "resunet143_subbtandtime_vocals_8.7dB_500k_steps_v2.pth")

    elif model_type == "MobileNet_Subbandtime":
        checkpoint_path = ("./bytesep_data/mobilenet_subbtandtime_vocals_7.2dB_500k_steps_v2.pth")

    # Get model class.
    Model = get_model_class(model_type)

    # Create model.
    model = Model(
        input_channels=input_channels,
        output_channels=output_channels,
        target_sources_num=target_sources_num,
    )

    # Load checkpoint.
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    model.load_state_dict(checkpoint["model"])

    # Move model to device.
    model.to(device)
    # Dummy input: (batch_size, input_channels, segment_samples)
    dummy_input = {'input_dict': {'waveform': torch.ones(1, 2, 44100 * 30).to(torch.device('cpu'))}}
    onnx_model_path = 'mobilenetmss_1.onnx'
    torch.onnx.export(model, dummy_input, onnx_model_path, verbose=True, opset_version=18)
    # Create separator.
    separator = Separator(
        model=model,
        segment_samples=segment_samples,
        batch_size=batch_size,
        device=device,
    )

    return separator

def main():
    r"""An example of using bytesep in your programme. After installing bytesep, 
    users could copy and execute this file in any directory.
    """

    # Build separator.
    separator = user_defined_build_separator()

    # dummy audio
    input_dict = {'waveform': np.zeros((2, 44100 * 60))}

    # Separate.
    separate_time = time.time()
    sep_audio = separator.separate(input_dict)

    print("Done! {:.3f} s".format(time.time() - separate_time))

if __name__ == "__main__":
    main()
```
PINTO0309 commented 1 year ago

That error can be resolved if you read the README seriously. Note that installation of onnxruntime and onnxsim is required.

Not that it matters, but since the model includes the OP Col2Im, which is difficult to implement in TensorFlow, I will implement it if you provide me with ideas for the logic to convert it.

mobilenetmss.onnx.zip

If no one has a good idea I will not implement it.


ajithvcoder commented 1 year ago

> That error can be resolved if you read the README seriously. Note that installation of onnxruntime and onnxsim is required.

Thanks, the error got resolved once I installed onnxsim and onnxruntime. Now I have read the README completely and seriously, but I got another issue like this that I am not able to solve:

```
ValueError: Exception encountered when calling layer "tf.nn.batch_normalization" (type TFOpLambda).

Dimensions must be equal, but are 8 and 257 for '{{node tf.nn.batch_normalization/batchnorm/mul_1}} = Mul[T=DT_FLOAT](Placeholder, tf.nn.batch_normalization/batchnorm/mul)' with input shapes: [1,257,3007,8], [257].

Call arguments received by layer "tf.nn.batch_normalization" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 257, 3007, 8), dtype=float32)
  • mean=tf.Tensor(shape=(257,), dtype=float32)
  • variance=tf.Tensor(shape=(257,), dtype=float32)
  • offset=tf.Tensor(shape=(257,), dtype=float32)
  • scale=tf.Tensor(shape=(257,), dtype=float32)
  • variance_epsilon=9.999999747378752e-06
  • name=None
```
PINTO0309 commented 1 year ago

> Is it because of the reason of using the NCW format? I tried adding `-kt states_in` to the command but it didn't work. Should I do anything like parameter replacement? I am new to these kinds of tasks, kindly guide me.

Inside the tool, there are patterns that make it difficult to accurately predict the correct position of an axis. Therefore, it gives the user the ability to manually correct the tool's axis transposition.

https://github.com/PINTO0309/onnx2tf#parameter-replacement

You need to disable the useless NCHW <--> NHWC transposition OPs that confuse the tool.
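Concretely, the batch-norm error above is plain last-axis broadcasting: tf.nn.batch_normalization multiplies its (257,) parameters against the last axis of x, which is 8 because the channel axis was left in the NCHW position. A minimal sketch of the failure and the fix (the shapes are taken from your trace):

```python
import tensorflow as tf

x = tf.ones((1, 257, 3007, 8))   # channels stuck on axis 1 (NCHW-style layout)
mean = tf.zeros((257,))
variance = tf.ones((257,))

try:
    # Fails: (257,) cannot broadcast against the trailing axis of size 8.
    tf.nn.batch_normalization(x, mean, variance, offset=None, scale=None,
                              variance_epsilon=1e-5)
except Exception as e:
    print(type(e).__name__)

# Works once channels are moved to the last axis (NHWC):
x_nhwc = tf.transpose(x, perm=[0, 2, 3, 1])   # -> (1, 3007, 8, 257)
y = tf.nn.batch_normalization(x_nhwc, mean, variance, None, None, 1e-5)
```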

Also, since it is impossible for the program to determine whether the input tensor of the model is NCHW, NHWC, NCH, NHW, or NXYZ, it is necessary to disable the automatic transposition of the input tensor with the -kat option when audio is used as input; otherwise onnx2tf attempts to force a transposition of NCW to NWC in any model. See the sketch below.
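A minimal sketch using the Python API (the -kat CLI option maps to `keep_shape_absolutely_input_names`; the input tensor name "waveform" is an assumption here, so check the real name in Netron and substitute it):

```python
import onnx2tf

# Keep the 1D-audio input shape exactly as exported instead of letting the
# tool force an NCW -> NWC transposition of the input tensor.
onnx2tf.convert(
    input_onnx_file_path="mobilenetmss_1.onnx",
    output_folder_path="saved_model",
    keep_shape_absolutely_input_names=["waveform"],  # assumed input name
)
```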

> As per your request, I will go through the Col2Im links provided and tell you the logic.

Thanks. It should be quite challenging to implement because it requires a large number of symbolic loops. After mulling it over for a few days, I have suspended the implementation for now.

ajithvcoder commented 1 year ago

The links that were provided were in Korean, I think; even with Google Translate I was not able to read them. I found a few other links where col2im was implemented:

https://github.com/BVLC/caffe/blob/master/src/caffe/util/im2col.cpp
https://github.com/axinc-ai/ailia-models/blob/eb3f4cbf4796855c1f522a938d94e87f915de526/util/functional/im2col.py#L61

I have written the logic based on them, as far as I know how:

caffe_col2im_logic.txt ailia_col2im_logic.txt

Kindly excuse me if it's wrong.

PINTO0309 commented 1 year ago

Thanks.

Your suggestion is probably correct. I too have arrived at this information. The problem is that a faithful implementation of this symbolic loop in TensorFlow would produce a very verbose set of operations.

The other big problem is that dilations are not implemented in that reference code:

https://github.com/axinc-ai/ailia-models/blob/eb3f4cbf4796855c1f522a938d94e87f915de526/util/functional/im2col.py#L53-L61

```python
cols = np.empty((B, C, F_h, F_w, O_h, O_w))
for h in range(F_h):
    h_lim = h + stride_ud * O_h
    for w in range(F_w):
        w_lim = w + stride_lr * O_w
        cols[:, :, h, w, :, :] = \
            images[:, :, h:h_lim:stride_ud, w:w_lim:stride_lr]

cols = cols.transpose(1, 2, 3, 0, 4, 5).reshape(C * F_h * F_w, B * O_h * O_w)
```
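For reference, a col2im is roughly the inverse of the loop above: the same strided slices are scatter-added back into the image, which is exactly where the ScatterND blow-up comes from. A minimal NumPy sketch under the same assumptions (no dilations, no padding; `col2im` is an illustrative helper, not the tool's implementation):

```python
import numpy as np

def col2im(cols, image_shape, F_h, F_w, stride_ud, stride_lr):
    """Scatter-add each column back into its source window.
    Assumes no dilation and no padding, mirroring the im2col above."""
    B, C, H, W = image_shape
    O_h = (H - F_h) // stride_ud + 1
    O_w = (W - F_w) // stride_lr + 1
    # Undo im2col's final reshape/transpose: back to (B, C, F_h, F_w, O_h, O_w).
    cols = cols.reshape(C, F_h, F_w, B, O_h, O_w).transpose(3, 0, 1, 2, 4, 5)
    images = np.zeros(image_shape, dtype=cols.dtype)
    for h in range(F_h):
        h_lim = h + stride_ud * O_h
        for w in range(F_w):
            w_lim = w + stride_lr * O_w
            # Overlapping windows accumulate; in TensorFlow each of these
            # strided scatter-adds tends to become a ScatterND op.
            images[:, :, h:h_lim:stride_ud, w:w_lim:stride_lr] += \
                cols[:, :, h, w, :, :]
    return images
```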

https://github.com/onnx/onnx/blob/main/docs/Changelog.md#col2im-18


I will try this later today as I am working on other issues, but I think it would probably generate a lot of ScatterND and make for a very inefficient model. :thinking:

ajithvcoder commented 1 year ago

Okay, got it. Thanks :)

PINTO0309 commented 1 year ago

I implemented Col2Im on a trial basis. I don't know if it works correctly in a generic way. Since the implementation is quite redundant, I will wait for suggestions for improvement from other engineers.

I will remove the TODO and NeedHelp labels, so after 5 days this issue will be auto-closed by the bot.

https://github.com/PINTO0309/onnx2tf/releases/tag/1.18.4

ajithvcoder commented 1 year ago

Thanks a lot!