PINTO0309 / PINTO_model_zoo

A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
https://qiita.com/PINTO
MIT License

Try to reverse 'hand_landmark.tflite' model in pytorch #168

Closed (TaoZappar closed this issue 2 years ago)

TaoZappar commented 2 years ago

Issue Type

Others

OS

Ubuntu

OS architecture

x86_64

Programming Language

Python

Framework

PyTorch

Description

Dear author,

Thanks for your dedicated efforts. Recently I have been trying to reverse-engineer the 'hand_landmark.tflite' model in PyTorch, following its graph in Netron. However, my reconstructed model is far larger than both the official one and the one in 033_Hand_Detection_and_Tracking: even after exporting it to .onnx it is 220 MB, versus 4.1 MB for model_float32.onnx. There must be something wrong in my reconstruction, so I was wondering if you could help me look into it. Thanks in advance. My code is attached at the end. A comparison of the two models is also shown in the screenshot below (left: model_float32.onnx; right: my reconstruction); note the size difference of the conv weights between the two clips.

(Screenshot from 2021-12-19 08-58-20: Netron views of model_float32.onnx on the left and the reconstructed model on the right.)

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelwiseConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(PixelwiseConv, self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                  kernel_size=1, stride=1, padding=0, bias=True)
        self.bn = nn.BatchNorm2d(out_channels)
        # self.act = nn.LeakyReLU(0.2)
        self.act = nn.ReLU6(inplace=True)
    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.act(x)
        return x

class DepthwiseConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super(DepthwiseConv, self).__init__()
        padding = (kernel_size - 1) // 2
        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                  kernel_size=kernel_size, stride=stride, padding=padding, bias=True)
        self.bn = nn.BatchNorm2d(out_channels)
        # self.act = nn.LeakyReLU(0.2)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.act(x)
        return x

class ConvBlock(nn.Module):
    def __init__(self, in_channels, mid_channels, out_channels, in_kernel_size, mid_kernel_size,
                 in_stride, mid_stride):
        super(ConvBlock, self).__init__()
        self.layer_1 = DepthwiseConv(in_channels, mid_channels, in_kernel_size, in_stride)
        self.layer_2 = DepthwiseConv(mid_channels, mid_channels, mid_kernel_size, mid_stride)
        self.layer_3 = nn.Conv2d(in_channels=mid_channels, out_channels=out_channels,
                  kernel_size=1, stride=1, padding=0, bias=True)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        x1 = self.layer_1(x)
        x2 = self.layer_2(x1)
        x3 = self.layer_3(x2)
        x4 = self.bn(x3)
        return x4

class HandLandmarks(nn.Module):

    def __init__(self):
        super(HandLandmarks, self).__init__()
        self._define_layers()

    def _define_layers(self):
        self.block_1 = ConvBlock(in_channels=1, mid_channels=24, out_channels=16, in_kernel_size=3, mid_kernel_size=3, in_stride=2, mid_stride=1)
        self.block_2 = ConvBlock(in_channels=16, mid_channels=64, out_channels=16, in_kernel_size=1, mid_kernel_size=3, in_stride=1, mid_stride=2)
        self.block_3 = ConvBlock(in_channels=16, mid_channels=96, out_channels=16, in_kernel_size=1, mid_kernel_size=3, in_stride=1, mid_stride=1)
        self.block_4 = ConvBlock(in_channels=16, mid_channels=96, out_channels=24, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=2)
        self.block_5 = ConvBlock(in_channels=24, mid_channels=144, out_channels=24, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=1)
        self.block_6 = ConvBlock(in_channels=24, mid_channels=144, out_channels=48, in_kernel_size=1, mid_kernel_size=3, in_stride=1, mid_stride=2)
        self.block_7 = ConvBlock(in_channels=48, mid_channels=288, out_channels=48, in_kernel_size=1, mid_kernel_size=3, in_stride=1, mid_stride=1)
        self.block_8 = ConvBlock(in_channels=48, mid_channels=288, out_channels=48, in_kernel_size=1, mid_kernel_size=3, in_stride=1, mid_stride=1)
        self.block_9 = ConvBlock(in_channels=48, mid_channels=288, out_channels=64, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=1)
        self.block_10 = ConvBlock(in_channels=64, mid_channels=384, out_channels=64, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=1)
        self.block_11 = ConvBlock(in_channels=64, mid_channels=384, out_channels=64, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=1)
        self.block_12 = ConvBlock(in_channels=64, mid_channels=384, out_channels=112, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=2)
        self.block_13 = ConvBlock(in_channels=112, mid_channels=672, out_channels=112, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=1)
        self.block_14 = ConvBlock(in_channels=112, mid_channels=672, out_channels=112, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=1)
        self.block_15 = ConvBlock(in_channels=112, mid_channels=672, out_channels=112, in_kernel_size=1, mid_kernel_size=5, in_stride=1, mid_stride=1)

        self.layer_16 = DepthwiseConv(112, 672, 1, 1)
        self.layer_17 = DepthwiseConv(672, 672, 3, 1)

        self.avg_pool = nn.AvgPool2d(7)

        self.fc_1 = nn.Linear(672, 42)
        self.fc_2 = nn.Linear(672, 1)

        self.fc_3 = nn.Linear(672, 1)

        # self.fc_4 = nn.Linear(672, 63) # tj : could be the 3d world coords of the 21 landmarks

        self.max_pool = nn.MaxPool2d(kernel_size=2, stride=2)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):

        x1 = self.block_1(x) # tj : torch.Size([1, 16, 112, 112])

        x2_1 = self.max_pool(x1) # torch.Size([1, 16, 56, 56])
        x2_2 = self.block_2(x1) #  torch.Size([1, 16, 56, 56])

        x2_sum = x2_1 + x2_2

        x3 = self.block_3(x2_sum) # torch.Size([1, 16, 56, 56])
        x3_sum = x2_sum + x3

        x4 = self.block_4(x3_sum) # torch.Size([1, 24, 28, 28])

        x5 = self.block_5(x4)
        x5_sum = x4 + x5 # torch.Size([1, 24, 28, 28])

        x6 = self.block_6(x5_sum) # torch.Size([1, 48, 14, 14])

        x7 = self.block_7(x6)
        x7_sum = x6 + x7 # torch.Size([1, 48, 14, 14])

        x8 = self.block_8(x7_sum)
        x8_sum = x7_sum + x8 # torch.Size([1, 48, 14, 14])

        x9 = self.block_9(x8_sum) # torch.Size([1, 64, 14, 14])

        x10 = self.block_10(x9)
        x10_sum = x9 + x10 # torch.Size([1, 64, 14, 14])

        x11 = self.block_11(x10_sum)
        x11_sum = x10_sum + x11 # torch.Size([1, 64, 14, 14])

        x12 = self.block_12(x11_sum) # torch.Size([1, 112, 7, 7])

        x13 = self.block_13(x12)
        x13_sum = x12 + x13 # torch.Size([1, 112, 7, 7])

        x14 = self.block_14(x13_sum)
        x14_sum = x13_sum + x14 # torch.Size([1, 112, 7, 7])

        x15 = self.block_15(x14_sum)
        x15_sum = x14_sum + x15 # torch.Size([1, 112, 7, 7])

        x16 = self.layer_16(x15_sum) # torch.Size([1, 672, 7, 7])
        x17 = self.layer_17(x16) # torch.Size([1, 672, 7, 7])

        x18 = self.avg_pool(x17) # torch.Size([1, 672, 1, 1])
        x19 = torch.flatten(x18, 1) # torch.Size([1, 672])

        landmarks = self.fc_1(x19) # torch.Size([1, 42])
        handness = self.fc_2(x19) # torch.Size([1, 1])
        handness_ = self.sigmoid(handness)
        score = self.fc_3(x19) # torch.Size([1, 1])
        score_ = self.sigmoid(score)

        return landmarks, handness_, score_

if __name__ == "__main__":

    model = HandLandmarks()
    inp = torch.rand(1, 3, 224, 224)
    landmarks, handness, score = model(inp)

    print(model)

    torch.onnx.export(model, inp, '../models/handlandmarks.onnx')

    pass
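
A likely cause of the size gap, though not confirmed in this thread: the class named DepthwiseConv above never passes groups to nn.Conv2d, so it actually builds a dense convolution. A true depthwise convolution in PyTorch sets groups=in_channels, which shrinks a k-by-k layer's weight tensor from out_channels x in_channels x k x k to in_channels x 1 x k x k. At the 672-channel layers that is a factor of several hundred, which is enough to explain 220 MB versus 4.1 MB. A minimal sketch of the difference:

import torch.nn as nn

# Dense conv posing as depthwise: 672 * 672 * 3 * 3 weights + 672 biases
# = 4,064,928 parameters (roughly 16 MB in float32 for this single layer).
dense = nn.Conv2d(672, 672, kernel_size=3, padding=1, bias=True)

# True depthwise conv: one 3x3 filter per channel,
# 672 * 1 * 3 * 3 + 672 = 6,720 parameters.
depthwise = nn.Conv2d(672, 672, kernel_size=3, padding=1, groups=672, bias=True)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(dense))      # 4064928
print(n_params(depthwise))  # 6720

In the ConvBlock above, layer_3 already plays the role of the 1x1 pointwise conv, so adding groups=in_channels to the DepthwiseConv class (for the layers where in_channels == out_channels) would presumably recover the depthwise-separable structure and the expected model size.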

Relevant Log Output

No response

URL or source code for simple inference testing code

No response

PINTO0309 commented 2 years ago

Before I can help you investigate, note that your code itself throws an error:

Traceback (most recent call last):
  File "hand.py", line 169, in <module>
    landmarks, handness, score = model(inp)
  File "/home/xxxxx/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "hand.py", line 98, in forward
    x1 = self.block_1(x) # tj : torch.Size([1, 16, 112, 112])
  File "/home/xxxxx/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "hand.py", line 50, in forward
    x1 = self.layer_1(x)
  File "/home/xxxxx/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "hand.py", line 33, in forward
    x = self.conv(x)
  File "/home/xxxxx/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxxxx/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/xxxxx/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [24, 1, 3, 3], expected input[1, 3, 224, 224] to have 1 channels, but got 3 channels instead
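
For anyone reproducing this: the error comes from block_1 being built with in_channels=1 while the __main__ test feeds a 3-channel tensor. A minimal way to get a clean forward pass with the script as posted (assuming it is saved as hand.py, per the traceback):

import torch
from hand import HandLandmarks

model = HandLandmarks()
# block_1 expects a single input channel, so feed a 1-channel dummy tensor.
landmarks, handness, score = model(torch.rand(1, 1, 224, 224))
print(landmarks.shape, handness.shape, score.shape)  # (1, 42), (1, 1), (1, 1)

Since the original hand_landmark model consumes a 3-channel 224x224 crop, changing in_channels to 3 in block_1 is presumably the intended fix rather than switching to a 1-channel input.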
PINTO0309 commented 2 years ago

I replied within five minutes of your post. Since you don't seem interested in following up, I'm going to close this.