NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

[GPUNet/PyTorch] Model padding is incorrect, layer naming goes against norms. #1144

Open rwightman opened 2 years ago

rwightman commented 2 years ago

Describe the bug

The handling of padding in the model appears incorrect, apparently due to a misunderstanding of the source code (the EfficientNet impl in my pytorch-image-models) and of 'same' padding for TF compat.

First of all, 'same' padding should not be used if the model is native PyTorch; it's intended for compatibility with TensorFlow weights. However, the EdgeResidual (FusedMBConv) layers in GPUNet appear to be using it, and incorrectly. You force 'same' padding here: https://github.com/NVIDIA/DeepLearningExamples/blob/4a15e9146a6516941ba3ae146621a5c94e4bc431/PyTorch/Classification/GPUNet/models/gpunet_builder.py#L277-L288 but I cannot tell why that would be desired. This is problematic for the reasons below.

The code in question was copied from my impl, so I'm very familiar with it (https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/layers/padding.py and https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/layers/conv2d_same.py). However, it's been modified in a way that makes no sense to me.

Yours: https://github.com/NVIDIA/DeepLearningExamples/blob/4a15e9146a6516941ba3ae146621a5c94e4bc431/PyTorch/Classification/GPUNet/models/gpunet_modules.py#L78-L82

Mine: https://github.com/rwightman/pytorch-image-models/blob/e4360e6125bb0bb4279785810c8eb33b40af3ebd/timm/models/layers/conv2d_same.py#L31-L41

def create_conv2d_pad(in_chs, out_chs, kernel_size, **kwargs):
    padding = kwargs.pop('padding', '')
    kwargs.setdefault('bias', False)
    padding, is_dynamic = get_padding_value(padding, kernel_size, **kwargs)
    if is_dynamic:
        return Conv2dSame(in_chs, out_chs, kernel_size, **kwargs)
    else:
        return nn.Conv2d(in_chs, out_chs, kernel_size, padding=padding, **kwargs)

With the modifications you've made here, if dynamic padding is needed to achieve the equivalent of 'same' (which is the case for edge residual), your impl will just use a conv2d layer with no padding. I doubt that was the intent.
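To make the failure mode concrete, here is a minimal sketch of the padding resolution that the factory above depends on. It is written from the behavior described in this thread rather than copied from either repository, and `resolve_padding` is a hypothetical stand-in for `get_padding_value`.

```python
def get_padding(kernel_size, stride=1, dilation=1):
    # Symmetric padding; equals kernel_size // 2 when dilation is 1.
    return ((stride - 1) + dilation * (kernel_size - 1)) // 2


def is_static_pad(kernel_size, stride=1, dilation=1):
    # 'same' can be expressed as a fixed symmetric pad only at stride 1
    # and when the total pad amount is even.
    return stride == 1 and (dilation * (kernel_size - 1)) % 2 == 0


def resolve_padding(padding, kernel_size, stride=1, dilation=1):
    """Return (static_padding, needs_dynamic_same). Hypothetical helper."""
    if isinstance(padding, str):
        padding = padding.lower()
        if padding == "same":
            if is_static_pad(kernel_size, stride, dilation):
                return get_padding(kernel_size, stride, dilation), False
            # The static value is 0 because the caller is expected to pad
            # at runtime (the Conv2dSame branch in the factory above).
            return 0, True
        if padding == "valid":
            return 0, False
        # Default ('' or any other string): PyTorch-style symmetric padding.
        return get_padding(kernel_size, stride, dilation), False
    return padding, False


print(resolve_padding("same", 3, stride=1))  # (1, False)
print(resolve_padding("same", 3, stride=2))  # (0, True)
print(resolve_padding("", 3, stride=2))      # (1, False)
```

If the `is_dynamic` branch is dropped and the static value is passed straight to `nn.Conv2d`, the stride-2 'same' case silently becomes a convolution with `padding=0`, which is the behavior described above.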

Additionally, I'd rethink the builder strategy you use here. It ends up with a model where modules and parameters are completely out of order with respect to their position in the model (i.e. not stem, stages, head), and the naming is at odds with the convention for pretty much all other convnets:

you use 'head' for the 'stem' (head is supposed to be the classifier) https://github.com/NVIDIA/DeepLearningExamples/blob/4a15e9146a6516941ba3ae146621a5c94e4bc431/PyTorch/Classification/GPUNet/models/gpunet_builder.py#L233-L239
the actual head (classifier and pre-classifier FC) is named as a layer

EDIT: I tested this padding difference while doing a quick checkpoint port to timm's EfficientNet model, and it has a noteworthy negative impact on accuracy. I do not intend to special-case any code for these checkpoints. One other special case here: the Epilogue (head) always uses ReLU, while the Stem (your head) uses SiLU. My models and the TF models tend to keep the activation of the stem and head the same and only switch the activations of the various stages per model variant. Something to consider if you end up retraining.

All in all, I feel this will limit the use of an otherwise interesting exploration for finding EfficientNetV2-like models optimized for GPU use.

And finally, when you use someone else's Apache 2.0 code, you are not supposed to remove existing copyright notices (but rather add your own in addition). Compare the files below and please see (b) and (c) of the Apache license section 4 pasted below: https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Classification/GPUNet/models/gpunet_modules.py#L1-L22 vs https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/efficientnet_blocks.py (and the other files from timm above)

4. Redistribution

You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a “NOTICE” text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

linnanwang commented 2 years ago

Hello @rwightman ,

Thank you so much for submitting this bug report; we're working on fixing the license issue and sincerely apologize for not doing it right in the first place. We already have an MR to fix the license issues.

As for the padding issues, thanks so much. Please give me some time to understand the issue; then we will fix it ASAP. Thank you so much for helping us get this right.

rwightman commented 2 years ago

@linnanwang thanks for updating the copyright/license info

The issue with padding is that the 3x3 convs in the EdgeResidual (FusedMBConv) layers have 0 padding. For typical PyTorch use they should have padding = ((stride - 1) + dilation * (kernel_size - 1)) // 2 (which simplifies to k // 2 for the non-dilated use here), which would be the case if the pad_type arg was left as the default ''.
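As a worked example of that formula, the sketch below compares output sizes of a 3x3 conv with and without the symmetric padding. The input size of 112 is an arbitrary, hypothetical feature-map size, and the function names are illustrative only; the output-size expression is the standard one for a convolution along one dimension.

```python
def symmetric_padding(kernel_size, stride=1, dilation=1):
    # The formula quoted above.
    return ((stride - 1) + dilation * (kernel_size - 1)) // 2


def conv_out_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    # Standard convolution output size along one spatial dimension.
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


IN_SIZE = 112  # hypothetical input size, chosen only for illustration
for stride in (1, 2):
    pad = symmetric_padding(3, stride)
    unpadded = conv_out_size(IN_SIZE, 3, stride, padding=0)
    padded = conv_out_size(IN_SIZE, 3, stride, padding=pad)
    print(f"stride={stride} pad={pad} unpadded={unpadded} padded={padded}")
# stride=1 pad=1 unpadded=110 padded=112
# stride=2 pad=1 unpadded=55 padded=56
```

With zero padding, every such conv trims the feature map at the border, so the spatial sizes drift away from the usual in_size / stride progression.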

This conv is the problem one: https://github.com/NVIDIA/DeepLearningExamples/blob/57422942f0a84110fa1803ff39527e889e0d05f5/PyTorch/Classification/GPUNet/models/gpunet_modules.py#L519-L528

However, for the EdgeResidual specifically, 'same' is being passed into the padding argument for some reason in your builder. That option is intended to provide PyTorch compatibility with TensorFlow 'SAME' padding, but the implementation of same padding was modified so that it ends up using no padding.

You've implemented a different variant of this module (not used by the current model impl) that shows what a typical padding would be: https://github.com/NVIDIA/DeepLearningExamples/blob/57422942f0a84110fa1803ff39527e889e0d05f5/PyTorch/Classification/GPUNet/models/gpunet_modules.py#L788-L798

I would recommend removing all timm same padding code, as its sole purpose was to provide compatibility with the weights the Google researchers trained. As it stands in your implementation, with Conv2dSame removed, it ends up being incorrect for all use cases (it won't work with Google weights and is not padding properly for PyTorch use).
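For context on why TensorFlow-style 'SAME' needs dedicated code at all, here is a minimal sketch of how the padding amount is usually computed under that convention. `tf_same_pad` is a hypothetical helper, not a function from timm or this repository, and the input sizes are chosen only for illustration.

```python
import math


def tf_same_pad(in_size, kernel_size, stride=1, dilation=1):
    """Padding (before, after) along one dimension under the 'SAME' convention."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + (kernel_size - 1) * dilation + 1 - in_size, 0)
    # Any odd remainder goes on the trailing side.
    return total // 2, total - total // 2


print(tf_same_pad(224, 3, stride=1))  # (1, 1)
print(tf_same_pad(224, 3, stride=2))  # (0, 1)
print(tf_same_pad(225, 3, stride=2))  # (1, 1)
```

At stride 2 the result depends on the input size and can be asymmetric, which a fixed `padding=` argument cannot express. That is only worth supporting when loading weights trained under that convention; for a model trained natively in PyTorch, the plain symmetric padding is sufficient.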

rwightman commented 2 years ago

In terms of the naming / 'modelling interface': when dealing with models across frameworks, exporting, deploying, etc., it's helpful to follow some norms in naming, and these ones are quite 'unusual'. See the comparison for the 0.65ms model below: modules/params end up in the same order, but the stem / head naming is quite confusing.

GPUNet

The model via print (repr):

GPUNet(
  (network): Sequential(
    (head: 2): prologue_i3_o32_s2_swish   # this is the 'stem' in typical naming convention
    (stage: 1 layer3): conv_k3_i32_o32_s1_relu
    (stage: 1 layer4): conv_k3_i32_o32_s1_relu
    (stage: 2 layer5): er_k3_e5_i32_o32_s2_relu_se_False
    (stage: 2 layer6): er_k3_e5_i32_o32_s1_relu_se_False
    (stage: 3 layer7): er_k3_e5_i32_o64_s2_relu_se_False
    (stage: 3 layer8): er_k3_e5_i64_o64_s1_relu_se_False
    (stage: 3 layer9): er_k3_e5_i64_o64_s1_relu_se_False
    (stage: 4 layer10): irb_k3_e5_i64_o256_s2_swish_se_False
    (stage: 4 layer11): irb_k3_e5_i256_o256_s1_swish_se_False
    (stage: 4 layer12): irb_k3_e5_i256_o256_s1_swish_se_False
    (stage: 6 layer13): irb_k3_e5_i256_o704_s2_relu_se_True
    (stage: 6 layer14): irb_k3_e5_i704_o704_s1_relu_se_True
    ( layer15): epilogue_i704_o1280_s1_relu   # this is the classification 'head' in typical naming convention
  )
)

Iterate over the model params:

network.head: 2.net.0.weight torch.Size([32, 3, 3, 3])
network.head: 2.net.1.weight torch.Size([32])
network.head: 2.net.1.bias torch.Size([32])
network.stage: 1 layer3.conv.weight torch.Size([32, 32, 3, 3])
network.stage: 1 layer3.bn1.weight torch.Size([32])
network.stage: 1 layer3.bn1.bias torch.Size([32])
network.stage: 1 layer4.conv.weight torch.Size([32, 32, 3, 3])
network.stage: 1 layer4.bn1.weight torch.Size([32])
network.stage: 1 layer4.bn1.bias torch.Size([32])
network.stage: 2 layer5.conv_exp.weight torch.Size([160, 32, 3, 3])
network.stage: 2 layer5.bn1.weight torch.Size([160])
network.stage: 2 layer5.bn1.bias torch.Size([160])
network.stage: 2 layer5.conv_pwl.weight torch.Size([32, 160, 1, 1])
network.stage: 2 layer5.bn2.weight torch.Size([32])
network.stage: 2 layer5.bn2.bias torch.Size([32])
network.stage: 2 layer6.conv_exp.weight torch.Size([160, 32, 3, 3])
network.stage: 2 layer6.bn1.weight torch.Size([160])
network.stage: 2 layer6.bn1.bias torch.Size([160])
network.stage: 2 layer6.conv_pwl.weight torch.Size([32, 160, 1, 1])
network.stage: 2 layer6.bn2.weight torch.Size([32])
network.stage: 2 layer6.bn2.bias torch.Size([32])
network.stage: 3 layer7.conv_exp.weight torch.Size([160, 32, 3, 3])
network.stage: 3 layer7.bn1.weight torch.Size([160])
network.stage: 3 layer7.bn1.bias torch.Size([160])
network.stage: 3 layer7.conv_pwl.weight torch.Size([64, 160, 1, 1])
network.stage: 3 layer7.bn2.weight torch.Size([64])
network.stage: 3 layer7.bn2.bias torch.Size([64])
network.stage: 3 layer8.conv_exp.weight torch.Size([320, 64, 3, 3])
network.stage: 3 layer8.bn1.weight torch.Size([320])
network.stage: 3 layer8.bn1.bias torch.Size([320])
network.stage: 3 layer8.conv_pwl.weight torch.Size([64, 320, 1, 1])
network.stage: 3 layer8.bn2.weight torch.Size([64])
network.stage: 3 layer8.bn2.bias torch.Size([64])
network.stage: 3 layer9.conv_exp.weight torch.Size([320, 64, 3, 3])
network.stage: 3 layer9.bn1.weight torch.Size([320])
network.stage: 3 layer9.bn1.bias torch.Size([320])
network.stage: 3 layer9.conv_pwl.weight torch.Size([64, 320, 1, 1])
network.stage: 3 layer9.bn2.weight torch.Size([64])
network.stage: 3 layer9.bn2.bias torch.Size([64])
network.stage: 4 layer10.conv_pw.weight torch.Size([320, 64, 1, 1])
network.stage: 4 layer10.bn1.weight torch.Size([320])
network.stage: 4 layer10.bn1.bias torch.Size([320])
network.stage: 4 layer10.conv_dw.weight torch.Size([320, 1, 3, 3])
network.stage: 4 layer10.bn2.weight torch.Size([320])
network.stage: 4 layer10.bn2.bias torch.Size([320])
network.stage: 4 layer10.conv_pwl.weight torch.Size([256, 320, 1, 1])
network.stage: 4 layer10.bn3.weight torch.Size([256])
network.stage: 4 layer10.bn3.bias torch.Size([256])
network.stage: 4 layer11.conv_pw.weight torch.Size([1280, 256, 1, 1])
network.stage: 4 layer11.bn1.weight torch.Size([1280])
network.stage: 4 layer11.bn1.bias torch.Size([1280])
network.stage: 4 layer11.conv_dw.weight torch.Size([1280, 1, 3, 3])
network.stage: 4 layer11.bn2.weight torch.Size([1280])
network.stage: 4 layer11.bn2.bias torch.Size([1280])
network.stage: 4 layer11.conv_pwl.weight torch.Size([256, 1280, 1, 1])
network.stage: 4 layer11.bn3.weight torch.Size([256])
network.stage: 4 layer11.bn3.bias torch.Size([256])
network.stage: 4 layer12.conv_pw.weight torch.Size([1280, 256, 1, 1])
network.stage: 4 layer12.bn1.weight torch.Size([1280])
network.stage: 4 layer12.bn1.bias torch.Size([1280])
network.stage: 4 layer12.conv_dw.weight torch.Size([1280, 1, 3, 3])
network.stage: 4 layer12.bn2.weight torch.Size([1280])
network.stage: 4 layer12.bn2.bias torch.Size([1280])
network.stage: 4 layer12.conv_pwl.weight torch.Size([256, 1280, 1, 1])
network.stage: 4 layer12.bn3.weight torch.Size([256])
network.stage: 4 layer12.bn3.bias torch.Size([256])
network.stage: 6 layer13.conv_pw.weight torch.Size([1280, 256, 1, 1])
network.stage: 6 layer13.bn1.weight torch.Size([1280])
network.stage: 6 layer13.bn1.bias torch.Size([1280])
network.stage: 6 layer13.conv_dw.weight torch.Size([1280, 1, 3, 3])
network.stage: 6 layer13.bn2.weight torch.Size([1280])
network.stage: 6 layer13.bn2.bias torch.Size([1280])
network.stage: 6 layer13.se.conv_reduce.weight torch.Size([64, 1280, 1, 1])
network.stage: 6 layer13.se.conv_reduce.bias torch.Size([64])
network.stage: 6 layer13.se.conv_expand.weight torch.Size([1280, 64, 1, 1])
network.stage: 6 layer13.se.conv_expand.bias torch.Size([1280])
network.stage: 6 layer13.conv_pwl.weight torch.Size([704, 1280, 1, 1])
network.stage: 6 layer13.bn3.weight torch.Size([704])
network.stage: 6 layer13.bn3.bias torch.Size([704])
network.stage: 6 layer14.conv_pw.weight torch.Size([3520, 704, 1, 1])
network.stage: 6 layer14.bn1.weight torch.Size([3520])
network.stage: 6 layer14.bn1.bias torch.Size([3520])
network.stage: 6 layer14.conv_dw.weight torch.Size([3520, 1, 3, 3])
network.stage: 6 layer14.bn2.weight torch.Size([3520])
network.stage: 6 layer14.bn2.bias torch.Size([3520])
network.stage: 6 layer14.se.conv_reduce.weight torch.Size([176, 3520, 1, 1])
network.stage: 6 layer14.se.conv_reduce.bias torch.Size([176])
network.stage: 6 layer14.se.conv_expand.weight torch.Size([3520, 176, 1, 1])
network.stage: 6 layer14.se.conv_expand.bias torch.Size([3520])
network.stage: 6 layer14.conv_pwl.weight torch.Size([704, 3520, 1, 1])
network.stage: 6 layer14.bn3.weight torch.Size([704])
network.stage: 6 layer14.bn3.bias torch.Size([704])
network. layer15.net.0.weight torch.Size([1280, 704, 1, 1])
network. layer15.net.1.weight torch.Size([1280])
network. layer15.net.1.bias torch.Size([1280])
network. layer15.net.6.weight torch.Size([1000, 1280])
network. layer15.net.6.bias torch.Size([1000])

timm EfficientNet

Module print:

EfficientNet(
  (conv_stem): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNormAct2d(
    32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
    (drop): Identity()
    (act): SiLU(inplace=True)
  )
  (blocks): Sequential(
    (0): Sequential(
      (0): ConvBnAct(
        (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (drop_path): Identity()
      )
      (1): ConvBnAct(
        (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (drop_path): Identity()
      )
    )
    (1): Sequential(
      (0): EdgeResidual(
        (conv_exp): Conv2d(32, 160, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (bn1): BatchNormAct2d(
          160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (se): Identity()
        (conv_pwl): Conv2d(160, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn2): BatchNormAct2d(
          32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
      (1): EdgeResidual(
        (conv_exp): Conv2d(32, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (se): Identity()
        (conv_pwl): Conv2d(160, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn2): BatchNormAct2d(
          32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
    )
    (2): Sequential(
      (0): EdgeResidual(
        (conv_exp): Conv2d(32, 160, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (bn1): BatchNormAct2d(
          160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (se): Identity()
        (conv_pwl): Conv2d(160, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn2): BatchNormAct2d(
          64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
      (1): EdgeResidual(
        (conv_exp): Conv2d(64, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (se): Identity()
        (conv_pwl): Conv2d(320, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn2): BatchNormAct2d(
          64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
      (2): EdgeResidual(
        (conv_exp): Conv2d(64, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (se): Identity()
        (conv_pwl): Conv2d(320, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn2): BatchNormAct2d(
          64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
    )
    (3): Sequential(
      (0): InvertedResidual(
        (conv_pw): Conv2d(64, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): SiLU(inplace=True)
        )
        (conv_dw): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=320, bias=False)
        (bn2): BatchNormAct2d(
          320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): SiLU(inplace=True)
        )
        (se): Identity()
        (conv_pwl): Conv2d(320, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNormAct2d(
          256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
      (1): InvertedResidual(
        (conv_pw): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): SiLU(inplace=True)
        )
        (conv_dw): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280, bias=False)
        (bn2): BatchNormAct2d(
          1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): SiLU(inplace=True)
        )
        (se): Identity()
        (conv_pwl): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNormAct2d(
          256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
      (2): InvertedResidual(
        (conv_pw): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): SiLU(inplace=True)
        )
        (conv_dw): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280, bias=False)
        (bn2): BatchNormAct2d(
          1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): SiLU(inplace=True)
        )
        (se): Identity()
        (conv_pwl): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNormAct2d(
          256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
    )
    (4): Sequential(
      (0): InvertedResidual(
        (conv_pw): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (conv_dw): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=1280, bias=False)
        (bn2): BatchNormAct2d(
          1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (se): SqueezeExcite(
          (conv_reduce): Conv2d(1280, 64, kernel_size=(1, 1), stride=(1, 1))
          (act1): ReLU(inplace=True)
          (conv_expand): Conv2d(64, 1280, kernel_size=(1, 1), stride=(1, 1))
          (gate): Sigmoid()
        )
        (conv_pwl): Conv2d(1280, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNormAct2d(
          704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
      (1): InvertedResidual(
        (conv_pw): Conv2d(704, 3520, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNormAct2d(
          3520, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (conv_dw): Conv2d(3520, 3520, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=3520, bias=False)
        (bn2): BatchNormAct2d(
          3520, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (se): SqueezeExcite(
          (conv_reduce): Conv2d(3520, 176, kernel_size=(1, 1), stride=(1, 1))
          (act1): ReLU(inplace=True)
          (conv_expand): Conv2d(176, 3520, kernel_size=(1, 1), stride=(1, 1))
          (gate): Sigmoid()
        )
        (conv_pwl): Conv2d(3520, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNormAct2d(
          704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): Identity()
        )
        (drop_path): Identity()
      )
    )
  )
  (conv_head): Conv2d(704, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn2): BatchNormAct2d(
    1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
    (drop): Identity()
    (act): ReLU(inplace=True)
  )
  (global_pool): SelectAdaptivePool2d (pool_type=avg, flatten=Flatten(start_dim=1, end_dim=-1))
  (classifier): Linear(in_features=1280, out_features=1000, bias=True)
)

conv_stem.weight torch.Size([32, 3, 3, 3])
bn1.weight torch.Size([32])
bn1.bias torch.Size([32])
blocks.0.0.conv.weight torch.Size([32, 32, 3, 3])
blocks.0.0.bn1.weight torch.Size([32])
blocks.0.0.bn1.bias torch.Size([32])
blocks.0.1.conv.weight torch.Size([32, 32, 3, 3])
blocks.0.1.bn1.weight torch.Size([32])
blocks.0.1.bn1.bias torch.Size([32])
blocks.1.0.conv_exp.weight torch.Size([160, 32, 3, 3])
blocks.1.0.bn1.weight torch.Size([160])
blocks.1.0.bn1.bias torch.Size([160])
blocks.1.0.conv_pwl.weight torch.Size([32, 160, 1, 1])
blocks.1.0.bn2.weight torch.Size([32])
blocks.1.0.bn2.bias torch.Size([32])
blocks.1.1.conv_exp.weight torch.Size([160, 32, 3, 3])
blocks.1.1.bn1.weight torch.Size([160])
blocks.1.1.bn1.bias torch.Size([160])
blocks.1.1.conv_pwl.weight torch.Size([32, 160, 1, 1])
blocks.1.1.bn2.weight torch.Size([32])
blocks.1.1.bn2.bias torch.Size([32])
blocks.2.0.conv_exp.weight torch.Size([160, 32, 3, 3])
blocks.2.0.bn1.weight torch.Size([160])
blocks.2.0.bn1.bias torch.Size([160])
blocks.2.0.conv_pwl.weight torch.Size([64, 160, 1, 1])
blocks.2.0.bn2.weight torch.Size([64])
blocks.2.0.bn2.bias torch.Size([64])
blocks.2.1.conv_exp.weight torch.Size([320, 64, 3, 3])
blocks.2.1.bn1.weight torch.Size([320])
blocks.2.1.bn1.bias torch.Size([320])
blocks.2.1.conv_pwl.weight torch.Size([64, 320, 1, 1])
blocks.2.1.bn2.weight torch.Size([64])
blocks.2.1.bn2.bias torch.Size([64])
blocks.2.2.conv_exp.weight torch.Size([320, 64, 3, 3])
blocks.2.2.bn1.weight torch.Size([320])
blocks.2.2.bn1.bias torch.Size([320])
blocks.2.2.conv_pwl.weight torch.Size([64, 320, 1, 1])
blocks.2.2.bn2.weight torch.Size([64])
blocks.2.2.bn2.bias torch.Size([64])
blocks.3.0.conv_pw.weight torch.Size([320, 64, 1, 1])
blocks.3.0.bn1.weight torch.Size([320])
blocks.3.0.bn1.bias torch.Size([320])
blocks.3.0.conv_dw.weight torch.Size([320, 1, 3, 3])
blocks.3.0.bn2.weight torch.Size([320])
blocks.3.0.bn2.bias torch.Size([320])
blocks.3.0.conv_pwl.weight torch.Size([256, 320, 1, 1])
blocks.3.0.bn3.weight torch.Size([256])
blocks.3.0.bn3.bias torch.Size([256])
blocks.3.1.conv_pw.weight torch.Size([1280, 256, 1, 1])
blocks.3.1.bn1.weight torch.Size([1280])
blocks.3.1.bn1.bias torch.Size([1280])
blocks.3.1.conv_dw.weight torch.Size([1280, 1, 3, 3])
blocks.3.1.bn2.weight torch.Size([1280])
blocks.3.1.bn2.bias torch.Size([1280])
blocks.3.1.conv_pwl.weight torch.Size([256, 1280, 1, 1])
blocks.3.1.bn3.weight torch.Size([256])
blocks.3.1.bn3.bias torch.Size([256])
blocks.3.2.conv_pw.weight torch.Size([1280, 256, 1, 1])
blocks.3.2.bn1.weight torch.Size([1280])
blocks.3.2.bn1.bias torch.Size([1280])
blocks.3.2.conv_dw.weight torch.Size([1280, 1, 3, 3])
blocks.3.2.bn2.weight torch.Size([1280])
blocks.3.2.bn2.bias torch.Size([1280])
blocks.3.2.conv_pwl.weight torch.Size([256, 1280, 1, 1])
blocks.3.2.bn3.weight torch.Size([256])
blocks.3.2.bn3.bias torch.Size([256])
blocks.4.0.conv_pw.weight torch.Size([1280, 256, 1, 1])
blocks.4.0.bn1.weight torch.Size([1280])
blocks.4.0.bn1.bias torch.Size([1280])
blocks.4.0.conv_dw.weight torch.Size([1280, 1, 3, 3])
blocks.4.0.bn2.weight torch.Size([1280])
blocks.4.0.bn2.bias torch.Size([1280])
blocks.4.0.se.conv_reduce.weight torch.Size([64, 1280, 1, 1])
blocks.4.0.se.conv_reduce.bias torch.Size([64])
blocks.4.0.se.conv_expand.weight torch.Size([1280, 64, 1, 1])
blocks.4.0.se.conv_expand.bias torch.Size([1280])
blocks.4.0.conv_pwl.weight torch.Size([704, 1280, 1, 1])
blocks.4.0.bn3.weight torch.Size([704])
blocks.4.0.bn3.bias torch.Size([704])
blocks.4.1.conv_pw.weight torch.Size([3520, 704, 1, 1])
blocks.4.1.bn1.weight torch.Size([3520])
blocks.4.1.bn1.bias torch.Size([3520])
blocks.4.1.conv_dw.weight torch.Size([3520, 1, 3, 3])
blocks.4.1.bn2.weight torch.Size([3520])
blocks.4.1.bn2.bias torch.Size([3520])
blocks.4.1.se.conv_reduce.weight torch.Size([176, 3520, 1, 1])
blocks.4.1.se.conv_reduce.bias torch.Size([176])
blocks.4.1.se.conv_expand.weight torch.Size([3520, 176, 1, 1])
blocks.4.1.se.conv_expand.bias torch.Size([3520])
blocks.4.1.conv_pwl.weight torch.Size([704, 3520, 1, 1])
blocks.4.1.bn3.weight torch.Size([704])
blocks.4.1.bn3.bias torch.Size([704])
conv_head.weight torch.Size([1280, 704, 1, 1])
bn2.weight torch.Size([1280])
bn2.bias torch.Size([1280])
classifier.weight torch.Size([1000, 1280])
classifier.bias torch.Size([1000])
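Because the two listings above enumerate parameters in the same order with matching shapes, one possible way to translate names between them is a positional remap with a shape check. This is only a sketch under that assumption, not necessarily how the checkpoint port mentioned earlier was done; `remap_by_order` is a hypothetical helper and the entries are a short excerpt of the listings.

```python
def remap_by_order(src, dst):
    """Map source names to destination names positionally.

    src and dst are lists of (name, shape) pairs in model order.
    Raises ValueError if the lengths or any pair of shapes disagree.
    """
    if len(src) != len(dst):
        raise ValueError(f"entry count differs: {len(src)} vs {len(dst)}")
    mapping = {}
    for (src_name, src_shape), (dst_name, dst_shape) in zip(src, dst):
        if tuple(src_shape) != tuple(dst_shape):
            raise ValueError(
                f"shape mismatch: {src_name} {src_shape} vs {dst_name} {dst_shape}"
            )
        mapping[src_name] = dst_name
    return mapping


# Excerpt of the first four entries from each listing above.
gpunet_params = [
    ("network.head: 2.net.0.weight", (32, 3, 3, 3)),
    ("network.head: 2.net.1.weight", (32,)),
    ("network.head: 2.net.1.bias", (32,)),
    ("network.stage: 1 layer3.conv.weight", (32, 32, 3, 3)),
]
timm_params = [
    ("conv_stem.weight", (32, 3, 3, 3)),
    ("bn1.weight", (32,)),
    ("bn1.bias", (32,)),
    ("blocks.0.0.conv.weight", (32, 32, 3, 3)),
]

for old, new in remap_by_order(gpunet_params, timm_params).items():
    print(f"{old} -> {new}")
```

A positional remap only checks names and shapes. It cannot detect behavioral differences such as the padding mismatch discussed in this issue, so outputs still need to be compared after loading.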

linnanwang commented 2 years ago

Hello @rwightman ,

Thank you so much for this very nice explanation. There was an issue getting the edge residual to work on TensorRT, so I made a change to the padding. I will change the code to use the PyTorch-native 'same' padding in conv2d, since we don't need to be compatible with Google's code ;) Great catch, and thank you.

linnanwang commented 2 years ago

@rwightman

Also, I really appreciate your advice on the naming convention. Yes, our stem and head are the opposite of the conventions in timm and the community, and we will revise them. As for the model structure when printing, you can disable repr to make the output the same as PyTorch's default. We organize the printing in this format for easier use in NAS.

The license fix has been merged. Please check. In short, we will do a follow-up fix for these naming conventions shortly. Thanks.

rwightman commented 2 years ago

@linnanwang thanks for the updates. Will the models be retrained with a padding fix? As it stands, if the padding is fixed to follow a consistent scheme, the accuracy of the current weights will take a noteworthy hit.

I would like to include these models in timm as part of the EfficientNet family. Support is fairly straightforward, with one exception needed to handle the fact that these models use a different activation for stem and head. However, I wouldn't want to support the no-padding variant in the edge (FusedMBConv) blocks.

linnanwang commented 2 years ago

@rwightman Thanks Ross. It will be great to see GPUNet in timm. I'm talking to the team to check our bandwidth and make a proper plan. I'm very excited to confirm that you're interested in including GPUNet in timm. I will post an update here shortly, once I reach a consensus with the team. Thank you.

linnanwang commented 2 years ago

Hello @rwightman, we have changed the naming and padding of GPUNet and the following is the new network structure from get_configs(batch=1, latency="0.65ms", gpuType="GV100") on our side. Could you please kindly take a look, and let us know if this one works for Timm? Thank you.

GPUNet(
  (network): Sequential(
    (stem: 2): Prologue(
      (net): Sequential(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): SiLU(inplace=True)
      )
    )
    (stage: 1 layer3): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 1 layer4): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 2 layer5): EdgeResidual(
      (conv_exp): Conv2d(32, 160, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(160, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer6): EdgeResidual(
      (conv_exp): Conv2d(32, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(160, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer7): EdgeResidual(
      (conv_exp): Conv2d(32, 160, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(160, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer8): EdgeResidual(
      (conv_exp): Conv2d(64, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(320, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer9): EdgeResidual(
      (conv_exp): Conv2d(64, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(320, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer10): InvertedResidual(
      (conv_pw): Conv2d(64, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=320, bias=False)
      (bn2): BatchNorm2d(320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(320, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer11): InvertedResidual(
      (conv_pw): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280, bias=False)
      (bn2): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer12): InvertedResidual(
      (conv_pw): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280, bias=False)
      (bn2): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer13): InvertedResidual(
      (conv_pw): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=1280, bias=False)
      (bn2): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1280, 64, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(64, 1280, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1280, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer14): InvertedResidual(
      (conv_pw): Conv2d(704, 3520, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(3520, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(3520, 3520, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=3520, bias=False)
      (bn2): BatchNorm2d(3520, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(3520, 176, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(176, 3520, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(3520, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (head: 15): Epilogue(
      (net): Sequential(
        (0): Conv2d(704, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): AdaptiveAvgPool2d(output_size=1)
        (4): Flatten(start_dim=1, end_dim=-1)
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=1280, out_features=1000, bias=True)
      )
    )
  )
)

rwightman commented 2 years ago

@linnanwang that's looking much better, yes. Looks like the padding is fixed too?
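One way to sanity-check the padding in the printout above is to confirm that every spatial conv uses padding = (kernel - 1) // 2 and to trace the feature-map size through the five strided convs. This is a rough sketch in plain Python: the (kernel, stride, padding) tuples are copied from the 0.65ms printout, the 224 input resolution is an assumption made only for illustration, and `conv_out` is a hypothetical helper implementing the standard conv output-size formula.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a conv along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the strided convs in the 0.65ms printout:
# stem, layer5, layer7, layer10 and layer13.
strided_convs = [(3, 2, 1)] * 5

# Every listed conv pads symmetrically with (kernel - 1) // 2.
assert all(p == (k - 1) // 2 for k, _, p in strided_convs)

size = 224  # assumed input resolution, not stated in the thread
sizes = []
for k, s, p in strided_convs:
    size = conv_out(size, k, s, p)
    sizes.append(size)

print(sizes)  # [112, 56, 28, 14, 7]
```

With symmetric padding, each stride-2 conv halves the resolution exactly and each stride-1 3x3 conv with padding 1 preserves it, so the residual additions in the EdgeResidual and InvertedResidual blocks see matching shapes.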

Only other comment: it's a bit odd that the stem/prologue and head/epilogue have different activations. Was that by design, or did it just end up that way? If new models haven't been trained, it might be better to make both SiLU; otherwise, not a huge deal...

linnanwang commented 2 years ago

Thank you @rwightman. The activations in the stem and head are both ReLU now.

Another thing I want to confirm with you is our customized InvertedResidual. We search for its internal structure, and sometimes the SE layer can be Identity. For example, please take a look at the one below:

(stage: 4 layer12): InvertedResidual(
  (conv_pw): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  (act1): SiLU(inplace=True)
  (conv_dw): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280, bias=False)
  (bn2): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  (act2): SiLU(inplace=True)
  (se): Identity()
  (conv_pwl): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
)

The implementation of this module is here: https://github.com/NVIDIA/DeepLearningExamples/blob/a5feffa7eebe153cfbd9bb84ec270004fa3290c5/PyTorch/Classification/GPUNet/models/gpunet_modules.py#L350 Is it okay on your side?

rwightman commented 2 years ago

@linnanwang yes, I can turn on/off the se for InvertedResidual per block in timm, whether it's identity or None, doesn't matter for the weight loading

linnanwang commented 2 years ago

Thank you @rwightman. I'm pasting all the model definitions here for you to double-check compatibility with timm. Please note that retraining these models is very expensive, so let's try to make sure everything works fine after the retrain. If you feel the definitions below are not sufficient to ensure compatibility, please let me know. Thank you for your understanding.

Here is: get_configs(batch=1, latency="0.85ms", gpuType="GV100")

GPUNet(
  (network): Sequential(
    (stem: 2): Prologue(
      (net): Sequential(
        (0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
    (stage: 1 layer3): ConvBnAct(
      (conv): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 2 layer4): EdgeResidual(
      (conv_exp): Conv2d(24, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(96, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer5): EdgeResidual(
      (conv_exp): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer6): EdgeResidual(
      (conv_exp): Conv2d(64, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(256, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer7): EdgeResidual(
      (conv_exp): Conv2d(96, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer8): InvertedResidual(
      (conv_pw): Conv2d(96, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192, bias=False)
      (bn2): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(192, 24, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(24, 192, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(192, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer9): InvertedResidual(
      (conv_pw): Conv2d(160, 800, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(800, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(800, 800, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=800, bias=False)
      (bn2): BatchNorm2d(800, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(800, 288, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer10): InvertedResidual(
      (conv_pw): Conv2d(288, 1440, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1440, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1440, 1440, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1440, bias=False)
      (bn2): BatchNorm2d(1440, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1440, 288, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer11): InvertedResidual(
      (conv_pw): Conv2d(288, 1440, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1440, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1440, 1440, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1440, bias=False)
      (bn2): BatchNorm2d(1440, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1440, 288, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer12): InvertedResidual(
      (conv_pw): Conv2d(288, 1440, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1440, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1440, 1440, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1440, bias=False)
      (bn2): BatchNorm2d(1440, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1440, 288, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer13): InvertedResidual(
      (conv_pw): Conv2d(288, 1152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1152, 1152, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=1152, bias=False)
      (bn2): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1152, 72, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(72, 1152, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1152, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer14): InvertedResidual(
      (conv_pw): Conv2d(448, 1792, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1792, 1792, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1792, bias=False)
      (bn2): BatchNorm2d(1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1792, 112, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(112, 1792, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1792, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer15): InvertedResidual(
      (conv_pw): Conv2d(448, 1792, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1792, 1792, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1792, bias=False)
      (bn2): BatchNorm2d(1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1792, 112, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(112, 1792, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1792, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer16): InvertedResidual(
      (conv_pw): Conv2d(448, 1792, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1792, 1792, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1792, bias=False)
      (bn2): BatchNorm2d(1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1792, 112, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(112, 1792, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1792, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (head: 17): Epilogue(
      (net): Sequential(
        (0): Conv2d(448, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): AdaptiveAvgPool2d(output_size=1)
        (4): Flatten(start_dim=1, end_dim=-1)
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=1280, out_features=1000, bias=True)
      )
    )
  )
)

linnanwang commented 2 years ago

get_configs(batch=1, latency="1.75ms", gpuType="GV100")

GPUNet(
  (network): Sequential(
    (stem: 2): Prologue(
      (net): Sequential(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
    (stage: 1 layer3): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 1 layer4): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 1 layer5): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 2 layer6): EdgeResidual(
      (conv_exp): Conv2d(32, 192, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), bias=False)
      (bn1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer7): EdgeResidual(
      (conv_exp): Conv2d(32, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(96, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer8): EdgeResidual(
      (conv_exp): Conv2d(112, 336, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(336, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(336, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer9): EdgeResidual(
      (conv_exp): Conv2d(112, 336, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(336, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(336, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer10): InvertedResidual(
      (conv_pw): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=672, bias=False)
      (bn2): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(672, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer11): InvertedResidual(
      (conv_pw): Conv2d(144, 864, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(864, 864, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=864, bias=False)
      (bn2): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(864, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer12): InvertedResidual(
      (conv_pw): Conv2d(144, 864, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(864, 864, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=864, bias=False)
      (bn2): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(864, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer13): InvertedResidual(
      (conv_pw): Conv2d(144, 864, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(864, 864, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=864, bias=False)
      (bn2): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(864, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer14): InvertedResidual(
      (conv_pw): Conv2d(144, 864, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(864, 864, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=864, bias=False)
      (bn2): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(864, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer15): InvertedResidual(
      (conv_pw): Conv2d(144, 864, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(864, 864, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=864, bias=False)
      (bn2): BatchNorm2d(864, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(864, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer16): InvertedResidual(
      (conv_pw): Conv2d(144, 432, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(432, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(432, 432, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=432, bias=False)
      (bn2): BatchNorm2d(432, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(432, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer17): InvertedResidual(
      (conv_pw): Conv2d(160, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(480, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(480, 480, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=480, bias=False)
      (bn2): BatchNorm2d(480, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(480, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer18): InvertedResidual(
      (conv_pw): Conv2d(160, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(480, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(480, 480, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=480, bias=False)
      (bn2): BatchNorm2d(480, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(480, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer19): InvertedResidual(
      (conv_pw): Conv2d(160, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(480, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(480, 480, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=480, bias=False)
      (bn2): BatchNorm2d(480, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(480, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer20): InvertedResidual(
      (conv_pw): Conv2d(160, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(480, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(480, 480, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=480, bias=False)
      (bn2): BatchNorm2d(480, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(480, 40, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(40, 480, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(480, 224, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(224, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer21): InvertedResidual(
      (conv_pw): Conv2d(224, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
      (bn2): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(672, 56, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(56, 672, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(672, 224, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(224, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer22): InvertedResidual(
      (conv_pw): Conv2d(224, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
      (bn2): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(672, 56, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(56, 672, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(672, 224, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(224, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer23): InvertedResidual(
      (conv_pw): Conv2d(224, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
      (bn2): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(672, 56, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(56, 672, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(672, 224, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(224, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer24): InvertedResidual(
      (conv_pw): Conv2d(224, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
      (bn2): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(672, 56, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(56, 672, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(672, 224, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(224, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer25): InvertedResidual(
      (conv_pw): Conv2d(224, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
      (bn2): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(672, 56, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(56, 672, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(672, 224, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(224, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer26): InvertedResidual(
      (conv_pw): Conv2d(224, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
      (bn2): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(672, 56, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(56, 672, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(672, 224, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(224, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer27): InvertedResidual(
      (conv_pw): Conv2d(224, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): SiLU(inplace=True)
      (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
      (bn2): BatchNorm2d(672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): SiLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(672, 56, kernel_size=(1, 1), stride=(1, 1))
        (act1): SiLU(inplace=True)
        (conv_expand): Conv2d(56, 672, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(672, 224, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(224, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 7 layer28): InvertedResidual(
      (conv_pw): Conv2d(224, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(448, 448, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=448, bias=False)
      (bn2): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(448, 832, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(832, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 7 layer29): InvertedResidual(
      (conv_pw): Conv2d(832, 1664, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1664, 1664, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1664, bias=False)
      (bn2): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1664, 832, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(832, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 7 layer30): InvertedResidual(
      (conv_pw): Conv2d(832, 1664, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1664, 1664, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1664, bias=False)
      (bn2): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1664, 832, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(832, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 7 layer31): InvertedResidual(
      (conv_pw): Conv2d(832, 1664, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1664, 1664, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1664, bias=False)
      (bn2): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1664, 832, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(832, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 7 layer32): InvertedResidual(
      (conv_pw): Conv2d(832, 1664, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1664, 1664, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1664, bias=False)
      (bn2): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1664, 832, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(832, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 7 layer33): InvertedResidual(
      (conv_pw): Conv2d(832, 1664, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1664, 1664, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1664, bias=False)
      (bn2): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1664, 832, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(832, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 7 layer34): InvertedResidual(
      (conv_pw): Conv2d(832, 1664, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1664, 1664, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1664, bias=False)
      (bn2): BatchNorm2d(1664, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1664, 832, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(832, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (head: 35): Epilogue(
      (net): Sequential(
        (0): Conv2d(832, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): AdaptiveAvgPool2d(output_size=1)
        (4): Flatten(start_dim=1, end_dim=-1)
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=1280, out_features=1000, bias=True)
      )
    )
  )
)
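
As a quick sanity check on the dump above, every spatial convolution it prints uses symmetric padding of `(kernel_size - 1) // 2` (1 for 3x3, 2 for 5x5). The sketch below is not code from the repository: it applies the standard Conv2d output-size formula (dilation 1) to the kernel/stride/padding triples that appear in the printout. The function names and the input size of 14 are illustrative assumptions.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one spatial dim of a Conv2d with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def static_padding(kernel: int) -> int:
    """Symmetric padding that preserves size at stride 1 for odd kernels."""
    return (kernel - 1) // 2


# (kernel, stride, padding) triples as they appear in the printed modules.
DUMP_CONVS = [(3, 2, 1), (3, 1, 1), (5, 1, 2)]

# Hypothetical feature-map size, chosen only for illustration.
EXAMPLE_SIZE = 14

for kernel, stride, padding in DUMP_CONVS:
    # The printed padding matches the static (kernel - 1) // 2 rule.
    assert padding == static_padding(kernel)
    out = conv_out_size(EXAMPLE_SIZE, kernel, stride, padding)
    print(f"k={kernel} s={stride} p={padding}: {EXAMPLE_SIZE} -> {out}")
```

With this padding the stride-1 layers keep the spatial size and the stride-2 layers halve it (rounding up), independent of any input-dependent padding computation.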

linnanwang commented 2 years ago

get_configs(batch=1, latency="0.5ms-D", gpuType="GV100")

GPUNet(
  (network): Sequential(
    (stem: 2): Prologue(
      (net): Sequential(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
    (stage: 1 layer3): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 1 layer4): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 2 layer5): ConvBnAct(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 3 layer6): InvertedResidual(
      (conv_pw): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=512, bias=False)
      (bn2): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(512, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer7): InvertedResidual(
      (conv_pw): Conv2d(96, 736, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(736, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(736, 736, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=736, bias=False)
      (bn2): BatchNorm2d(736, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(736, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer8): InvertedResidual(
      (conv_pw): Conv2d(96, 736, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(736, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(736, 736, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=736, bias=False)
      (bn2): BatchNorm2d(736, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(736, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer9): InvertedResidual(
      (conv_pw): Conv2d(96, 736, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(736, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(736, 736, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=736, bias=False)
      (bn2): BatchNorm2d(736, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(736, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer10): InvertedResidual(
      (conv_pw): Conv2d(256, 1088, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1088, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1088, 1088, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1088, bias=False)
      (bn2): BatchNorm2d(1088, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1088, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer11): InvertedResidual(
      (conv_pw): Conv2d(256, 1216, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1216, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1216, 1216, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1216, bias=False)
      (bn2): BatchNorm2d(1216, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1216, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer12): InvertedResidual(
      (conv_pw): Conv2d(256, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(2048, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=2048, bias=False)
      (bn2): BatchNorm2d(2048, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(2048, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer13): InvertedResidual(
      (conv_pw): Conv2d(704, 2688, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(2688, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(2688, 2688, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=2688, bias=False)
      (bn2): BatchNorm2d(2688, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(2688, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer14): InvertedResidual(
      (conv_pw): Conv2d(704, 2368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(2368, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(2368, 2368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2368, bias=False)
      (bn2): BatchNorm2d(2368, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(2368, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer15): InvertedResidual(
      (conv_pw): Conv2d(704, 1792, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1792, 1792, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1792, bias=False)
      (bn2): BatchNorm2d(1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1792, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer16): InvertedResidual(
      (conv_pw): Conv2d(704, 4032, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(4032, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(4032, 4032, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4032, bias=False)
      (bn2): BatchNorm2d(4032, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(4032, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (head: 17): Epilogue(
      (net): Sequential(
        (0): Conv2d(704, 1984, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(1984, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): AdaptiveAvgPool2d(output_size=1)
        (4): Flatten(start_dim=1, end_dim=-1)
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=1984, out_features=1000, bias=True)
      )
    )
  )
)
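
To see how the strides in the `0.5ms-D` printout compound, the following sketch walks a feature map through the spatial convolutions listed above (1x1 convolutions are left out because they do not change the spatial size). The layer list is transcribed from the dump; the input resolution of 224 is only an assumed placeholder, not a value stated in this thread.

```python
# (name, kernel, stride, padding) for each spatial conv in the 0.5ms-D dump.
SPATIAL_CONVS = [
    ("stem", 3, 2, 1),
    ("layer3", 3, 1, 1),
    ("layer4", 3, 1, 1),
    ("layer5", 3, 2, 1),
    ("layer6", 3, 2, 1),
    ("layer7", 3, 1, 1),
    ("layer8", 3, 1, 1),
    ("layer9", 3, 2, 1),
    ("layer10", 5, 1, 2),
    ("layer11", 3, 1, 1),
    ("layer12", 3, 2, 1),
    ("layer13", 5, 1, 2),
    ("layer14", 3, 1, 1),
    ("layer15", 3, 1, 1),
    ("layer16", 3, 2, 1),
]


def trace_sizes(size: int) -> list[tuple[str, int]]:
    """Return (layer name, output size) after each spatial conv."""
    sizes = []
    for name, kernel, stride, padding in SPATIAL_CONVS:
        size = (size + 2 * padding - kernel) // stride + 1
        sizes.append((name, size))
    return sizes


# Assumed input resolution, for illustration only.
for name, size in trace_sizes(224):
    print(f"{name}: {size}x{size}")

num_downsamples = sum(1 for _, _, stride, _ in SPATIAL_CONVS if stride == 2)
print("stride-2 convs:", num_downsamples)
```

Under that assumption the map shrinks 224, 112, 56, 28, 14, 7, 4 across the six stride-2 convolutions before the head's global average pool.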

linnanwang commented 2 years ago

get_configs(batch=1, latency="0.8ms-D", gpuType="GV100")

GPUNet(
  (network): Sequential(
    (stem: 2): Prologue(
      (net): Sequential(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
    (stage: 1 layer3): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 1 layer4): ConvBnAct(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 2 layer5): ConvBnAct(
      (conv): Conv2d(32, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
    )
    (stage: 2 layer6): InvertedResidual(
      (conv_pw): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (bn2): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer7): InvertedResidual(
      (conv_pw): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (bn2): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer8): InvertedResidual(
      (conv_pw): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(512, 512, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=512, bias=False)
      (bn2): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(512, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer9): InvertedResidual(
      (conv_pw): Conv2d(96, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(768, 768, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=768, bias=False)
      (bn2): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(768, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer10): InvertedResidual(
      (conv_pw): Conv2d(96, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(768, 768, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=768, bias=False)
      (bn2): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(768, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer11): InvertedResidual(
      (conv_pw): Conv2d(256, 1920, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1920, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1920, 1920, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1920, bias=False)
      (bn2): BatchNorm2d(1920, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(1920, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer12): InvertedResidual(
      (conv_pw): Conv2d(256, 2016, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(2016, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(2016, 2016, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2016, bias=False)
      (bn2): BatchNorm2d(2016, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(2016, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer13): InvertedResidual(
      (conv_pw): Conv2d(256, 2016, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(2016, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(2016, 2016, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=2016, bias=False)
      (bn2): BatchNorm2d(2016, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(2016, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer14): InvertedResidual(
      (conv_pw): Conv2d(704, 5632, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(5632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(5632, 5632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=5632, bias=False)
      (bn2): BatchNorm2d(5632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(5632, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer15): InvertedResidual(
      (conv_pw): Conv2d(704, 5216, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(5216, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(5216, 5216, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=5216, bias=False)
      (bn2): BatchNorm2d(5216, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(5216, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 5 layer16): InvertedResidual(
      (conv_pw): Conv2d(704, 5632, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(5632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(5632, 5632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=5632, bias=False)
      (bn2): BatchNorm2d(5632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(5632, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 6 layer17): InvertedResidual(
      (conv_pw): Conv2d(704, 5600, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(5600, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(5600, 5600, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=5600, bias=False)
      (bn2): BatchNorm2d(5600, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): Identity()
      (conv_pwl): Conv2d(5600, 704, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(704, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (head: 18): Epilogue(
      (net): Sequential(
        (0): Conv2d(704, 1984, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(1984, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): AdaptiveAvgPool2d(output_size=1)
        (4): Flatten(start_dim=1, end_dim=-1)
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=1984, out_features=1000, bias=True)
      )
    )
  )
)
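
For context on why the static `padding=(2, 2)` shown for the stride-2 5x5 convolutions above differs from TF-style 'same' padding, here is a small sketch comparing the two along one dimension. It uses the commonly cited TF 'same' rule (total padding `max((ceil(i / s) - 1) * s + (k - 1) + 1 - i, 0)`, with the extra pixel going to the trailing side). The function names and input sizes are assumptions for illustration, not code from either repository.

```python
import math


def tf_same_padding(size: int, kernel: int, stride: int) -> tuple[int, int]:
    """(leading, trailing) padding under the TF 'same' rule, dilation 1."""
    total = max((math.ceil(size / stride) - 1) * stride + kernel - size, 0)
    leading = total // 2
    return leading, total - leading


def static_padding(kernel: int) -> tuple[int, int]:
    """Symmetric PyTorch-style padding, independent of the input size."""
    pad = (kernel - 1) // 2
    return pad, pad


# Hypothetical input sizes: one even, one odd.
for size, kernel, stride in [(112, 5, 2), (112, 5, 1), (112, 3, 2), (7, 3, 2)]:
    print(
        f"in={size} k={kernel} s={stride}: "
        f"same={tf_same_padding(size, kernel, stride)} "
        f"static={static_padding(kernel)}"
    )
```

At stride 1 the two agree. At stride 2 on an even-sized input the 'same' rule pads asymmetrically, for example `(1, 2)` versus `(2, 2)` for a 5x5 kernel, so the output size matches but the sampling grid is shifted by one pixel. That shift is why 'same' padding is generally only needed when loading weights trained in Tensorflow.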

linnanwang commented 2 years ago

get_configs(batch=1, latency="1.25ms-D", gpuType="GV100")

GPUNet(
  (network): Sequential(
    (stem: 2): Prologue(
      (net): Sequential(
        (0): Conv2d(3, 33, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(33, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
    (stage: 0 layer3): InvertedResidual(
      (conv_pw): Conv2d(33, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
      (bn2): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(32, 33, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(33, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 1 layer4): InvertedResidual(
      (conv_pw): Conv2d(33, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)
      (bn2): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(96, 8, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(8, 96, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(96, 44, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(44, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 1 layer5): InvertedResidual(
      (conv_pw): Conv2d(44, 176, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(176, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(176, 176, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=176, bias=False)
      (bn2): BatchNorm2d(176, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(176, 11, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(11, 176, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(176, 44, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(44, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 1 layer6): InvertedResidual(
      (conv_pw): Conv2d(44, 136, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(136, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(136, 136, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=136, bias=False)
      (bn2): BatchNorm2d(136, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(136, 11, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(11, 136, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(136, 44, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(44, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer7): InvertedResidual(
      (conv_pw): Conv2d(44, 176, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(176, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(176, 176, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=176, bias=False)
      (bn2): BatchNorm2d(176, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(176, 11, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(11, 176, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(176, 67, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(67, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer8): InvertedResidual(
      (conv_pw): Conv2d(67, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(272, 272, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=272, bias=False)
      (bn2): BatchNorm2d(272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(272, 17, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(17, 272, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(272, 67, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(67, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer9): InvertedResidual(
      (conv_pw): Conv2d(67, 400, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(400, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(400, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=400, bias=False)
      (bn2): BatchNorm2d(400, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(400, 17, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(17, 400, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(400, 67, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(67, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer10): InvertedResidual(
      (conv_pw): Conv2d(67, 200, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(200, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(200, 200, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=200, bias=False)
      (bn2): BatchNorm2d(200, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(200, 17, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(17, 200, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(200, 67, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(67, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer11): InvertedResidual(
      (conv_pw): Conv2d(67, 400, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(400, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(400, 400, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=400, bias=False)
      (bn2): BatchNorm2d(400, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(400, 17, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(17, 400, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(400, 134, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(134, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer12): InvertedResidual(
      (conv_pw): Conv2d(134, 808, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(808, 808, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=808, bias=False)
      (bn2): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(808, 34, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(34, 808, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(808, 134, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(134, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer13): InvertedResidual(
      (conv_pw): Conv2d(134, 400, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(400, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(400, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=400, bias=False)
      (bn2): BatchNorm2d(400, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(400, 33, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(33, 400, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(400, 134, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(134, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer14): InvertedResidual(
      (conv_pw): Conv2d(134, 536, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(536, 536, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=536, bias=False)
      (bn2): BatchNorm2d(536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(536, 34, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(34, 536, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(536, 134, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(134, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer15): InvertedResidual(
      (conv_pw): Conv2d(134, 808, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(808, 808, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=808, bias=False)
      (bn2): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(808, 34, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(34, 808, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(808, 190, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(190, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer16): InvertedResidual(
      (conv_pw): Conv2d(190, 1144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1144, 1144, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1144, bias=False)
      (bn2): BatchNorm2d(1144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1144, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 1144, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1144, 190, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(190, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer17): InvertedResidual(
      (conv_pw): Conv2d(190, 1144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1144, 1144, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=1144, bias=False)
      (bn2): BatchNorm2d(1144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1144, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 1144, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1144, 190, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(190, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer18): InvertedResidual(
      (conv_pw): Conv2d(190, 760, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(760, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(760, 760, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=760, bias=False)
      (bn2): BatchNorm2d(760, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(760, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 760, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(760, 190, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(190, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer19): InvertedResidual(
      (conv_pw): Conv2d(190, 1144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1144, 1144, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=1144, bias=False)
      (bn2): BatchNorm2d(1144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1144, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 1144, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1144, 268, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(268, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer20): InvertedResidual(
      (conv_pw): Conv2d(268, 1608, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1608, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1608, 1608, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1608, bias=False)
      (bn2): BatchNorm2d(1608, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1608, 67, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(67, 1608, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1608, 268, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(268, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer21): InvertedResidual(
      (conv_pw): Conv2d(268, 1608, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1608, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1608, 1608, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1608, bias=False)
      (bn2): BatchNorm2d(1608, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1608, 67, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(67, 1608, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1608, 268, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(268, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer22): InvertedResidual(
      (conv_pw): Conv2d(268, 808, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(808, 808, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=808, bias=False)
      (bn2): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(808, 67, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(67, 808, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(808, 268, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(268, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer23): InvertedResidual(
      (conv_pw): Conv2d(268, 808, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(808, 808, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=808, bias=False)
      (bn2): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(808, 67, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(67, 808, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(808, 268, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(268, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer24): InvertedResidual(
      (conv_pw): Conv2d(268, 808, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(808, 808, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=808, bias=False)
      (bn2): BatchNorm2d(808, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(808, 67, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(67, 808, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(808, 268, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(268, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (head: 25): Epilogue(
      (net): Sequential(
        (0): Conv2d(268, 1984, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(1984, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): AdaptiveAvgPool2d(output_size=1)
        (4): Flatten(start_dim=1, end_dim=-1)
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=1984, out_features=1000, bias=True)
      )
    )
  )
)
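
For readers checking the padding in the printout above: every `conv_dw` layer shows a fixed symmetric padding equal to half the kernel size, rounded down (3 -> 1, 5 -> 2, 7 -> 3). The sketch below recomputes those values and the resulting output size in plain Python. The helper names `symmetric_padding` and `conv_out_size` are hypothetical and not taken from the GPUNet or timm code, and the 56-pixel input size is only an assumed example.

```python
def symmetric_padding(kernel_size, dilation=1):
    """Static per-side padding for an odd kernel (hypothetical helper)."""
    return dilation * (kernel_size - 1) // 2


def conv_out_size(in_size, kernel_size, stride, padding, dilation=1):
    """Output length along one axis, following the Conv2d shape formula."""
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# (kernel_size, stride) pairs that occur in the conv_dw layers printed above
dw_configs = [(3, 1), (3, 2), (5, 1), (5, 2), (7, 1), (7, 2)]

# Per-side padding for each kernel size
paddings = {k: symmetric_padding(k) for k, _ in dw_configs}

# Output size for an assumed 56-pixel input (illustrative value only)
out_sizes = {
    (k, s): conv_out_size(56, k, s, symmetric_padding(k)) for k, s in dw_configs
}

for (k, s), size in out_sizes.items():
    print(f"kernel={k} stride={s} padding={paddings[k]} -> output {size}")
```

With this static padding, stride-1 layers keep the spatial size and stride-2 layers halve an even input, without the input-dependent asymmetric padding that TF-style 'same' padding would need.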

linnanwang commented 2 years ago

get_configs(batch=1, latency="2.25ms-D", gpuType="GV100")

GPUNet(
  (network): Sequential(
    (stem: 2): PrologueLargeD(
      (net): Sequential(
        (0): Conv2d(3, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
        (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU()
        (6): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (7): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (8): ReLU()
      )
    )
    (stage: 0 layer3): InvertedResidual(
      (conv_pw): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
      (bn2): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(48, 12, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(12, 48, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 1 layer4): InvertedResidual(
      (conv_pw): Conv2d(48, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(144, 144, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=144, bias=False)
      (bn2): BatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(144, 12, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(12, 144, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(144, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 1 layer5): InvertedResidual(
      (conv_pw): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(256, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=256, bias=False)
      (bn2): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(256, 16, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(16, 256, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 1 layer6): InvertedResidual(
      (conv_pw): Conv2d(64, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
      (bn2): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(192, 16, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(16, 192, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer7): InvertedResidual(
      (conv_pw): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(256, 256, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=256, bias=False)
      (bn2): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(256, 16, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(16, 256, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(256, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer8): InvertedResidual(
      (conv_pw): Conv2d(96, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(384, 384, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=384, bias=False)
      (bn2): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(384, 24, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(24, 384, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer9): InvertedResidual(
      (conv_pw): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
      (bn2): BatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(576, 24, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(24, 576, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 2 layer10): InvertedResidual(
      (conv_pw): Conv2d(96, 288, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(288, 288, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=288, bias=False)
      (bn2): BatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(288, 24, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(24, 288, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(288, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer11): InvertedResidual(
      (conv_pw): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(576, 576, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=576, bias=False)
      (bn2): BatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(576, 24, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(24, 576, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(576, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer12): InvertedResidual(
      (conv_pw): Conv2d(192, 1152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1152, 1152, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1152, bias=False)
      (bn2): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1152, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 1152, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1152, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer13): InvertedResidual(
      (conv_pw): Conv2d(192, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
      (bn2): BatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(576, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 576, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(576, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer14): InvertedResidual(
      (conv_pw): Conv2d(192, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(768, 768, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=768, bias=False)
      (bn2): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(768, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 768, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer15): InvertedResidual(
      (conv_pw): Conv2d(192, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(768, 768, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=768, bias=False)
      (bn2): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(768, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 768, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer16): InvertedResidual(
      (conv_pw): Conv2d(192, 1152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1152, 1152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1152, bias=False)
      (bn2): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1152, 48, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(48, 1152, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1152, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer17): InvertedResidual(
      (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
      (bn2): BatchNorm2d(1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer18): InvertedResidual(
      (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1632, 1632, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=1632, bias=False)
      (bn2): BatchNorm2d(1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer19): InvertedResidual(
      (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
      (bn2): BatchNorm2d(1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 3 layer20): InvertedResidual(
      (conv_pw): Conv2d(272, 1088, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1088, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1088, 1088, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1088, bias=False)
      (bn2): BatchNorm2d(1088, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1088, 68, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(68, 1088, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1088, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer21): InvertedResidual(
      (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1632, 1632, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=1632, bias=False)
      (bn2): BatchNorm2d(1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1632, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer22): InvertedResidual(
      (conv_pw): Conv2d(384, 2304, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(2304, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(2304, 2304, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2304, bias=False)
      (bn2): BatchNorm2d(2304, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(2304, 96, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(96, 2304, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(2304, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer23): InvertedResidual(
      (conv_pw): Conv2d(384, 2304, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(2304, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(2304, 2304, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=2304, bias=False)
      (bn2): BatchNorm2d(2304, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(2304, 96, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(96, 2304, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(2304, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer24): InvertedResidual(
      (conv_pw): Conv2d(384, 1152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1152, 1152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1152, bias=False)
      (bn2): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1152, 96, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(96, 1152, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1152, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer25): InvertedResidual(
      (conv_pw): Conv2d(384, 1152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1152, 1152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1152, bias=False)
      (bn2): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1152, 96, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(96, 1152, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1152, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer26): InvertedResidual(
      (conv_pw): Conv2d(384, 1152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1152, 1152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1152, bias=False)
      (bn2): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1152, 96, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(96, 1152, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1152, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (stage: 4 layer27): InvertedResidual(
      (conv_pw): Conv2d(384, 1152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv_dw): Conv2d(1152, 1152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1152, bias=False)
      (bn2): BatchNorm2d(1152, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (se): SqueezeExcite(
        (conv_reduce): Conv2d(1152, 96, kernel_size=(1, 1), stride=(1, 1))
        (act1): ReLU(inplace=True)
        (conv_expand): Conv2d(96, 1152, kernel_size=(1, 1), stride=(1, 1))
        (gate): Sigmoid()
      )
      (conv_pwl): Conv2d(1152, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    )
    (head: 28): Epilogue(
      (net): Sequential(
        (0): Conv2d(384, 1984, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(1984, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): AdaptiveAvgPool2d(output_size=1)
        (4): Flatten(start_dim=1, end_dim=-1)
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=1984, out_features=1000, bias=True)
      )
    )
  )
)
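As a sanity check on the dump above, here is a minimal sketch showing that the static `padding` values printed for the depthwise convs match the usual symmetric padding rule, `((stride - 1) + dilation * (kernel_size - 1)) // 2`. The helper names `symmetric_padding` and `conv_out_size` are hypothetical and used only for illustration, and the 14x14 feature map size is an assumed example, not a value taken from the model.

```python
def symmetric_padding(kernel_size: int, stride: int = 1, dilation: int = 1) -> int:
    """Static symmetric padding for a conv (hypothetical helper, illustration only)."""
    return ((stride - 1) + dilation * (kernel_size - 1)) // 2


def conv_out_size(size: int, kernel_size: int, stride: int = 1,
                  padding: int = 0, dilation: int = 1) -> int:
    """Output length along one spatial dim, per the standard Conv2d shape formula."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# (kernel_size, stride, padding) combinations read off the conv_dw lines above.
DEPTHWISE_CONVS = [
    (5, 1, 2),
    (3, 1, 1),
    (7, 1, 3),
    (3, 2, 1),  # the stride-2 conv at the start of stage 4
]

for k, s, p in DEPTHWISE_CONVS:
    # Each printed padding agrees with the symmetric rule.
    assert symmetric_padding(k, s) == p

# Assumed 14x14 input, chosen only as an example size.
stride1_out = conv_out_size(14, kernel_size=5, stride=1, padding=2)
stride2_out = conv_out_size(14, kernel_size=3, stride=2, padding=1)
print(stride1_out, stride2_out)
```

With stride 1 the spatial size is preserved, and with stride 2 it is halved, which is what you would expect from fixed symmetric padding rather than TF-style 'same' padding.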