aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

yolor support #402

Closed alejoGT1202 closed 2 years ago

alejoGT1202 commented 2 years ago

Hello, I'm trying to compile yolor_w6 so it can be used on Inf1 instances.
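For reference, the first block in the trace below (`Reorg#1`) is a space-to-depth rearrangement: four 2×2-strided slices of the input concatenated along the channel axis, which is why a (1, 3, 1280, 1280) input becomes (1, 12, 640, 640). A minimal pure-PyTorch sketch of that op (the exact slicing pattern is my reading of the trace, not necessarily the author's code):

```python
import torch

def reorg(x: torch.Tensor) -> torch.Tensor:
    # Space-to-depth: take the four 2x2-strided views of the input and
    # stack them along the channel dim, halving H and W and quadrupling C.
    return torch.cat(
        [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]],
        dim=1,
    )

x = torch.rand(1, 3, 1280, 1280)
y = reorg(x)
print(tuple(y.shape))  # (1, 12, 640, 640), matching the aten::cat output in the trace
```

Each of the four slices produces a (1, 3, 640, 640) tensor, matching the `aten::slice` outputs shown in the trace.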

I inspected the model, which gave the following output:

Darknet(%[2857] : torch.float32(1, 3, 1280, 1280)):
  Sequential#494/Reorg#1:
      %[./6] : torch.float32(1, 3, 640, 1280) = ./aten::slice#4(%[2857])
      %[./11] : torch.float32(1, 3, 640, 640) = ./aten::slice#9(%[./6])
      %[./16] : torch.float32(1, 3, 640, 1280) = ./aten::slice#14(%[2857])
      %[./21] : torch.float32(1, 3, 640, 640) = ./aten::slice#19(%[./16])
      %[./26] : torch.float32(1, 3, 640, 1280) = ./aten::slice#24(%[2857])
      %[./31] : torch.float32(1, 3, 640, 640) = ./aten::slice#29(%[./26])
      %[./36] : torch.float32(1, 3, 640, 1280) = ./aten::slice#34(%[2857])
      %[./41] : torch.float32(1, 3, 640, 640) = ./aten::slice#39(%[./36])
      %[13357] : torch.float32(1, 12, 640, 640) = ./aten::cat#42()
  Sequential#495/Conv2d#3:
      %[Sequential#495/8] : torch.float32(1, 64, 640, 640) = ./aten::_convolution#20(%[13357])
  Sequential#495/BatchNorm2d#4:
      %[Sequential#495/9] : torch.float32(1, 64, 640, 640) = ./aten::batch_norm#8(%[Sequential#495/8])
  Sequential#495/SiLU#5:
      %[13358] : torch.float32(1, 64, 640, 640) = ./aten::silu#0(%[Sequential#495/9])
  Sequential#496/Conv2d#3:
      %[Sequential#496/8] : torch.float32(1, 128, 320, 320) = ./aten::_convolution#20(%[13358])
  Sequential#496/BatchNorm2d#4:
      %[Sequential#496/9] : torch.float32(1, 128, 320, 320) = ./aten::batch_norm#8(%[Sequential#496/8])
  Sequential#496/SiLU#5:
      %[13359] : torch.float32(1, 128, 320, 320) = ./aten::silu#0(%[Sequential#496/9])
  Sequential#497/Conv2d#3:
      %[Sequential#497/8] : torch.float32(1, 64, 320, 320) = ./aten::_convolution#20(%[13359])
  Sequential#497/BatchNorm2d#4:
      %[Sequential#497/9] : torch.float32(1, 64, 320, 320) = ./aten::batch_norm#8(%[Sequential#497/8])
  Sequential#497/SiLU#5:
      %[13360] : torch.float32(1, 64, 320, 320) = ./aten::silu#0(%[Sequential#497/9])
  Sequential#499/Conv2d#3:
      %[Sequential#499/8] : torch.float32(1, 64, 320, 320) = ./aten::_convolution#20(%[13359])
  Sequential#499/BatchNorm2d#4:
      %[Sequential#499/9] : torch.float32(1, 64, 320, 320) = ./aten::batch_norm#8(%[Sequential#499/8])
  Sequential#499/SiLU#5:
      %[13362] : torch.float32(1, 64, 320, 320) = ./aten::silu#0(%[Sequential#499/9])
  Sequential#500/Conv2d#3:
      %[Sequential#500/8] : torch.float32(1, 64, 320, 320) = ./aten::_convolution#20(%[13362])
  Sequential#500/BatchNorm2d#4:
      %[Sequential#500/9] : torch.float32(1, 64, 320, 320) = ./aten::batch_norm#8(%[Sequential#500/8])
  Sequential#500/SiLU#5:
      %[13363] : torch.float32(1, 64, 320, 320) = ./aten::silu#0(%[Sequential#500/9])
  Sequential#501/Conv2d#3:
      %[Sequential#501/8] : torch.float32(1, 64, 320, 320) = ./aten::_convolution#20(%[13363])
  Sequential#501/BatchNorm2d#4:
      %[Sequential#501/9] : torch.float32(1, 64, 320, 320) = ./aten::batch_norm#8(%[Sequential#501/8])
  Sequential#501/SiLU#5:
      %[13364] : torch.float32(1, 64, 320, 320) = ./aten::silu#0(%[Sequential#501/9])
  WeightedFeatureFusion#502:
    %[13365] : torch.float32(1, 64, 320, 320) = ./aten::add#1(%[13364], %[13362])
  Sequential#503/Conv2d#3:
      %[Sequential#503/8] : torch.float32(1, 64, 320, 320) = ./aten::_convolution#20(%[13365])
  Sequential#503/BatchNorm2d#4:
      %[Sequential#503/9] : torch.float32(1, 64, 320, 320) = ./aten::batch_norm#8(%[Sequential#503/8])
  Sequential#503/SiLU#5:
      %[13366] : torch.float32(1, 64, 320, 320) = ./aten::silu#0(%[Sequential#503/9])
  Sequential#504/Conv2d#3:
      %[Sequential#504/8] : torch.float32(1, 64, 320, 320) = ./aten::_convolution#20(%[13366])
  Sequential#504/BatchNorm2d#4:
      %[Sequential#504/9] : torch.float32(1, 64, 320, 320) = ./aten::batch_norm#8(%[Sequential#504/8])
  Sequential#504/SiLU#5:
      %[13367] : torch.float32(1, 64, 320, 320) = ./aten::silu#0(%[Sequential#504/9])
  WeightedFeatureFusion#505:
    %[13368] : torch.float32(1, 64, 320, 320) = ./aten::add#1(%[13367], %[13365])
  Sequential#506/Conv2d#3:
      %[Sequential#506/8] : torch.float32(1, 64, 320, 320) = ./aten::_convolution#20(%[13368])
  Sequential#506/BatchNorm2d#4:
      %[Sequential#506/9] : torch.float32(1, 64, 320, 320) = ./aten::batch_norm#8(%[Sequential#506/8])
  Sequential#506/SiLU#5:
      %[13369] : torch.float32(1, 64, 320, 320) = ./aten::silu#0(%[Sequential#506/9])
  Sequential#507/Conv2d#3:
      %[Sequential#507/8] : torch.float32(1, 64, 320, 320) = ./aten::_convolution#20(%[13369])
  Sequential#507/BatchNorm2d#4:
      %[Sequential#507/9] : torch.float32(1, 64, 320, 320) = ./aten::batch_norm#8(%[Sequential#507/8])
  Sequential#507/SiLU#5:
      %[13370] : torch.float32(1, 64, 320, 320) = ./aten::silu#0(%[Sequential#507/9])
  WeightedFeatureFusion#508:
    %[13371] : torch.float32(1, 64, 320, 320) = ./aten::add#1(%[13370], %[13368])
  FeatureConcat#509:
    %[13372] : torch.float32(1, 128, 320, 320) = ./aten::cat#2()
  Sequential#510/Conv2d#3:
      %[Sequential#510/8] : torch.float32(1, 128, 320, 320) = ./aten::_convolution#20(%[13372])
  Sequential#510/BatchNorm2d#4:
      %[Sequential#510/9] : torch.float32(1, 128, 320, 320) = ./aten::batch_norm#8(%[Sequential#510/8])
  Sequential#510/SiLU#5:
      %[13373] : torch.float32(1, 128, 320, 320) = ./aten::silu#0(%[Sequential#510/9])
  Sequential#511/Conv2d#3:
      %[Sequential#511/8] : torch.float32(1, 256, 160, 160) = ./aten::_convolution#20(%[13373])
  Sequential#511/BatchNorm2d#4:
      %[Sequential#511/9] : torch.float32(1, 256, 160, 160) = ./aten::batch_norm#8(%[Sequential#511/8])
  Sequential#511/SiLU#5:
      %[13374] : torch.float32(1, 256, 160, 160) = ./aten::silu#0(%[Sequential#511/9])
  Sequential#512/Conv2d#3:
      %[Sequential#512/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13374])
  Sequential#512/BatchNorm2d#4:
      %[Sequential#512/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#512/8])
  Sequential#512/SiLU#5:
      %[13375] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#512/9])
  Sequential#514/Conv2d#3:
      %[Sequential#514/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13374])
  Sequential#514/BatchNorm2d#4:
      %[Sequential#514/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#514/8])
  Sequential#514/SiLU#5:
      %[13377] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#514/9])
  Sequential#515/Conv2d#3:
      %[Sequential#515/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13377])
  Sequential#515/BatchNorm2d#4:
      %[Sequential#515/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#515/8])
  Sequential#515/SiLU#5:
      %[13378] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#515/9])
  Sequential#516/Conv2d#3:
      %[Sequential#516/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13378])
  Sequential#516/BatchNorm2d#4:
      %[Sequential#516/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#516/8])
  Sequential#516/SiLU#5:
      %[13379] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#516/9])
  WeightedFeatureFusion#517:
    %[13380] : torch.float32(1, 128, 160, 160) = ./aten::add#1(%[13379], %[13377])
  Sequential#518/Conv2d#3:
      %[Sequential#518/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13380])
  Sequential#518/BatchNorm2d#4:
      %[Sequential#518/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#518/8])
  Sequential#518/SiLU#5:
      %[13381] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#518/9])
  Sequential#519/Conv2d#3:
      %[Sequential#519/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13381])
  Sequential#519/BatchNorm2d#4:
      %[Sequential#519/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#519/8])
  Sequential#519/SiLU#5:
      %[13382] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#519/9])
  WeightedFeatureFusion#520:
    %[13383] : torch.float32(1, 128, 160, 160) = ./aten::add#1(%[13382], %[13380])
  Sequential#521/Conv2d#3:
      %[Sequential#521/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13383])
  Sequential#521/BatchNorm2d#4:
      %[Sequential#521/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#521/8])
  Sequential#521/SiLU#5:
      %[13384] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#521/9])
  Sequential#522/Conv2d#3:
      %[Sequential#522/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13384])
  Sequential#522/BatchNorm2d#4:
      %[Sequential#522/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#522/8])
  Sequential#522/SiLU#5:
      %[13385] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#522/9])
  WeightedFeatureFusion#523:
    %[13386] : torch.float32(1, 128, 160, 160) = ./aten::add#1(%[13385], %[13383])
  Sequential#524/Conv2d#3:
      %[Sequential#524/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13386])
  Sequential#524/BatchNorm2d#4:
      %[Sequential#524/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#524/8])
  Sequential#524/SiLU#5:
      %[13387] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#524/9])
  Sequential#525/Conv2d#3:
      %[Sequential#525/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13387])
  Sequential#525/BatchNorm2d#4:
      %[Sequential#525/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#525/8])
  Sequential#525/SiLU#5:
      %[13388] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#525/9])
  WeightedFeatureFusion#526:
    %[13389] : torch.float32(1, 128, 160, 160) = ./aten::add#1(%[13388], %[13386])
  Sequential#527/Conv2d#3:
      %[Sequential#527/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13389])
  Sequential#527/BatchNorm2d#4:
      %[Sequential#527/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#527/8])
  Sequential#527/SiLU#5:
      %[13390] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#527/9])
  Sequential#528/Conv2d#3:
      %[Sequential#528/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13390])
  Sequential#528/BatchNorm2d#4:
      %[Sequential#528/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#528/8])
  Sequential#528/SiLU#5:
      %[13391] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#528/9])
  WeightedFeatureFusion#529:
    %[13392] : torch.float32(1, 128, 160, 160) = ./aten::add#1(%[13391], %[13389])
  Sequential#530/Conv2d#3:
      %[Sequential#530/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13392])
  Sequential#530/BatchNorm2d#4:
      %[Sequential#530/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#530/8])
  Sequential#530/SiLU#5:
      %[13393] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#530/9])
  Sequential#531/Conv2d#3:
      %[Sequential#531/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13393])
  Sequential#531/BatchNorm2d#4:
      %[Sequential#531/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#531/8])
  Sequential#531/SiLU#5:
      %[13394] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#531/9])
  WeightedFeatureFusion#532:
    %[13395] : torch.float32(1, 128, 160, 160) = ./aten::add#1(%[13394], %[13392])
  Sequential#533/Conv2d#3:
      %[Sequential#533/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13395])
  Sequential#533/BatchNorm2d#4:
      %[Sequential#533/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#533/8])
  Sequential#533/SiLU#5:
      %[13396] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#533/9])
  Sequential#534/Conv2d#3:
      %[Sequential#534/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13396])
  Sequential#534/BatchNorm2d#4:
      %[Sequential#534/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#534/8])
  Sequential#534/SiLU#5:
      %[13397] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#534/9])
  WeightedFeatureFusion#535:
    %[13398] : torch.float32(1, 128, 160, 160) = ./aten::add#1(%[13397], %[13395])
  FeatureConcat#536:
    %[13399] : torch.float32(1, 256, 160, 160) = ./aten::cat#2()
  Sequential#537/Conv2d#3:
      %[Sequential#537/8] : torch.float32(1, 256, 160, 160) = ./aten::_convolution#20(%[13399])
  Sequential#537/BatchNorm2d#4:
      %[Sequential#537/9] : torch.float32(1, 256, 160, 160) = ./aten::batch_norm#8(%[Sequential#537/8])
  Sequential#537/SiLU#5:
      %[13400] : torch.float32(1, 256, 160, 160) = ./aten::silu#0(%[Sequential#537/9])
  Sequential#538/Conv2d#3:
      %[Sequential#538/8] : torch.float32(1, 512, 80, 80) = ./aten::_convolution#20(%[13400])
  Sequential#538/BatchNorm2d#4:
      %[Sequential#538/9] : torch.float32(1, 512, 80, 80) = ./aten::batch_norm#8(%[Sequential#538/8])
  Sequential#538/SiLU#5:
      %[13401] : torch.float32(1, 512, 80, 80) = ./aten::silu#0(%[Sequential#538/9])
  Sequential#539/Conv2d#3:
      %[Sequential#539/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13401])
  Sequential#539/BatchNorm2d#4:
      %[Sequential#539/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#539/8])
  Sequential#539/SiLU#5:
      %[13402] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#539/9])
  Sequential#541/Conv2d#3:
      %[Sequential#541/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13401])
  Sequential#541/BatchNorm2d#4:
      %[Sequential#541/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#541/8])
  Sequential#541/SiLU#5:
      %[13404] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#541/9])
  Sequential#542/Conv2d#3:
      %[Sequential#542/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13404])
  Sequential#542/BatchNorm2d#4:
      %[Sequential#542/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#542/8])
  Sequential#542/SiLU#5:
      %[13405] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#542/9])
  Sequential#543/Conv2d#3:
      %[Sequential#543/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13405])
  Sequential#543/BatchNorm2d#4:
      %[Sequential#543/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#543/8])
  Sequential#543/SiLU#5:
      %[13406] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#543/9])
  WeightedFeatureFusion#544:
    %[13407] : torch.float32(1, 256, 80, 80) = ./aten::add#1(%[13406], %[13404])
  Sequential#545/Conv2d#3:
      %[Sequential#545/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13407])
  Sequential#545/BatchNorm2d#4:
      %[Sequential#545/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#545/8])
  Sequential#545/SiLU#5:
      %[13408] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#545/9])
  Sequential#546/Conv2d#3:
      %[Sequential#546/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13408])
  Sequential#546/BatchNorm2d#4:
      %[Sequential#546/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#546/8])
  Sequential#546/SiLU#5:
      %[13409] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#546/9])
  WeightedFeatureFusion#547:
    %[13410] : torch.float32(1, 256, 80, 80) = ./aten::add#1(%[13409], %[13407])
  Sequential#548/Conv2d#3:
      %[Sequential#548/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13410])
  Sequential#548/BatchNorm2d#4:
      %[Sequential#548/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#548/8])
  Sequential#548/SiLU#5:
      %[13411] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#548/9])
  Sequential#549/Conv2d#3:
      %[Sequential#549/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13411])
  Sequential#549/BatchNorm2d#4:
      %[Sequential#549/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#549/8])
  Sequential#549/SiLU#5:
      %[13412] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#549/9])
  WeightedFeatureFusion#550:
    %[13413] : torch.float32(1, 256, 80, 80) = ./aten::add#1(%[13412], %[13410])
  Sequential#551/Conv2d#3:
      %[Sequential#551/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13413])
  Sequential#551/BatchNorm2d#4:
      %[Sequential#551/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#551/8])
  Sequential#551/SiLU#5:
      %[13414] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#551/9])
  Sequential#552/Conv2d#3:
      %[Sequential#552/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13414])
  Sequential#552/BatchNorm2d#4:
      %[Sequential#552/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#552/8])
  Sequential#552/SiLU#5:
      %[13415] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#552/9])
  WeightedFeatureFusion#553:
    %[13416] : torch.float32(1, 256, 80, 80) = ./aten::add#1(%[13415], %[13413])
  Sequential#554/Conv2d#3:
      %[Sequential#554/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13416])
  Sequential#554/BatchNorm2d#4:
      %[Sequential#554/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#554/8])
  Sequential#554/SiLU#5:
      %[13417] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#554/9])
  Sequential#555/Conv2d#3:
      %[Sequential#555/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13417])
  Sequential#555/BatchNorm2d#4:
      %[Sequential#555/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#555/8])
  Sequential#555/SiLU#5:
      %[13418] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#555/9])
  WeightedFeatureFusion#556:
    %[13419] : torch.float32(1, 256, 80, 80) = ./aten::add#1(%[13418], %[13416])
  Sequential#557/Conv2d#3:
      %[Sequential#557/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13419])
  Sequential#557/BatchNorm2d#4:
      %[Sequential#557/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#557/8])
  Sequential#557/SiLU#5:
      %[13420] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#557/9])
  Sequential#558/Conv2d#3:
      %[Sequential#558/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13420])
  Sequential#558/BatchNorm2d#4:
      %[Sequential#558/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#558/8])
  Sequential#558/SiLU#5:
      %[13421] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#558/9])
  WeightedFeatureFusion#559:
    %[13422] : torch.float32(1, 256, 80, 80) = ./aten::add#1(%[13421], %[13419])
  Sequential#560/Conv2d#3:
      %[Sequential#560/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13422])
  Sequential#560/BatchNorm2d#4:
      %[Sequential#560/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#560/8])
  Sequential#560/SiLU#5:
      %[13423] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#560/9])
  Sequential#561/Conv2d#3:
      %[Sequential#561/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13423])
  Sequential#561/BatchNorm2d#4:
      %[Sequential#561/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#561/8])
  Sequential#561/SiLU#5:
      %[13424] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#561/9])
  WeightedFeatureFusion#562:
    %[13425] : torch.float32(1, 256, 80, 80) = ./aten::add#1(%[13424], %[13422])
  FeatureConcat#563:
    %[13426] : torch.float32(1, 512, 80, 80) = ./aten::cat#2()
  Sequential#564/Conv2d#3:
      %[Sequential#564/8] : torch.float32(1, 512, 80, 80) = ./aten::_convolution#20(%[13426])
  Sequential#564/BatchNorm2d#4:
      %[Sequential#564/9] : torch.float32(1, 512, 80, 80) = ./aten::batch_norm#8(%[Sequential#564/8])
  Sequential#564/SiLU#5:
      %[13427] : torch.float32(1, 512, 80, 80) = ./aten::silu#0(%[Sequential#564/9])
  Sequential#565/Conv2d#3:
      %[Sequential#565/8] : torch.float32(1, 768, 40, 40) = ./aten::_convolution#20(%[13427])
  Sequential#565/BatchNorm2d#4:
      %[Sequential#565/9] : torch.float32(1, 768, 40, 40) = ./aten::batch_norm#8(%[Sequential#565/8])
  Sequential#565/SiLU#5:
      %[13428] : torch.float32(1, 768, 40, 40) = ./aten::silu#0(%[Sequential#565/9])
  Sequential#566/Conv2d#3:
      %[Sequential#566/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13428])
  Sequential#566/BatchNorm2d#4:
      %[Sequential#566/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#566/8])
  Sequential#566/SiLU#5:
      %[13429] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#566/9])
  Sequential#568/Conv2d#3:
      %[Sequential#568/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13428])
  Sequential#568/BatchNorm2d#4:
      %[Sequential#568/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#568/8])
  Sequential#568/SiLU#5:
      %[13431] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#568/9])
  Sequential#569/Conv2d#3:
      %[Sequential#569/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13431])
  Sequential#569/BatchNorm2d#4:
      %[Sequential#569/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#569/8])
  Sequential#569/SiLU#5:
      %[13432] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#569/9])
  Sequential#570/Conv2d#3:
      %[Sequential#570/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13432])
  Sequential#570/BatchNorm2d#4:
      %[Sequential#570/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#570/8])
  Sequential#570/SiLU#5:
      %[13433] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#570/9])
  WeightedFeatureFusion#571:
    %[13434] : torch.float32(1, 384, 40, 40) = ./aten::add#1(%[13433], %[13431])
  Sequential#572/Conv2d#3:
      %[Sequential#572/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13434])
  Sequential#572/BatchNorm2d#4:
      %[Sequential#572/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#572/8])
  Sequential#572/SiLU#5:
      %[13435] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#572/9])
  Sequential#573/Conv2d#3:
      %[Sequential#573/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13435])
  Sequential#573/BatchNorm2d#4:
      %[Sequential#573/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#573/8])
  Sequential#573/SiLU#5:
      %[13436] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#573/9])
  WeightedFeatureFusion#574:
    %[13437] : torch.float32(1, 384, 40, 40) = ./aten::add#1(%[13436], %[13434])
  Sequential#575/Conv2d#3:
      %[Sequential#575/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13437])
  Sequential#575/BatchNorm2d#4:
      %[Sequential#575/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#575/8])
  Sequential#575/SiLU#5:
      %[13438] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#575/9])
  Sequential#576/Conv2d#3:
      %[Sequential#576/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13438])
  Sequential#576/BatchNorm2d#4:
      %[Sequential#576/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#576/8])
  Sequential#576/SiLU#5:
      %[13439] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#576/9])
  WeightedFeatureFusion#577:
    %[13440] : torch.float32(1, 384, 40, 40) = ./aten::add#1(%[13439], %[13437])
  FeatureConcat#578:
    %[13441] : torch.float32(1, 768, 40, 40) = ./aten::cat#2()
  Sequential#579/Conv2d#3:
      %[Sequential#579/8] : torch.float32(1, 768, 40, 40) = ./aten::_convolution#20(%[13441])
  Sequential#579/BatchNorm2d#4:
      %[Sequential#579/9] : torch.float32(1, 768, 40, 40) = ./aten::batch_norm#8(%[Sequential#579/8])
  Sequential#579/SiLU#5:
      %[13442] : torch.float32(1, 768, 40, 40) = ./aten::silu#0(%[Sequential#579/9])
  Sequential#580/Conv2d#3:
      %[Sequential#580/8] : torch.float32(1, 1024, 20, 20) = ./aten::_convolution#20(%[13442])
  Sequential#580/BatchNorm2d#4:
      %[Sequential#580/9] : torch.float32(1, 1024, 20, 20) = ./aten::batch_norm#8(%[Sequential#580/8])
  Sequential#580/SiLU#5:
      %[13443] : torch.float32(1, 1024, 20, 20) = ./aten::silu#0(%[Sequential#580/9])
  Sequential#581/Conv2d#3:
      %[Sequential#581/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13443])
  Sequential#581/BatchNorm2d#4:
      %[Sequential#581/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#581/8])
  Sequential#581/SiLU#5:
      %[13444] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#581/9])
  Sequential#583/Conv2d#3:
      %[Sequential#583/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13443])
  Sequential#583/BatchNorm2d#4:
      %[Sequential#583/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#583/8])
  Sequential#583/SiLU#5:
      %[13446] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#583/9])
  Sequential#584/Conv2d#3:
      %[Sequential#584/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13446])
  Sequential#584/BatchNorm2d#4:
      %[Sequential#584/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#584/8])
  Sequential#584/SiLU#5:
      %[13447] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#584/9])
  Sequential#585/Conv2d#3:
      %[Sequential#585/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13447])
  Sequential#585/BatchNorm2d#4:
      %[Sequential#585/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#585/8])
  Sequential#585/SiLU#5:
      %[13448] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#585/9])
  WeightedFeatureFusion#586:
    %[13449] : torch.float32(1, 512, 20, 20) = ./aten::add#1(%[13448], %[13446])
  Sequential#587/Conv2d#3:
      %[Sequential#587/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13449])
  Sequential#587/BatchNorm2d#4:
      %[Sequential#587/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#587/8])
  Sequential#587/SiLU#5:
      %[13450] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#587/9])
  Sequential#588/Conv2d#3:
      %[Sequential#588/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13450])
  Sequential#588/BatchNorm2d#4:
      %[Sequential#588/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#588/8])
  Sequential#588/SiLU#5:
      %[13451] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#588/9])
  WeightedFeatureFusion#589:
    %[13452] : torch.float32(1, 512, 20, 20) = ./aten::add#1(%[13451], %[13449])
  Sequential#590/Conv2d#3:
      %[Sequential#590/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13452])
  Sequential#590/BatchNorm2d#4:
      %[Sequential#590/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#590/8])
  Sequential#590/SiLU#5:
      %[13453] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#590/9])
  Sequential#591/Conv2d#3:
      %[Sequential#591/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13453])
  Sequential#591/BatchNorm2d#4:
      %[Sequential#591/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#591/8])
  Sequential#591/SiLU#5:
      %[13454] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#591/9])
  WeightedFeatureFusion#592:
    %[13455] : torch.float32(1, 512, 20, 20) = ./aten::add#1(%[13454], %[13452])
  FeatureConcat#593:
    %[13456] : torch.float32(1, 1024, 20, 20) = ./aten::cat#2()
  Sequential#594/Conv2d#3:
      %[Sequential#594/8] : torch.float32(1, 1024, 20, 20) = ./aten::_convolution#20(%[13456])
  Sequential#594/BatchNorm2d#4:
      %[Sequential#594/9] : torch.float32(1, 1024, 20, 20) = ./aten::batch_norm#8(%[Sequential#594/8])
  Sequential#594/SiLU#5:
      %[13457] : torch.float32(1, 1024, 20, 20) = ./aten::silu#0(%[Sequential#594/9])
  Sequential#595/Conv2d#3:
      %[Sequential#595/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13457])
  Sequential#595/BatchNorm2d#4:
      %[Sequential#595/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#595/8])
  Sequential#595/SiLU#5:
      %[13458] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#595/9])
  Sequential#597/Conv2d#3:
      %[Sequential#597/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13457])
  Sequential#597/BatchNorm2d#4:
      %[Sequential#597/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#597/8])
  Sequential#597/SiLU#5:
      %[13460] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#597/9])
  Sequential#598/Conv2d#3:
      %[Sequential#598/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13460])
  Sequential#598/BatchNorm2d#4:
      %[Sequential#598/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#598/8])
  Sequential#598/SiLU#5:
      %[13461] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#598/9])
  Sequential#599/Conv2d#3:
      %[Sequential#599/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13461])
  Sequential#599/BatchNorm2d#4:
      %[Sequential#599/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#599/8])
  Sequential#599/SiLU#5:
      %[13462] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#599/9])
  MaxPool2d#600:
    %[13463] : torch.float32(1, 512, 20, 20) = ./aten::max_pool2d#13(%[13462])
  MaxPool2d#602:
    %[13465] : torch.float32(1, 512, 20, 20) = ./aten::max_pool2d#13(%[13462])
  MaxPool2d#604:
    %[13467] : torch.float32(1, 512, 20, 20) = ./aten::max_pool2d#13(%[13462])
  FeatureConcat#605:
    %[13468] : torch.float32(1, 2048, 20, 20) = ./aten::cat#2()
  Sequential#606/Conv2d#3:
      %[Sequential#606/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13468])
  Sequential#606/BatchNorm2d#4:
      %[Sequential#606/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#606/8])
  Sequential#606/SiLU#5:
      %[13469] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#606/9])
  Sequential#607/Conv2d#3:
      %[Sequential#607/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13469])
  Sequential#607/BatchNorm2d#4:
      %[Sequential#607/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#607/8])
  Sequential#607/SiLU#5:
      %[13470] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#607/9])
  FeatureConcat#608:
    %[13471] : torch.float32(1, 1024, 20, 20) = ./aten::cat#2()
  Sequential#609/Conv2d#3:
      %[Sequential#609/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13471])
  Sequential#609/BatchNorm2d#4:
      %[Sequential#609/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#609/8])
  Sequential#609/SiLU#5:
      %[13472] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#609/9])
  Sequential#610/Conv2d#3:
      %[Sequential#610/8] : torch.float32(1, 384, 20, 20) = ./aten::_convolution#20(%[13472])
  Sequential#610/BatchNorm2d#4:
      %[Sequential#610/9] : torch.float32(1, 384, 20, 20) = ./aten::batch_norm#8(%[Sequential#610/8])
  Sequential#610/SiLU#5:
      %[13473] : torch.float32(1, 384, 20, 20) = ./aten::silu#0(%[Sequential#610/9])
  Upsample#611:
    %[13474] : torch.float32(1, 384, 40, 40) = ./aten::upsample_nearest2d#4(%[13473])
  Sequential#613/Conv2d#3:
      %[Sequential#613/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13442])
  Sequential#613/BatchNorm2d#4:
      %[Sequential#613/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#613/8])
  Sequential#613/SiLU#5:
      %[13476] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#613/9])
  FeatureConcat#614:
    %[13477] : torch.float32(1, 768, 40, 40) = ./aten::cat#2()
  Sequential#615/Conv2d#3:
      %[Sequential#615/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13477])
  Sequential#615/BatchNorm2d#4:
      %[Sequential#615/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#615/8])
  Sequential#615/SiLU#5:
      %[13478] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#615/9])
  Sequential#616/Conv2d#3:
      %[Sequential#616/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13478])
  Sequential#616/BatchNorm2d#4:
      %[Sequential#616/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#616/8])
  Sequential#616/SiLU#5:
      %[13479] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#616/9])
  Sequential#618/Conv2d#3:
      %[Sequential#618/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13478])
  Sequential#618/BatchNorm2d#4:
      %[Sequential#618/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#618/8])
  Sequential#618/SiLU#5:
      %[13481] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#618/9])
  Sequential#619/Conv2d#3:
      %[Sequential#619/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13481])
  Sequential#619/BatchNorm2d#4:
      %[Sequential#619/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#619/8])
  Sequential#619/SiLU#5:
      %[13482] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#619/9])
  Sequential#620/Conv2d#3:
      %[Sequential#620/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13482])
  Sequential#620/BatchNorm2d#4:
      %[Sequential#620/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#620/8])
  Sequential#620/SiLU#5:
      %[13483] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#620/9])
  Sequential#621/Conv2d#3:
      %[Sequential#621/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13483])
  Sequential#621/BatchNorm2d#4:
      %[Sequential#621/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#621/8])
  Sequential#621/SiLU#5:
      %[13484] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#621/9])
  Sequential#622/Conv2d#3:
      %[Sequential#622/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13484])
  Sequential#622/BatchNorm2d#4:
      %[Sequential#622/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#622/8])
  Sequential#622/SiLU#5:
      %[13485] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#622/9])
  Sequential#623/Conv2d#3:
      %[Sequential#623/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13485])
  Sequential#623/BatchNorm2d#4:
      %[Sequential#623/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#623/8])
  Sequential#623/SiLU#5:
      %[13486] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#623/9])
  FeatureConcat#624:
    %[13487] : torch.float32(1, 768, 40, 40) = ./aten::cat#2()
  Sequential#625/Conv2d#3:
      %[Sequential#625/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13487])
  Sequential#625/BatchNorm2d#4:
      %[Sequential#625/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#625/8])
  Sequential#625/SiLU#5:
      %[13488] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#625/9])
  Sequential#626/Conv2d#3:
      %[Sequential#626/8] : torch.float32(1, 256, 40, 40) = ./aten::_convolution#20(%[13488])
  Sequential#626/BatchNorm2d#4:
      %[Sequential#626/9] : torch.float32(1, 256, 40, 40) = ./aten::batch_norm#8(%[Sequential#626/8])
  Sequential#626/SiLU#5:
      %[13489] : torch.float32(1, 256, 40, 40) = ./aten::silu#0(%[Sequential#626/9])
  Upsample#627:
    %[13490] : torch.float32(1, 256, 80, 80) = ./aten::upsample_nearest2d#4(%[13489])
  Sequential#629/Conv2d#3:
      %[Sequential#629/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13427])
  Sequential#629/BatchNorm2d#4:
      %[Sequential#629/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#629/8])
  Sequential#629/SiLU#5:
      %[13492] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#629/9])
  FeatureConcat#630:
    %[13493] : torch.float32(1, 512, 80, 80) = ./aten::cat#2()
  Sequential#631/Conv2d#3:
      %[Sequential#631/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13493])
  Sequential#631/BatchNorm2d#4:
      %[Sequential#631/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#631/8])
  Sequential#631/SiLU#5:
      %[13494] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#631/9])
  Sequential#632/Conv2d#3:
      %[Sequential#632/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13494])
  Sequential#632/BatchNorm2d#4:
      %[Sequential#632/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#632/8])
  Sequential#632/SiLU#5:
      %[13495] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#632/9])
  Sequential#634/Conv2d#3:
      %[Sequential#634/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13494])
  Sequential#634/BatchNorm2d#4:
      %[Sequential#634/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#634/8])
  Sequential#634/SiLU#5:
      %[13497] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#634/9])
  Sequential#635/Conv2d#3:
      %[Sequential#635/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13497])
  Sequential#635/BatchNorm2d#4:
      %[Sequential#635/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#635/8])
  Sequential#635/SiLU#5:
      %[13498] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#635/9])
  Sequential#636/Conv2d#3:
      %[Sequential#636/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13498])
  Sequential#636/BatchNorm2d#4:
      %[Sequential#636/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#636/8])
  Sequential#636/SiLU#5:
      %[13499] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#636/9])
  Sequential#637/Conv2d#3:
      %[Sequential#637/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13499])
  Sequential#637/BatchNorm2d#4:
      %[Sequential#637/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#637/8])
  Sequential#637/SiLU#5:
      %[13500] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#637/9])
  Sequential#638/Conv2d#3:
      %[Sequential#638/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13500])
  Sequential#638/BatchNorm2d#4:
      %[Sequential#638/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#638/8])
  Sequential#638/SiLU#5:
      %[13501] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#638/9])
  Sequential#639/Conv2d#3:
      %[Sequential#639/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13501])
  Sequential#639/BatchNorm2d#4:
      %[Sequential#639/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#639/8])
  Sequential#639/SiLU#5:
      %[13502] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#639/9])
  FeatureConcat#640:
    %[13503] : torch.float32(1, 512, 80, 80) = ./aten::cat#2()
  Sequential#641/Conv2d#3:
      %[Sequential#641/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13503])
  Sequential#641/BatchNorm2d#4:
      %[Sequential#641/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#641/8])
  Sequential#641/SiLU#5:
      %[13504] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#641/9])
  Sequential#642/Conv2d#3:
      %[Sequential#642/8] : torch.float32(1, 128, 80, 80) = ./aten::_convolution#20(%[13504])
  Sequential#642/BatchNorm2d#4:
      %[Sequential#642/9] : torch.float32(1, 128, 80, 80) = ./aten::batch_norm#8(%[Sequential#642/8])
  Sequential#642/SiLU#5:
      %[13505] : torch.float32(1, 128, 80, 80) = ./aten::silu#0(%[Sequential#642/9])
  Upsample#643:
    %[13506] : torch.float32(1, 128, 160, 160) = ./aten::upsample_nearest2d#4(%[13505])
  Sequential#645/Conv2d#3:
      %[Sequential#645/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13400])
  Sequential#645/BatchNorm2d#4:
      %[Sequential#645/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#645/8])
  Sequential#645/SiLU#5:
      %[13508] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#645/9])
  FeatureConcat#646:
    %[13509] : torch.float32(1, 256, 160, 160) = ./aten::cat#2()
  Sequential#647/Conv2d#3:
      %[Sequential#647/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13509])
  Sequential#647/BatchNorm2d#4:
      %[Sequential#647/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#647/8])
  Sequential#647/SiLU#5:
      %[13510] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#647/9])
  Sequential#648/Conv2d#3:
      %[Sequential#648/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13510])
  Sequential#648/BatchNorm2d#4:
      %[Sequential#648/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#648/8])
  Sequential#648/SiLU#5:
      %[13511] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#648/9])
  Sequential#650/Conv2d#3:
      %[Sequential#650/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13510])
  Sequential#650/BatchNorm2d#4:
      %[Sequential#650/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#650/8])
  Sequential#650/SiLU#5:
      %[13513] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#650/9])
  Sequential#651/Conv2d#3:
      %[Sequential#651/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13513])
  Sequential#651/BatchNorm2d#4:
      %[Sequential#651/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#651/8])
  Sequential#651/SiLU#5:
      %[13514] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#651/9])
  Sequential#652/Conv2d#3:
      %[Sequential#652/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13514])
  Sequential#652/BatchNorm2d#4:
      %[Sequential#652/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#652/8])
  Sequential#652/SiLU#5:
      %[13515] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#652/9])
  Sequential#653/Conv2d#3:
      %[Sequential#653/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13515])
  Sequential#653/BatchNorm2d#4:
      %[Sequential#653/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#653/8])
  Sequential#653/SiLU#5:
      %[13516] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#653/9])
  Sequential#654/Conv2d#3:
      %[Sequential#654/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13516])
  Sequential#654/BatchNorm2d#4:
      %[Sequential#654/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#654/8])
  Sequential#654/SiLU#5:
      %[13517] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#654/9])
  Sequential#655/Conv2d#3:
      %[Sequential#655/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13517])
  Sequential#655/BatchNorm2d#4:
      %[Sequential#655/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#655/8])
  Sequential#655/SiLU#5:
      %[13518] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#655/9])
  FeatureConcat#656:
    %[13519] : torch.float32(1, 256, 160, 160) = ./aten::cat#2()
  Sequential#657/Conv2d#3:
      %[Sequential#657/8] : torch.float32(1, 128, 160, 160) = ./aten::_convolution#20(%[13519])
  Sequential#657/BatchNorm2d#4:
      %[Sequential#657/9] : torch.float32(1, 128, 160, 160) = ./aten::batch_norm#8(%[Sequential#657/8])
  Sequential#657/SiLU#5:
      %[13520] : torch.float32(1, 128, 160, 160) = ./aten::silu#0(%[Sequential#657/9])
  Sequential#658/Conv2d#3:
      %[Sequential#658/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13520])
  Sequential#658/BatchNorm2d#4:
      %[Sequential#658/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#658/8])
  Sequential#658/SiLU#5:
      %[13521] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#658/9])
  FeatureConcat#659:
    %[13522] : torch.float32(1, 512, 80, 80) = ./aten::cat#2()
  Sequential#660/Conv2d#3:
      %[Sequential#660/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13522])
  Sequential#660/BatchNorm2d#4:
      %[Sequential#660/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#660/8])
  Sequential#660/SiLU#5:
      %[13523] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#660/9])
  Sequential#661/Conv2d#3:
      %[Sequential#661/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13523])
  Sequential#661/BatchNorm2d#4:
      %[Sequential#661/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#661/8])
  Sequential#661/SiLU#5:
      %[13524] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#661/9])
  Sequential#663/Conv2d#3:
      %[Sequential#663/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13523])
  Sequential#663/BatchNorm2d#4:
      %[Sequential#663/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#663/8])
  Sequential#663/SiLU#5:
      %[13526] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#663/9])
  Sequential#664/Conv2d#3:
      %[Sequential#664/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13526])
  Sequential#664/BatchNorm2d#4:
      %[Sequential#664/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#664/8])
  Sequential#664/SiLU#5:
      %[13527] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#664/9])
  Sequential#665/Conv2d#3:
      %[Sequential#665/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13527])
  Sequential#665/BatchNorm2d#4:
      %[Sequential#665/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#665/8])
  Sequential#665/SiLU#5:
      %[13528] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#665/9])
  Sequential#666/Conv2d#3:
      %[Sequential#666/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13528])
  Sequential#666/BatchNorm2d#4:
      %[Sequential#666/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#666/8])
  Sequential#666/SiLU#5:
      %[13529] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#666/9])
  Sequential#667/Conv2d#3:
      %[Sequential#667/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13529])
  Sequential#667/BatchNorm2d#4:
      %[Sequential#667/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#667/8])
  Sequential#667/SiLU#5:
      %[13530] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#667/9])
  Sequential#668/Conv2d#3:
      %[Sequential#668/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13530])
  Sequential#668/BatchNorm2d#4:
      %[Sequential#668/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#668/8])
  Sequential#668/SiLU#5:
      %[13531] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#668/9])
  FeatureConcat#669:
    %[13532] : torch.float32(1, 512, 80, 80) = ./aten::cat#2()
  Sequential#670/Conv2d#3:
      %[Sequential#670/8] : torch.float32(1, 256, 80, 80) = ./aten::_convolution#20(%[13532])
  Sequential#670/BatchNorm2d#4:
      %[Sequential#670/9] : torch.float32(1, 256, 80, 80) = ./aten::batch_norm#8(%[Sequential#670/8])
  Sequential#670/SiLU#5:
      %[13533] : torch.float32(1, 256, 80, 80) = ./aten::silu#0(%[Sequential#670/9])
  Sequential#671/Conv2d#3:
      %[Sequential#671/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13533])
  Sequential#671/BatchNorm2d#4:
      %[Sequential#671/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#671/8])
  Sequential#671/SiLU#5:
      %[13534] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#671/9])
  FeatureConcat#672:
    %[13535] : torch.float32(1, 768, 40, 40) = ./aten::cat#2()
  Sequential#673/Conv2d#3:
      %[Sequential#673/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13535])
  Sequential#673/BatchNorm2d#4:
      %[Sequential#673/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#673/8])
  Sequential#673/SiLU#5:
      %[13536] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#673/9])
  Sequential#674/Conv2d#3:
      %[Sequential#674/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13536])
  Sequential#674/BatchNorm2d#4:
      %[Sequential#674/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#674/8])
  Sequential#674/SiLU#5:
      %[13537] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#674/9])
  Sequential#676/Conv2d#3:
      %[Sequential#676/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13536])
  Sequential#676/BatchNorm2d#4:
      %[Sequential#676/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#676/8])
  Sequential#676/SiLU#5:
      %[13539] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#676/9])
  Sequential#677/Conv2d#3:
      %[Sequential#677/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13539])
  Sequential#677/BatchNorm2d#4:
      %[Sequential#677/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#677/8])
  Sequential#677/SiLU#5:
      %[13540] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#677/9])
  Sequential#678/Conv2d#3:
      %[Sequential#678/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13540])
  Sequential#678/BatchNorm2d#4:
      %[Sequential#678/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#678/8])
  Sequential#678/SiLU#5:
      %[13541] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#678/9])
  Sequential#679/Conv2d#3:
      %[Sequential#679/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13541])
  Sequential#679/BatchNorm2d#4:
      %[Sequential#679/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#679/8])
  Sequential#679/SiLU#5:
      %[13542] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#679/9])
  Sequential#680/Conv2d#3:
      %[Sequential#680/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13542])
  Sequential#680/BatchNorm2d#4:
      %[Sequential#680/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#680/8])
  Sequential#680/SiLU#5:
      %[13543] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#680/9])
  Sequential#681/Conv2d#3:
      %[Sequential#681/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13543])
  Sequential#681/BatchNorm2d#4:
      %[Sequential#681/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#681/8])
  Sequential#681/SiLU#5:
      %[13544] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#681/9])
  FeatureConcat#682:
    %[13545] : torch.float32(1, 768, 40, 40) = ./aten::cat#2()
  Sequential#683/Conv2d#3:
      %[Sequential#683/8] : torch.float32(1, 384, 40, 40) = ./aten::_convolution#20(%[13545])
  Sequential#683/BatchNorm2d#4:
      %[Sequential#683/9] : torch.float32(1, 384, 40, 40) = ./aten::batch_norm#8(%[Sequential#683/8])
  Sequential#683/SiLU#5:
      %[13546] : torch.float32(1, 384, 40, 40) = ./aten::silu#0(%[Sequential#683/9])
  Sequential#684/Conv2d#3:
      %[Sequential#684/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13546])
  Sequential#684/BatchNorm2d#4:
      %[Sequential#684/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#684/8])
  Sequential#684/SiLU#5:
      %[13547] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#684/9])
  FeatureConcat#685:
    %[13548] : torch.float32(1, 1024, 20, 20) = ./aten::cat#2()
  Sequential#686/Conv2d#3:
      %[Sequential#686/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13548])
  Sequential#686/BatchNorm2d#4:
      %[Sequential#686/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#686/8])
  Sequential#686/SiLU#5:
      %[13549] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#686/9])
  Sequential#687/Conv2d#3:
      %[Sequential#687/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13549])
  Sequential#687/BatchNorm2d#4:
      %[Sequential#687/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#687/8])
  Sequential#687/SiLU#5:
      %[13550] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#687/9])
  Sequential#689/Conv2d#3:
      %[Sequential#689/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13549])
  Sequential#689/BatchNorm2d#4:
      %[Sequential#689/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#689/8])
  Sequential#689/SiLU#5:
      %[13552] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#689/9])
  Sequential#690/Conv2d#3:
      %[Sequential#690/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13552])
  Sequential#690/BatchNorm2d#4:
      %[Sequential#690/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#690/8])
  Sequential#690/SiLU#5:
      %[13553] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#690/9])
  Sequential#691/Conv2d#3:
      %[Sequential#691/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13553])
  Sequential#691/BatchNorm2d#4:
      %[Sequential#691/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#691/8])
  Sequential#691/SiLU#5:
      %[13554] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#691/9])
  Sequential#692/Conv2d#3:
      %[Sequential#692/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13554])
  Sequential#692/BatchNorm2d#4:
      %[Sequential#692/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#692/8])
  Sequential#692/SiLU#5:
      %[13555] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#692/9])
  Sequential#693/Conv2d#3:
      %[Sequential#693/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13555])
  Sequential#693/BatchNorm2d#4:
      %[Sequential#693/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#693/8])
  Sequential#693/SiLU#5:
      %[13556] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#693/9])
  Sequential#694/Conv2d#3:
      %[Sequential#694/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13556])
  Sequential#694/BatchNorm2d#4:
      %[Sequential#694/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#694/8])
  Sequential#694/SiLU#5:
      %[13557] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#694/9])
  FeatureConcat#695:
    %[13558] : torch.float32(1, 1024, 20, 20) = ./aten::cat#2()
  Sequential#696/Conv2d#3:
      %[Sequential#696/8] : torch.float32(1, 512, 20, 20) = ./aten::_convolution#20(%[13558])
  Sequential#696/BatchNorm2d#4:
      %[Sequential#696/9] : torch.float32(1, 512, 20, 20) = ./aten::batch_norm#8(%[Sequential#696/8])
  Sequential#696/SiLU#5:
      %[13559] : torch.float32(1, 512, 20, 20) = ./aten::silu#0(%[Sequential#696/9])
  Sequential#706/Conv2d#3:
      %[Sequential#706/8] : torch.float32(1, 256, 160, 160) = ./aten::_convolution#20(%[13520])
  Sequential#706/BatchNorm2d#4:
      %[Sequential#706/9] : torch.float32(1, 256, 160, 160) = ./aten::batch_norm#8(%[Sequential#706/8])
  Sequential#706/SiLU#5:
      %[13569] : torch.float32(1, 256, 160, 160) = ./aten::silu#0(%[Sequential#706/9])
  ShiftChannel#707:
    %[./2] : torch.float32(1, 256, 160, 160) = ./aten::expand_as#0(%[13569])
    %[13570] : torch.float32(1, 256, 160, 160) = ./aten::add#2(%[./2], %[13569])
  Sequential#708/Conv2d#1:
      %[13571] : torch.float32(1, 255, 160, 160) = ./aten::_convolution#20(%[13570])
  ControlChannel#709:
    %[./2] : torch.float32(1, 255, 160, 160) = ./aten::expand_as#0(%[13571])
    %[13572] : torch.float32(1, 255, 160, 160) = ./aten::mul#1(%[./2], %[13571])
  YOLOLayer#710:
    %[./2] : 1 = ./aten::size#1(%[13572])
    %[./5] : torch.int32() = ./aten::Int#3()
    %[./16] : torch.int32() = ./aten::Int#5()
    %[./19] : torch.int32() = ./aten::Int#7()
    %[./24] : torch.float32(1, 3, 85, 160, 160) = ./aten::view#11(%[13572])
    %[./31] : torch.float32(1, 3, 160, 160, 85) = ./aten::permute#18(%[./24])
    %[13573] : torch.float32(1, 3, 160, 160, 85) = ./aten::contiguous#20(%[./31])
  Sequential#712/Conv2d#3:
      %[Sequential#712/8] : torch.float32(1, 512, 80, 80) = ./aten::_convolution#20(%[13533])
  Sequential#712/BatchNorm2d#4:
      %[Sequential#712/9] : torch.float32(1, 512, 80, 80) = ./aten::batch_norm#8(%[Sequential#712/8])
  Sequential#712/SiLU#5:
      %[13575] : torch.float32(1, 512, 80, 80) = ./aten::silu#0(%[Sequential#712/9])
  ShiftChannel#713:
    %[./2] : torch.float32(1, 512, 80, 80) = ./aten::expand_as#0(%[13575])
    %[13576] : torch.float32(1, 512, 80, 80) = ./aten::add#2(%[./2], %[13575])
  Sequential#714/Conv2d#1:
      %[13577] : torch.float32(1, 255, 80, 80) = ./aten::_convolution#20(%[13576])
  ControlChannel#715:
    %[./2] : torch.float32(1, 255, 80, 80) = ./aten::expand_as#0(%[13577])
    %[13578] : torch.float32(1, 255, 80, 80) = ./aten::mul#1(%[./2], %[13577])
  YOLOLayer#716:
    %[./2] : 1 = ./aten::size#1(%[13578])
    %[./5] : torch.int32() = ./aten::Int#3()
    %[./16] : torch.int32() = ./aten::Int#5()
    %[./19] : torch.int32() = ./aten::Int#7()
    %[./24] : torch.float32(1, 3, 85, 80, 80) = ./aten::view#11(%[13578])
    %[./31] : torch.float32(1, 3, 80, 80, 85) = ./aten::permute#18(%[./24])
    %[13579] : torch.float32(1, 3, 80, 80, 85) = ./aten::contiguous#20(%[./31])
  Sequential#718/Conv2d#3:
      %[Sequential#718/8] : torch.float32(1, 768, 40, 40) = ./aten::_convolution#20(%[13546])
  Sequential#718/BatchNorm2d#4:
      %[Sequential#718/9] : torch.float32(1, 768, 40, 40) = ./aten::batch_norm#8(%[Sequential#718/8])
  Sequential#718/SiLU#5:
      %[13581] : torch.float32(1, 768, 40, 40) = ./aten::silu#0(%[Sequential#718/9])
  ShiftChannel#719:
    %[./2] : torch.float32(1, 768, 40, 40) = ./aten::expand_as#0(%[13581])
    %[13582] : torch.float32(1, 768, 40, 40) = ./aten::add#2(%[./2], %[13581])
  Sequential#720/Conv2d#1:
      %[13583] : torch.float32(1, 255, 40, 40) = ./aten::_convolution#20(%[13582])
  ControlChannel#721:
    %[./2] : torch.float32(1, 255, 40, 40) = ./aten::expand_as#0(%[13583])
    %[13584] : torch.float32(1, 255, 40, 40) = ./aten::mul#1(%[./2], %[13583])
  YOLOLayer#722:
    %[./2] : 1 = ./aten::size#1(%[13584])
    %[./5] : torch.int32() = ./aten::Int#3()
    %[./16] : torch.int32() = ./aten::Int#5()
    %[./19] : torch.int32() = ./aten::Int#7()
    %[./24] : torch.float32(1, 3, 85, 40, 40) = ./aten::view#11(%[13584])
    %[./31] : torch.float32(1, 3, 40, 40, 85) = ./aten::permute#18(%[./24])
    %[13585] : torch.float32(1, 3, 40, 40, 85) = ./aten::contiguous#20(%[./31])
  Sequential#724/Conv2d#3:
      %[Sequential#724/8] : torch.float32(1, 1024, 20, 20) = ./aten::_convolution#20(%[13559])
  Sequential#724/BatchNorm2d#4:
      %[Sequential#724/9] : torch.float32(1, 1024, 20, 20) = ./aten::batch_norm#8(%[Sequential#724/8])
  Sequential#724/SiLU#5:
      %[13587] : torch.float32(1, 1024, 20, 20) = ./aten::silu#0(%[Sequential#724/9])
  ShiftChannel#725:
    %[./2] : torch.float32(1, 1024, 20, 20) = ./aten::expand_as#0(%[13587])
    %[13588] : torch.float32(1, 1024, 20, 20) = ./aten::add#2(%[./2], %[13587])
  Sequential#726/Conv2d#1:
      %[13589] : torch.float32(1, 255, 20, 20) = ./aten::_convolution#20(%[13588])
  ControlChannel#727:
    %[./2] : torch.float32(1, 255, 20, 20) = ./aten::expand_as#0(%[13589])
    %[13590] : torch.float32(1, 255, 20, 20) = ./aten::mul#1(%[./2], %[13589])
  YOLOLayer#728:
    %[./2] : 1 = ./aten::size#1(%[13590])
    %[./5] : torch.int32() = ./aten::Int#3()
    %[./16] : torch.int32() = ./aten::Int#5()
    %[./19] : torch.int32() = ./aten::Int#7()
    %[./24] : torch.float32(1, 3, 85, 20, 20) = ./aten::view#11(%[13590])
    %[./31] : torch.float32(1, 3, 20, 20, 85) = ./aten::permute#18(%[./24])
    %[13591] : torch.float32(1, 3, 20, 20, 85) = ./aten::contiguous#20(%[./31])
  %[10217] : [torch.float32(1, 3, 160, 160, 85), torch.float32(1, 3, 80, 80, 85), torch.float32(1, 3, 40, 40, 85), torch.float32(1, 3, 20, 20, 85)] = prim::ListConstruct#729(%[13573], %[13579], %[13585], %[13591])
  return(%[10217] : [torch.float32(1, 3, 160, 160, 85), torch.float32(1, 3, 80, 80, 85), torch.float32(1, 3, 40, 40, 85), torch.float32(1, 3, 20, 20, 85)])

I checked that all of the operators in the model are supported by Neuron. However, one of the tutorials notes:

"Inspecting the model, we discover that there are many aten::slice operations in some submodules called YoloLayer. Although these operations are supported by the neuron-cc compiler, they are not going to run efficiently on the Inferentia hardware." This makes me wonder: are there other operations in this model that also run inefficiently on the hardware?

Thanks for the help

aws-taylor commented 2 years ago

Hello @alejoGT1202. We inspected your model and noticed a few places in the YoloLayer operator that use in-place assignment, which limits our ability to trace the model effectively; this manifests as a WARNING in the logs.

# Problematic in-place code
# io[..., :2] = (io[..., :2] * 2. - 0.5 + self.grid)
# io[..., 2:4] = (io[..., 2:4] * 2) ** 2 * self.anchor_wh
# io[..., :4] *= self.stride
# return io.view(bs, -1, self.no), p  # view [1, 3, 13, 13, 85] as [1, 507, 85]

If you replace this with the following, we expect you'll see accurate results:

# Fixed out-of-place code
a = (io[..., :2] * 2. - 0.5 + self.grid)
b = (io[..., 2:4] * 2) ** 2 * self.anchor_wh
last_dim = len(a.shape) - 1
ab = torch.cat([a, b], dim=last_dim) * self.stride
out = torch.cat([ab, io[..., 4:]], dim=last_dim)
return out.view(bs, -1, self.no), p  # view [1, 3, 13, 13, 85] as [1, 507, 85]

More generally, I would encourage you to benchmark and profile your model - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-tools/getting-started-tensorboard-neuron-plugin.html. Such profiling can expose places in your model where the hardware is not being utilized effectively.

I would also encourage you to ensure you're running the most recent version of Neuron software - our team is constantly working to improve operator support, compiled model quality, and profiling tools.

aws-diamant commented 2 years ago

Resolving per the suggested solution above. @alejoGT1202, please feel free to re-open if we can help with anything else.

alejoGT1202 commented 2 years ago

@aws-diamant @aws-taylor I was able to compile the model with the suggested modification. However, I'm not getting the same accuracy as the model running on GPU. I tried different combinations of neuron-cc compile flags from the ones specified here. Is there any other approach I should try to match the accuracy of the GPU model?
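One generic way to quantify such a gap (a sketch with fabricated stand-in data, not Neuron-specific) is to dump the four detection-head outputs from both models on the same input and compare them element-wise; large divergence on a particular head points at the compiled subgraph rather than post-processing:

```python
import numpy as np

def compare_heads(ref_outputs, neuron_outputs, atol=1e-3):
    """Report the max absolute difference per detection head."""
    diffs = []
    for i, (ref, out) in enumerate(zip(ref_outputs, neuron_outputs)):
        d = float(np.max(np.abs(ref - out)))
        diffs.append(d)
        print(f"head {i}: shape={ref.shape} max_abs_diff={d:.6f} "
              f"{'OK' if d <= atol else 'DIVERGES'}")
    return diffs

# Toy example: fabricated arrays standing in for real GPU / Neuron outputs
rng = np.random.default_rng(1)
ref = [rng.random((1, 3, s, s, 85)).astype(np.float32) for s in (160, 80, 40, 20)]
near = [r + 1e-5 for r in ref]
diffs = compare_heads(ref, near)
```

With real outputs saved via `np.save` from each runtime, the same comparison isolates whether the accuracy loss comes from the compiled graph or from downstream NMS/decoding.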

Thanks for the help.