dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

imagenet: never ending "Autotuning Reformat" on Jetson Orin Nano Dev Kit #1630

Closed · youtalk closed this issue 1 year ago

youtalk commented 1 year ago

I've tested imagenet on the Jetson Orin Nano Dev Kit, but it doesn't work correctly. The log is shown below. I think it is a TensorRT error. Do you have any idea what is going wrong?

$ imagenet data/images/orange_0.jpg data/images/test/output_0.jpg
[video]  created imageLoader from file:///home/youtalk/src/jetson-inference/data/images/orange_0.jpg
------------------------------------------------
imageLoader video options:
------------------------------------------------
  -- URI: file:///home/youtalk/src/jetson-inference/data/images/orange_0.jpg
     - protocol:  file
     - location:  data/images/orange_0.jpg
     - extension: jpg
  -- deviceType: file
  -- ioType:     input
  -- codec:      unknown
  -- codecType:  v4l2
  -- frameRate:  0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
  -- loop:       0
------------------------------------------------
[video]  created imageWriter from file:///home/youtalk/src/jetson-inference/data/images/test/output_0.jpg
------------------------------------------------
imageWriter video options:
------------------------------------------------
  -- URI: file:///home/youtalk/src/jetson-inference/data/images/test/output_0.jpg
     - protocol:  file
     - location:  data/images/test/output_0.jpg
     - extension: jpg
  -- deviceType: file
  -- ioType:     output
  -- codec:      unknown
  -- codecType:  v4l2
  -- frameRate:  0
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
------------------------------------------------
[OpenGL] glDisplay -- X screen 0 resolution:  1920x1080
[OpenGL] glDisplay -- X window resolution:    1920x1080
[OpenGL] glDisplay -- display device initialized (1920x1080)
[video]  created glDisplay from display://0
------------------------------------------------
glDisplay video options:
------------------------------------------------
  -- URI: display://0
     - protocol:  display
     - location:  0
  -- deviceType: display
  -- ioType:     output
  -- width:      1920
  -- height:     1080
  -- frameRate:  0
  -- numBuffers: 4
  -- zeroCopy:   true
------------------------------------------------

imageNet -- loading classification network model from:
         -- prototxt     networks/Googlenet/googlenet.prototxt
         -- model        networks/Googlenet/bvlc_googlenet.caffemodel
         -- class_labels networks/ilsvrc12_synset_words.txt
         -- input_blob   'data'
         -- output_blob  'prob'
         -- batch_size   1

[TRT]    TensorRT version 8.5.2
[TRT]    loading NVIDIA plugins...
[TRT]    Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT]    Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT]    Registered plugin creator - ::BatchTilePlugin_TRT version 1
[TRT]    Registered plugin creator - ::Clip_TRT version 1
[TRT]    Registered plugin creator - ::CoordConvAC version 1
[TRT]    Registered plugin creator - ::CropAndResizeDynamic version 1
[TRT]    Registered plugin creator - ::CropAndResize version 1
[TRT]    Registered plugin creator - ::DecodeBbox3DPlugin version 1
[TRT]    Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT]    Could not register plugin creator -  ::FlattenConcat_TRT version 1
[TRT]    Registered plugin creator - ::GenerateDetection_TRT version 1
[TRT]    Registered plugin creator - ::GridAnchor_TRT version 1
[TRT]    Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT]    Registered plugin creator - ::GroupNorm version 1
[TRT]    Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT]    Registered plugin creator - ::InstanceNormalization_TRT version 2
[TRT]    Registered plugin creator - ::LayerNorm version 1
[TRT]    Registered plugin creator - ::LReLU_TRT version 1
[TRT]    Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[TRT]    Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[TRT]    Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1
[TRT]    Registered plugin creator - ::NMSDynamic_TRT version 1
[TRT]    Registered plugin creator - ::NMS_TRT version 1
[TRT]    Registered plugin creator - ::Normalize_TRT version 1
[TRT]    Registered plugin creator - ::PillarScatterPlugin version 1
[TRT]    Registered plugin creator - ::PriorBox_TRT version 1
[TRT]    Registered plugin creator - ::ProposalDynamic version 1
[TRT]    Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT]    Registered plugin creator - ::Proposal version 1
[TRT]    Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT]    Registered plugin creator - ::Region_TRT version 1
[TRT]    Registered plugin creator - ::Reorg_TRT version 1
[TRT]    Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT]    Registered plugin creator - ::ROIAlign_TRT version 1
[TRT]    Registered plugin creator - ::RPROI_TRT version 1
[TRT]    Registered plugin creator - ::ScatterND version 1
[TRT]    Registered plugin creator - ::SeqLen2Spatial version 1
[TRT]    Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT]    Registered plugin creator - ::SplitGeLU version 1
[TRT]    Registered plugin creator - ::Split version 1
[TRT]    Registered plugin creator - ::VoxelGeneratorPlugin version 1
[TRT]    detected model format - caffe  (extension '.caffemodel')
[TRT]    desired precision specified for GPU: FASTEST
[TRT]    requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]    [MemUsageChange] Init CUDA: CPU +215, GPU +0, now: CPU 258, GPU 2773 (MiB)
[TRT]    Trying to load shared library libnvinfer_builder_resource.so.8.5.2
[TRT]    Loaded shared library libnvinfer_builder_resource.so.8.5.2
[TRT]    [MemUsageChange] Init builder kernel library: CPU +302, GPU +430, now: CPU 582, GPU 3225 (MiB)
[TRT]    native precisions detected for GPU:  FP32, FP16, INT8
[TRT]    selecting fastest native precision for GPU:  FP16
[TRT]    could not find engine cache /usr/local/bin/networks/Googlenet/bvlc_googlenet.caffemodel.1.1.8502.GPU.FP16.engine
[TRT]    cache file invalid, profiling network model on device GPU
[TRT]    [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 288, GPU 3226 (MiB)
[TRT]    Trying to load shared library libnvinfer_builder_resource.so.8.5.2
[TRT]    Loaded shared library libnvinfer_builder_resource.so.8.5.2
[TRT]    [MemUsageChange] Init builder kernel library: CPU +295, GPU +27, now: CPU 583, GPU 3260 (MiB)
[TRT]    The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
[TRT]    device GPU, loading /usr/local/bin/networks/Googlenet/googlenet.prototxt /usr/local/bin/networks/Googlenet/bvlc_googlenet.caffemodel
[TRT]    device GPU, configuring network builder
[TRT]    device GPU, building FP16:  ON
[TRT]    device GPU, building INT8:  OFF
[TRT]    device GPU, workspace size: 33554432
[TRT]    device GPU, building CUDA engine (this may take a few minutes the first time a network is loaded)
[TRT]    Original: 141 layers
[TRT]    After dead-layer removal: 141 layers
[TRT]    Applying generic optimizations to the graph for inference.
[TRT]    Running: FCToConvTransform on loss3/classifier
[TRT]    Convert layer type of loss3/classifier from FULLY_CONNECTED to CONVOLUTION
[TRT]    Running: ShuffleErasure on shuffle_between_pool5/7x7_s1_and_loss3/classifier
[TRT]    Removing shuffle_between_pool5/7x7_s1_and_loss3/classifier
[TRT]    Applying ScaleNodes fusions.
[TRT]    After scale fusion: 141 layers
[TRT]    Running: ConvReluFusion on conv1/7x7_s2
[TRT]    ConvReluFusion: Fusing conv1/7x7_s2 with conv1/relu_7x7
[TRT]    Running: ConvReluFusion on conv2/3x3_reduce
[TRT]    ConvReluFusion: Fusing conv2/3x3_reduce with conv2/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on conv2/3x3
[TRT]    ConvReluFusion: Fusing conv2/3x3 with conv2/relu_3x3
[TRT]    Running: ConvReluFusion on inception_3a/1x1
[TRT]    ConvReluFusion: Fusing inception_3a/1x1 with inception_3a/relu_1x1
[TRT]    Running: ConvReluFusion on inception_3a/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_3a/3x3_reduce with inception_3a/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_3a/3x3
[TRT]    ConvReluFusion: Fusing inception_3a/3x3 with inception_3a/relu_3x3
[TRT]    Running: ConvReluFusion on inception_3a/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_3a/5x5_reduce with inception_3a/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_3a/5x5
[TRT]    ConvReluFusion: Fusing inception_3a/5x5 with inception_3a/relu_5x5
[TRT]    Running: ConvReluFusion on inception_3a/pool_proj
[TRT]    ConvReluFusion: Fusing inception_3a/pool_proj with inception_3a/relu_pool_proj
[TRT]    Running: ConvReluFusion on inception_3b/1x1
[TRT]    ConvReluFusion: Fusing inception_3b/1x1 with inception_3b/relu_1x1
[TRT]    Running: ConvReluFusion on inception_3b/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_3b/3x3_reduce with inception_3b/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_3b/3x3
[TRT]    ConvReluFusion: Fusing inception_3b/3x3 with inception_3b/relu_3x3
[TRT]    Running: ConvReluFusion on inception_3b/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_3b/5x5_reduce with inception_3b/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_3b/5x5
[TRT]    ConvReluFusion: Fusing inception_3b/5x5 with inception_3b/relu_5x5
[TRT]    Running: ConvReluFusion on inception_3b/pool_proj
[TRT]    ConvReluFusion: Fusing inception_3b/pool_proj with inception_3b/relu_pool_proj
[TRT]    Running: ConvReluFusion on inception_4a/1x1
[TRT]    ConvReluFusion: Fusing inception_4a/1x1 with inception_4a/relu_1x1
[TRT]    Running: ConvReluFusion on inception_4a/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_4a/3x3_reduce with inception_4a/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_4a/3x3
[TRT]    ConvReluFusion: Fusing inception_4a/3x3 with inception_4a/relu_3x3
[TRT]    Running: ConvReluFusion on inception_4a/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_4a/5x5_reduce with inception_4a/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_4a/5x5
[TRT]    ConvReluFusion: Fusing inception_4a/5x5 with inception_4a/relu_5x5
[TRT]    Running: ConvReluFusion on inception_4a/pool_proj
[TRT]    ConvReluFusion: Fusing inception_4a/pool_proj with inception_4a/relu_pool_proj
[TRT]    Running: ConvReluFusion on inception_4b/1x1
[TRT]    ConvReluFusion: Fusing inception_4b/1x1 with inception_4b/relu_1x1
[TRT]    Running: ConvReluFusion on inception_4b/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_4b/3x3_reduce with inception_4b/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_4b/3x3
[TRT]    ConvReluFusion: Fusing inception_4b/3x3 with inception_4b/relu_3x3
[TRT]    Running: ConvReluFusion on inception_4b/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_4b/5x5_reduce with inception_4b/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_4b/5x5
[TRT]    ConvReluFusion: Fusing inception_4b/5x5 with inception_4b/relu_5x5
[TRT]    Running: ConvReluFusion on inception_4b/pool_proj
[TRT]    ConvReluFusion: Fusing inception_4b/pool_proj with inception_4b/relu_pool_proj
[TRT]    Running: ConvReluFusion on inception_4c/1x1
[TRT]    ConvReluFusion: Fusing inception_4c/1x1 with inception_4c/relu_1x1
[TRT]    Running: ConvReluFusion on inception_4c/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_4c/3x3_reduce with inception_4c/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_4c/3x3
[TRT]    ConvReluFusion: Fusing inception_4c/3x3 with inception_4c/relu_3x3
[TRT]    Running: ConvReluFusion on inception_4c/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_4c/5x5_reduce with inception_4c/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_4c/5x5
[TRT]    ConvReluFusion: Fusing inception_4c/5x5 with inception_4c/relu_5x5
[TRT]    Running: ConvReluFusion on inception_4c/pool_proj
[TRT]    ConvReluFusion: Fusing inception_4c/pool_proj with inception_4c/relu_pool_proj
[TRT]    Running: ConvReluFusion on inception_4d/1x1
[TRT]    ConvReluFusion: Fusing inception_4d/1x1 with inception_4d/relu_1x1
[TRT]    Running: ConvReluFusion on inception_4d/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_4d/3x3_reduce with inception_4d/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_4d/3x3
[TRT]    ConvReluFusion: Fusing inception_4d/3x3 with inception_4d/relu_3x3
[TRT]    Running: ConvReluFusion on inception_4d/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_4d/5x5_reduce with inception_4d/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_4d/5x5
[TRT]    ConvReluFusion: Fusing inception_4d/5x5 with inception_4d/relu_5x5
[TRT]    Running: ConvReluFusion on inception_4d/pool_proj
[TRT]    ConvReluFusion: Fusing inception_4d/pool_proj with inception_4d/relu_pool_proj
[TRT]    Running: ConvReluFusion on inception_4e/1x1
[TRT]    ConvReluFusion: Fusing inception_4e/1x1 with inception_4e/relu_1x1
[TRT]    Running: ConvReluFusion on inception_4e/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_4e/3x3_reduce with inception_4e/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_4e/3x3
[TRT]    ConvReluFusion: Fusing inception_4e/3x3 with inception_4e/relu_3x3
[TRT]    Running: ConvReluFusion on inception_4e/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_4e/5x5_reduce with inception_4e/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_4e/5x5
[TRT]    ConvReluFusion: Fusing inception_4e/5x5 with inception_4e/relu_5x5
[TRT]    Running: ConvReluFusion on inception_4e/pool_proj
[TRT]    ConvReluFusion: Fusing inception_4e/pool_proj with inception_4e/relu_pool_proj
[TRT]    Running: ConvReluFusion on inception_5a/1x1
[TRT]    ConvReluFusion: Fusing inception_5a/1x1 with inception_5a/relu_1x1
[TRT]    Running: ConvReluFusion on inception_5a/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_5a/3x3_reduce with inception_5a/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_5a/3x3
[TRT]    ConvReluFusion: Fusing inception_5a/3x3 with inception_5a/relu_3x3
[TRT]    Running: ConvReluFusion on inception_5a/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_5a/5x5_reduce with inception_5a/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_5a/5x5
[TRT]    ConvReluFusion: Fusing inception_5a/5x5 with inception_5a/relu_5x5
[TRT]    Running: ConvReluFusion on inception_5a/pool_proj
[TRT]    ConvReluFusion: Fusing inception_5a/pool_proj with inception_5a/relu_pool_proj
[TRT]    Running: ConvReluFusion on inception_5b/1x1
[TRT]    ConvReluFusion: Fusing inception_5b/1x1 with inception_5b/relu_1x1
[TRT]    Running: ConvReluFusion on inception_5b/3x3_reduce
[TRT]    ConvReluFusion: Fusing inception_5b/3x3_reduce with inception_5b/relu_3x3_reduce
[TRT]    Running: ConvReluFusion on inception_5b/3x3
[TRT]    ConvReluFusion: Fusing inception_5b/3x3 with inception_5b/relu_3x3
[TRT]    Running: ConvReluFusion on inception_5b/5x5_reduce
[TRT]    ConvReluFusion: Fusing inception_5b/5x5_reduce with inception_5b/relu_5x5_reduce
[TRT]    Running: ConvReluFusion on inception_5b/5x5
[TRT]    ConvReluFusion: Fusing inception_5b/5x5 with inception_5b/relu_5x5
[TRT]    Running: ConvReluFusion on inception_5b/pool_proj
[TRT]    ConvReluFusion: Fusing inception_5b/pool_proj with inception_5b/relu_pool_proj
[TRT]    After dupe layer removal: 84 layers
[TRT]    After final dead-layer removal: 84 layers
[TRT]    After tensor merging: 84 layers
[TRT]    After vertical fusions: 84 layers
[TRT]    After dupe layer removal: 84 layers
[TRT]    After final dead-layer removal: 84 layers
[TRT]    Merging layers: inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_3a/relu_3x3_reduce || inception_3a/5x5_reduce + inception_3a/relu_5x5_reduce
[TRT]    Merging layers: inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_3b/relu_3x3_reduce || inception_3b/5x5_reduce + inception_3b/relu_5x5_reduce
[TRT]    Merging layers: inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_4a/relu_3x3_reduce || inception_4a/5x5_reduce + inception_4a/relu_5x5_reduce
[TRT]    Merging layers: inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_4b/relu_3x3_reduce || inception_4b/5x5_reduce + inception_4b/relu_5x5_reduce
[TRT]    Merging layers: inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_4c/relu_3x3_reduce || inception_4c/5x5_reduce + inception_4c/relu_5x5_reduce
[TRT]    Merging layers: inception_4d/5x5_reduce + inception_4d/relu_5x5_reduce || inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_4d/relu_3x3_reduce
[TRT]    Merging layers: inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_4e/relu_3x3_reduce || inception_4e/5x5_reduce + inception_4e/relu_5x5_reduce
[TRT]    Merging layers: inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_5a/relu_3x3_reduce || inception_5a/5x5_reduce + inception_5a/relu_5x5_reduce
[TRT]    Merging layers: inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_5b/relu_3x3_reduce || inception_5b/5x5_reduce + inception_5b/relu_5x5_reduce
[TRT]    After tensor merging: 66 layers
[TRT]    After slice removal: 66 layers
[TRT]    Eliminating concatenation inception_5b/output
[TRT]    Generating copy for inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_5b/relu_3x3_reduce || inception_5b/5x5_reduce + inception_5b/relu_5x5_reduce to inception_5b/output because input is not movable.
[TRT]    Retargeting inception_5b/3x3 to inception_5b/output
[TRT]    Retargeting inception_5b/5x5 to inception_5b/output
[TRT]    Retargeting inception_5b/pool_proj to inception_5b/output
[TRT]    Eliminating concatenation inception_5a/output
[TRT]    Generating copy for inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_5a/relu_3x3_reduce || inception_5a/5x5_reduce + inception_5a/relu_5x5_reduce to inception_5a/output because input is not movable.
[TRT]    Retargeting inception_5a/3x3 to inception_5a/output
[TRT]    Retargeting inception_5a/5x5 to inception_5a/output
[TRT]    Retargeting inception_5a/pool_proj to inception_5a/output
[TRT]    Eliminating concatenation inception_4e/output
[TRT]    Generating copy for inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_4e/relu_3x3_reduce || inception_4e/5x5_reduce + inception_4e/relu_5x5_reduce to inception_4e/output because input is not movable.
[TRT]    Retargeting inception_4e/3x3 to inception_4e/output
[TRT]    Retargeting inception_4e/5x5 to inception_4e/output
[TRT]    Retargeting inception_4e/pool_proj to inception_4e/output
[TRT]    Eliminating concatenation inception_4d/output
[TRT]    Generating copy for inception_4d/5x5_reduce + inception_4d/relu_5x5_reduce || inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_4d/relu_3x3_reduce to inception_4d/output because input is not movable.
[TRT]    Retargeting inception_4d/3x3 to inception_4d/output
[TRT]    Retargeting inception_4d/5x5 to inception_4d/output
[TRT]    Retargeting inception_4d/pool_proj to inception_4d/output
[TRT]    Eliminating concatenation inception_4c/output
[TRT]    Generating copy for inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_4c/relu_3x3_reduce || inception_4c/5x5_reduce + inception_4c/relu_5x5_reduce to inception_4c/output because input is not movable.
[TRT]    Retargeting inception_4c/3x3 to inception_4c/output
[TRT]    Retargeting inception_4c/5x5 to inception_4c/output
[TRT]    Retargeting inception_4c/pool_proj to inception_4c/output
[TRT]    Eliminating concatenation inception_4b/output
[TRT]    Generating copy for inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_4b/relu_3x3_reduce || inception_4b/5x5_reduce + inception_4b/relu_5x5_reduce to inception_4b/output because input is not movable.
[TRT]    Retargeting inception_4b/3x3 to inception_4b/output
[TRT]    Retargeting inception_4b/5x5 to inception_4b/output
[TRT]    Retargeting inception_4b/pool_proj to inception_4b/output
[TRT]    Eliminating concatenation inception_4a/output
[TRT]    Generating copy for inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_4a/relu_3x3_reduce || inception_4a/5x5_reduce + inception_4a/relu_5x5_reduce to inception_4a/output because input is not movable.
[TRT]    Retargeting inception_4a/3x3 to inception_4a/output
[TRT]    Retargeting inception_4a/5x5 to inception_4a/output
[TRT]    Retargeting inception_4a/pool_proj to inception_4a/output
[TRT]    Eliminating concatenation inception_3b/output
[TRT]    Generating copy for inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_3b/relu_3x3_reduce || inception_3b/5x5_reduce + inception_3b/relu_5x5_reduce to inception_3b/output because input is not movable.
[TRT]    Retargeting inception_3b/3x3 to inception_3b/output
[TRT]    Retargeting inception_3b/5x5 to inception_3b/output
[TRT]    Retargeting inception_3b/pool_proj to inception_3b/output
[TRT]    Eliminating concatenation inception_3a/output
[TRT]    Generating copy for inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_3a/relu_3x3_reduce || inception_3a/5x5_reduce + inception_3a/relu_5x5_reduce to inception_3a/output because input is not movable.
[TRT]    Retargeting inception_3a/3x3 to inception_3a/output
[TRT]    Retargeting inception_3a/5x5 to inception_3a/output
[TRT]    Retargeting inception_3a/pool_proj to inception_3a/output
[TRT]    After concat removal: 66 layers
[TRT]    Trying to split Reshape and strided tensor
[TRT]    Graph construction and optimization completed in 0.0584452 seconds.
[TRT]    ---------- Layers Running on DLA ----------
[TRT]    ---------- Layers Running on GPU ----------
[TRT]    [GpuLayer] CONVOLUTION: conv1/7x7_s2 + conv1/relu_7x7
[TRT]    [GpuLayer] POOLING: pool1/3x3_s2
[TRT]    [GpuLayer] LRN: pool1/norm1
[TRT]    [GpuLayer] CONVOLUTION: conv2/3x3_reduce + conv2/relu_3x3_reduce
[TRT]    [GpuLayer] CONVOLUTION: conv2/3x3 + conv2/relu_3x3
[TRT]    [GpuLayer] LRN: conv2/norm2
[TRT]    [GpuLayer] POOLING: pool2/3x3_s2
[TRT]    [GpuLayer] CONVOLUTION: inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_3a/relu_3x3_reduce || inception_3a/5x5_reduce + inception_3a/relu_5x5_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_3a/3x3 + inception_3a/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_3a/5x5 + inception_3a/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_3a/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_3a/pool_proj + inception_3a/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_3a/1x1 copy
[TRT]    [GpuLayer] CONVOLUTION: inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_3b/relu_3x3_reduce || inception_3b/5x5_reduce + inception_3b/relu_5x5_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_3b/3x3 + inception_3b/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_3b/5x5 + inception_3b/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_3b/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_3b/pool_proj + inception_3b/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_3b/1x1 copy
[TRT]    [GpuLayer] POOLING: pool3/3x3_s2
[TRT]    [GpuLayer] CONVOLUTION: inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_4a/relu_3x3_reduce || inception_4a/5x5_reduce + inception_4a/relu_5x5_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_4a/3x3 + inception_4a/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_4a/5x5 + inception_4a/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_4a/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_4a/pool_proj + inception_4a/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_4a/1x1 copy
[TRT]    [GpuLayer] CONVOLUTION: inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_4b/relu_3x3_reduce || inception_4b/5x5_reduce + inception_4b/relu_5x5_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_4b/3x3 + inception_4b/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_4b/5x5 + inception_4b/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_4b/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_4b/pool_proj + inception_4b/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_4b/1x1 copy
[TRT]    [GpuLayer] CONVOLUTION: inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_4c/relu_3x3_reduce || inception_4c/5x5_reduce + inception_4c/relu_5x5_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_4c/3x3 + inception_4c/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_4c/5x5 + inception_4c/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_4c/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_4c/pool_proj + inception_4c/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_4c/1x1 copy
[TRT]    [GpuLayer] CONVOLUTION: inception_4d/5x5_reduce + inception_4d/relu_5x5_reduce || inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_4d/relu_3x3_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_4d/3x3 + inception_4d/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_4d/5x5 + inception_4d/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_4d/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_4d/pool_proj + inception_4d/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_4d/1x1 copy
[TRT]    [GpuLayer] CONVOLUTION: inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_4e/relu_3x3_reduce || inception_4e/5x5_reduce + inception_4e/relu_5x5_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_4e/3x3 + inception_4e/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_4e/5x5 + inception_4e/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_4e/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_4e/pool_proj + inception_4e/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_4e/1x1 copy
[TRT]    [GpuLayer] POOLING: pool4/3x3_s2
[TRT]    [GpuLayer] CONVOLUTION: inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_5a/relu_3x3_reduce || inception_5a/5x5_reduce + inception_5a/relu_5x5_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_5a/3x3 + inception_5a/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_5a/5x5 + inception_5a/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_5a/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_5a/pool_proj + inception_5a/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_5a/1x1 copy
[TRT]    [GpuLayer] CONVOLUTION: inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_5b/relu_3x3_reduce || inception_5b/5x5_reduce + inception_5b/relu_5x5_reduce
[TRT]    [GpuLayer] CONVOLUTION: inception_5b/3x3 + inception_5b/relu_3x3
[TRT]    [GpuLayer] CONVOLUTION: inception_5b/5x5 + inception_5b/relu_5x5
[TRT]    [GpuLayer] POOLING: inception_5b/pool
[TRT]    [GpuLayer] CONVOLUTION: inception_5b/pool_proj + inception_5b/relu_pool_proj
[TRT]    [GpuLayer] COPY: inception_5b/1x1 copy
[TRT]    [GpuLayer] POOLING: pool5/7x7_s1
[TRT]    [GpuLayer] CONVOLUTION: loss3/classifier
[TRT]    [GpuLayer] SOFTMAX: prob
[TRT]    Trying to load shared library libcublas.so.11
[TRT]    Loaded shared library libcublas.so.11
[TRT]    Using cublas as plugin tactic source
[TRT]    Trying to load shared library libcublasLt.so.11
[TRT]    Loaded shared library libcublasLt.so.11
[TRT]    Using cublasLt as core library tactic source
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +625, now: CPU 1187, GPU 3952 (MiB)
[TRT]    Trying to load shared library libcudnn.so.8
[TRT]    Loaded shared library libcudnn.so.8
[TRT]    Using cuDNN as plugin tactic source
[TRT]    Using cuDNN as core library tactic source
[TRT]    [MemUsageChange] Init cuDNN: CPU +82, GPU +126, now: CPU 1269, GPU 4078 (MiB)
[TRT]    Global timing cache in use. Profiling results in this builder pass will be stored.
[TRT]    Constructing optimization profile number 0 [1/1].
[TRT]    Reserving memory for host IO tensors. Host: 0 bytes
[TRT]    =============== Computing reformatting costs: 
[TRT]    *************** Autotuning Reformat: Float(150528,50176,224,1) -> Float(150528,1,672,3) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(data -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.0388984
[TRT]    Tactic: 0x00000000000003ea Time: 0.0660305
[TRT]    Tactic: 0x0000000000000000 Time: 0.0387326
[TRT]    Fastest Tactic: 0x0000000000000000 Time: 0.0387326
[TRT]    *************** Autotuning Reformat: Float(150528,50176,224,1) -> Float(50176,1:4,224,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(data -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.0749105
[TRT]    Tactic: 0x00000000000003ea Time: 0.0624887
[TRT]    Tactic: 0x0000000000000000 Time: 0.0751331
[TRT]    Fastest Tactic: 0x00000000000003ea Time: 0.0624887
[TRT]    *************** Autotuning Reformat: Float(150528,50176,224,1) -> Half(150528,50176,224,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(data -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.0132488
[TRT]    Tactic: 0x00000000000003ea Time: 0.0373537
[TRT]    Tactic: 0x0000000000000000 Time: 0.0486836
[TRT]    Fastest Tactic: 0x00000000000003e8 Time: 0.0132488
[TRT]    *************** Autotuning Reformat: Float(150528,50176,224,1) -> Half(100352,50176:2,224,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(data -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.0521847
[TRT]    Tactic: 0x00000000000003ea Time: 0.070464
[TRT]    Tactic: 0x0000000000000000 Time: 0.0304039
[TRT]    Fastest Tactic: 0x0000000000000000 Time: 0.0304039
[TRT]    *************** Autotuning Reformat: Float(150528,50176,224,1) -> Half(50176,1:4,224,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(data -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.0520742
[TRT]    Tactic: 0x00000000000003ea Time: 0.0339927
[TRT]    Tactic: 0x0000000000000000 Time: 0.0518298
[TRT]    Fastest Tactic: 0x00000000000003ea Time: 0.0339927
[TRT]    *************** Autotuning Reformat: Float(150528,50176,224,1) -> Half(50176,1:8,224,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(data -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.0887927
[TRT]    Tactic: 0x00000000000003ea Time: 0.0614298
[TRT]    Tactic: 0x0000000000000000 Time: 0.0405828
[TRT]    Fastest Tactic: 0x0000000000000000 Time: 0.0405828
[TRT]    =============== Computing reformatting costs: 
[TRT]    *************** Autotuning Reformat: Float(802816,12544,112,1) -> Float(200704,1:4,1792,16) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(conv1/7x7_s2 -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.719087
[TRT]    Tactic: 0x00000000000003ea Time: 0.133603
[TRT]    Tactic: 0x0000000000000000 Time: 0.592122
[TRT]    Fastest Tactic: 0x00000000000003ea Time: 0.133603
[TRT]    *************** Autotuning Reformat: Float(802816,12544,112,1) -> Half(802816,12544,112,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(conv1/7x7_s2 -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.0912291
[TRT]    Tactic: 0x00000000000003ea Time: 0.164381
[TRT]    Tactic: 0x0000000000000000 Time: 0.259113
[TRT]    Fastest Tactic: 0x00000000000003e8 Time: 0.0912291
[TRT]    *************** Autotuning Reformat: Float(802816,12544,112,1) -> Half(401408,12544:2,112,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(conv1/7x7_s2 -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.317257
[TRT]    Tactic: 0x00000000000003ea Time: 0.192198
[TRT]    Tactic: 0x0000000000000000 Time: 0.139887
[TRT]    Fastest Tactic: 0x0000000000000000 Time: 0.139887
[TRT]    *************** Autotuning Reformat: Float(802816,12544,112,1) -> Half(100352,1:8,896,8) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(conv1/7x7_s2 -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.438359
[TRT]    Tactic: 0x00000000000003ea Time: 0.122243
[TRT]    Tactic: 0x0000000000000000 Time: 0.14494
[TRT]    Fastest Tactic: 0x00000000000003ea Time: 0.122243
[TRT]    *************** Autotuning Reformat: Float(802816,1,7168,64) -> Float(802816,12544,112,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(conv1/7x7_s2 -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.504669
[TRT]    Tactic: 0x00000000000003ea Time: 0.153452
[TRT]    Tactic: 0x0000000000000000 Time: 0.380396
[TRT]    Fastest Tactic: 0x00000000000003ea Time: 0.153452
[TRT]    *************** Autotuning Reformat: Float(802816,1,7168,64) -> Float(200704,1:4,1792,16) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(conv1/7x7_s2 -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.212239
[TRT]    Tactic: 0x00000000000003ea Time: 0.135258
[TRT]    Tactic: 0x0000000000000000 Time: 0.375668
[TRT]    Fastest Tactic: 0x00000000000003ea Time: 0.135258
[TRT]    *************** Autotuning Reformat: Float(802816,1,7168,64) -> Half(802816,12544,112,1) ***************
[TRT]    --------------- Timing Runner: Optimizer Reformat(conv1/7x7_s2 -> <out>) (Reformat)
[TRT]    Tactic: 0x00000000000003e8 Time: 0.519762
[TRT]    Tactic: 0x00000000000003ea Time: 0.140538
[TRT]    Tactic: 0x0000000000000000 Time: 0.37769
[TRT]    Fastest Tactic: 0x00000000000003ea Time: 0.140538
...
youtalk commented 1 year ago

The "never ending" part was my mistake. After about a minute it actually started working. I'm sorry.
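
For reference, the long run of "Autotuning Reformat" messages is TensorRT profiling layer tactics while it builds the engine on the first load; once the build finishes, the serialized engine is cached (the .engine path shown in the log) and later runs skip this step. Below is a minimal sketch of the equivalent classification in Python, assuming the jetson_inference / jetson_utils bindings from this repo are installed; the first imageNet() call is what triggers the engine build.

#!/usr/bin/env python3
# Minimal sketch (assumes the jetson_inference / jetson_utils Python bindings are built and installed).
from jetson_inference import imageNet
from jetson_utils import loadImage

# Loading the network triggers the TensorRT engine build on the first run
# (the "Autotuning Reformat" phase); later runs reuse the cached .engine file.
net = imageNet("googlenet")

img = loadImage("data/images/orange_0.jpg")      # load the test image
class_id, confidence = net.Classify(img)         # run inference
print(net.GetClassDesc(class_id), confidence)    # print the top class label and its confidence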