dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Deploying pytorchvideo models (3D convnets for video analysis) on Jetson (NX) #1199

Closed HansHorak closed 3 years ago

HansHorak commented 3 years ago

Dear Dusty, thank you for all your hard work pulling the Jetson ecosystem. I wanted to ask whether there is any plan to develop any helpful tools for deploying pytorchvideo (a FAIR project https://github.com/facebookresearch/pytorchvideo) models on Jetson hardware? From my googling of the issue it seems to me that what makes it difficult to do is translating 3D convolutions (e.g. depthwise separable convolutions and other 3D CNN layers/operations?) from Pytorch to ONNX to TensorRT. Do you have any advice to give if one wanted to train 3D convnets in the pytorch ecosystem and deploy on the Jetson Xavier NX? Best, Hans

dusty-nv commented 3 years ago

Hi @HansHorak, this seems interesting, and I would be interested to add support for action/behavior recognition if these models were to work in TensorRT.

Can you try exporting the model from PyTorch to ONNX and seeing if it loads in TensorRT? You can do a quick check of that with the trtexec tool found under /usr/src/tensorrt/bin on your Jetson.
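For reference, that quick check could look like the following (the ONNX filename here is just a placeholder; `--onnx` is trtexec's standard option for parsing an ONNX model and building/benchmarking an engine):

```shell
# Parse the ONNX model, build a TensorRT engine, and run a benchmark pass.
# Replace model.onnx with the path to your exported model.
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx
```

If the parse or build fails, trtexec prints the offending op, which is usually the fastest way to find unsupported layers.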

HansHorak commented 3 years ago

Hi again. We made a custom untrained X3D model using the pytorchvideo tutorial (just a raw initialized model, so not sure if it works or makes good sense) and, after replacing the Swish activations with ReLU, managed to export it to ONNX format. Then we ran trtexec on the Jetson Xavier NX and it does seem to do something. Those are supposed to be 3D convolutions. Are those "broadcasting input1 to make tensors conform ... [NONE]" messages failures? What do you make of this output?

Thanks
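For anyone following along, the export steps described above could be sketched roughly as follows. The recursive activation swap is the standard PyTorch module-replacement recipe; the `create_x3d` call and the input shape (taken from the binding dimensions in the log below) are illustrative assumptions, not our exact script:

```python
import torch
import torch.nn as nn

def replace_activation(module: nn.Module, old_cls, new_factory) -> None:
    """Recursively swap every instance of `old_cls` (e.g. Swish/SiLU)
    for a fresh module from `new_factory`, in place."""
    for name, child in module.named_children():
        if isinstance(child, old_cls):
            setattr(module, name, new_factory())
        else:
            replace_activation(child, old_cls, new_factory)

# Hypothetical usage with a pytorchvideo X3D model:
# from pytorchvideo.models.x3d import create_x3d
# model = create_x3d(model_num_class=2).eval()
# replace_activation(model, nn.SiLU, nn.ReLU)      # Swish -> ReLU
# dummy = torch.randn(1, 3, 20, 256, 256)          # N,C,T,H,W as in the log
# torch.onnx.export(model, dummy, "X3D_M_custom_untrained.onnx",
#                   opset_version=9)
```

Depending on the pytorchvideo version, the activation may be its own `Swish` class rather than `nn.SiLU`, so pass whichever class the model actually contains.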

...
[09/15/2021-16:06:10] [I] TensorRT version: 8001
[09/15/2021-16:06:11] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 5635 (MiB)
[09/15/2021-16:06:11] [I] Start parsing network model
[09/15/2021-16:06:11] [I] [TRT] ----------------------------------------------------------------
[09/15/2021-16:06:11] [I] [TRT] Input filename:   X3D_M_custom_untrained.onnx
[09/15/2021-16:06:11] [I] [TRT] ONNX IR version:  0.0.6
[09/15/2021-16:06:11] [I] [TRT] Opset version:    9
[09/15/2021-16:06:11] [I] [TRT] Producer name:    pytorch
[09/15/2021-16:06:11] [I] [TRT] Producer version: 1.9
[09/15/2021-16:06:11] [I] [TRT] Domain:           
[09/15/2021-16:06:11] [I] [TRT] Model version:    0
[09/15/2021-16:06:11] [I] [TRT] Doc string:       
[09/15/2021-16:06:11] [I] [TRT] ----------------------------------------------------------------
[09/15/2021-16:06:12] [I] [TRT] MatMul_286: broadcasting input1 to make tensors conform, dims(input0)=[1,1,4,4,2048][NONE] dims(input1)=[1,1,1,2048,2][NONE].
[09/15/2021-16:06:12] [I] [TRT] MatMul_286: broadcasting input1 to make tensors conform, dims(input0)=[1,1,4,4,2048][NONE] dims(input1)=[1,1,1,2048,2][NONE].
[09/15/2021-16:06:12] [I] Finish parsing network model
[09/15/2021-16:06:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 384, GPU 5671 (MiB)
[09/15/2021-16:06:12] [I] [TRT] MatMul_286: broadcasting input1 to make tensors conform, dims(input0)=[1,1,4,4,2048][NONE] dims(input1)=[1,1,1,2048,2][NONE].
[09/15/2021-16:06:12] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 384 MiB, GPU 5671 MiB
[09/15/2021-16:06:12] [I] [TRT] ---------- Layers Running on DLA ----------
[09/15/2021-16:06:12] [I] [TRT] ---------- Layers Running on GPU ----------
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] 1197 + (Unnamed Layer* 287) [Shuffle]
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] blocks.5.proj.bias + (Unnamed Layer* 290) [Shuffle]
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_0
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_1 + Relu_2
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_4 + Relu_5
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_6
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] ReduceMean_7
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_8 + Relu_9
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_10
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_11), Mul_12 + Relu_13)
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_14
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_3 + Add_15 + Relu_16
...
more layers etc
...
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_279 + Relu_280
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] (Unnamed Layer* 281) [Identity]
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] AveragePool_282
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Conv_283 + Relu_284
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Transpose_285
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] MatMul_286
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Add_287
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Transpose_288 + (Unnamed Layer* 293) [Shuffle]
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Softmax_289
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] (Unnamed Layer* 295) [Shuffle] + Transpose_290
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] GlobalAveragePool_291
[09/15/2021-16:06:12] [I] [TRT] [GpuLayer] Reshape_297
[09/15/2021-16:06:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +206, now: CPU 611, GPU 5877 (MiB)
[09/15/2021-16:06:14] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +309, now: CPU 918, GPU 6186 (MiB)
[09/15/2021-16:06:38] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[09/15/2021-16:08:59] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[09/15/2021-16:08:59] [I] [TRT] Total Host Persistent Memory: 9680
[09/15/2021-16:08:59] [I] [TRT] Total Device Persistent Memory: 0
[09/15/2021-16:08:59] [I] [TRT] Total Scratch Memory: 8366080
[09/15/2021-16:08:59] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 101 MiB
[09/15/2021-16:08:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1376, GPU 6688 (MiB)
[09/15/2021-16:08:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 1377, GPU 6688 (MiB)
[09/15/2021-16:08:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1376, GPU 6688 (MiB)
[09/15/2021-16:08:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1376, GPU 6688 (MiB)
[09/15/2021-16:08:59] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 1375 MiB, GPU 6688 MiB
[09/15/2021-16:08:59] [I] [TRT] Loaded engine size: 12 MB
[09/15/2021-16:08:59] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1395 MiB, GPU 6709 MiB
[09/15/2021-16:09:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1397, GPU 6709 (MiB)
[09/15/2021-16:09:00] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1397, GPU 6709 (MiB)
[09/15/2021-16:09:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1397, GPU 6709 (MiB)
[09/15/2021-16:09:00] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1397 MiB, GPU 6709 MiB
[09/15/2021-16:09:00] [I] Engine built in 169.626 sec.
[09/15/2021-16:09:00] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1383 MiB, GPU 6709 MiB
[09/15/2021-16:09:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1383, GPU 6709 (MiB)
[09/15/2021-16:09:00] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1383, GPU 6709 (MiB)
[09/15/2021-16:09:00] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1383 MiB, GPU 6710 MiB
[09/15/2021-16:09:00] [I] Created input binding for input.1 with dimensions 1x3x20x256x256
[09/15/2021-16:09:00] [I] Created output binding for 947 with dimensions 1x2
[09/15/2021-16:09:00] [I] Starting inference
[09/15/2021-16:09:03] [I] Warmup completed 1 queries over 200 ms
[09/15/2021-16:09:03] [I] Timing trace has 15 queries over 3.47132 s
[09/15/2021-16:09:03] [I] 
[09/15/2021-16:09:03] [I] === Trace details ===
[09/15/2021-16:09:03] [I] Trace averages of 10 runs:
[09/15/2021-16:09:03] [I] Average on 10 runs - GPU latency: 230.255 ms - Host latency: 231.355 ms (end to end 231.365 ms, enqueue 187.4 ms)
[09/15/2021-16:09:03] [I] 
[09/15/2021-16:09:03] [I] === Performance summary ===
[09/15/2021-16:09:03] [I] Throughput: 4.32113 qps
[09/15/2021-16:09:03] [I] Latency: min = 231.232 ms, max = 232.2 ms, mean = 231.41 ms, median = 231.383 ms, percentile(99%) = 232.2 ms
[09/15/2021-16:09:03] [I] End-to-End Host Latency: min = 231.237 ms, max = 232.211 ms, mean = 231.42 ms, median = 231.395 ms, percentile(99%) = 232.211 ms
[09/15/2021-16:09:03] [I] Enqueue Time: min = 175.735 ms, max = 190.365 ms, mean = 188.348 ms, median = 190.263 ms, percentile(99%) = 190.365 ms
[09/15/2021-16:09:03] [I] H2D Latency: min = 1.09521 ms, max = 1.26855 ms, mean = 1.10751 ms, median = 1.0957 ms, percentile(99%) = 1.26855 ms
[09/15/2021-16:09:03] [I] GPU Compute Time: min = 230.132 ms, max = 230.927 ms, mean = 230.299 ms, median = 230.283 ms, percentile(99%) = 230.927 ms
[09/15/2021-16:09:03] [I] D2H Latency: min = 0.00341797 ms, max = 0.00415039 ms, mean = 0.00367839 ms, median = 0.00366211 ms, percentile(99%) = 0.00415039 ms
[09/15/2021-16:09:03] [I] Total Host Walltime: 3.47132 s
[09/15/2021-16:09:03] [I] Total GPU Compute Time: 3.45448 s
...
dusty-nv commented 3 years ago

It seems that trtexec completed successfully, because it built the model engine and reported the benchmarking times. The "broadcasting input1" messages are informational, not errors. You should be good to go.
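Since the engine build took ~170 seconds here, it may be worth serializing it so later runs skip the build. A possible invocation, using trtexec's standard `--saveEngine`/`--loadEngine` flags (the engine filename is just an example; `--workspace` addresses the workspace-memory message in the log above):

```shell
# Build once, with a larger builder workspace (MiB), and save the engine
/usr/src/tensorrt/bin/trtexec --onnx=X3D_M_custom_untrained.onnx \
    --workspace=2048 --saveEngine=x3d.engine

# Subsequent runs: load the prebuilt engine and benchmark directly
/usr/src/tensorrt/bin/trtexec --loadEngine=x3d.engine
```

Note that serialized engines are specific to the TensorRT version and GPU they were built on, so rebuild after a JetPack upgrade.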


HansHorak commented 3 years ago

Wonderful, it looks like PyTorch 1.9 and JetPack 4.6 have decent support for 3D CNNs. Not sure yet which optimization methods are available on the accelerator, but the main operations and conversions seem to work.