angry-crab opened this issue 8 months ago
Hi @angry-crab, TensorRT can only build the loadable; it cannot load it. You need the cuDLA API to load and execute it. cuDLA samples can be found at https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/cuDLAHybridMode and https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/cuDLAStandaloneMode
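Following the hybrid-mode sample linked above, loading and executing a DLA loadable looks roughly like this. This is a minimal sketch, assuming a Jetson-class platform where cudla.h is available; the file name model.loadable and the reduced error handling are illustrative, and buffer registration/task submission are elided (see the cuDLAHybridMode sample for the full flow):

```cpp
// Minimal cuDLA hybrid-mode sketch (assumes cudla.h on a DLA-capable platform).
#include <cudla.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

#define CHECK_CUDLA(call) do { cudlaStatus _s = (call); \
    if (_s != cudlaSuccess) { \
        std::fprintf(stderr, "cuDLA error %d at %s:%d\n", _s, __FILE__, __LINE__); \
        std::exit(1); } } while (0)

int main() {
    // Read the loadable that TensorRT produced (file name is an assumption).
    FILE* f = std::fopen("model.loadable", "rb");
    if (!f) { std::perror("model.loadable"); return 1; }
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<uint8_t> blob(static_cast<size_t>(size));
    std::fread(blob.data(), 1, blob.size(), f);
    std::fclose(f);

    // Create a cuDLA device in hybrid (CUDA + DLA) mode and load the module.
    cudlaDevHandle dev;
    CHECK_CUDLA(cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA));
    cudlaModule module;
    CHECK_CUDLA(cudlaModuleLoadFromMemory(dev, blob.data(), blob.size(), &module, 0));

    // ... register input/output buffers with cudlaMemRegister and submit work
    //     with cudlaSubmitTask on a CUDA stream, as in the cuDLAHybridMode sample.

    CHECK_CUDLA(cudlaModuleUnload(module, 0));
    CHECK_CUDLA(cudlaDestroyDevice(dev));
    return 0;
}
```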
Hi, thank you for the info. However, I would like to profile cuDLA's internal computations, such as matmul, conv, etc. Is there a way to do that?
Sorry for the late reply. @angry-crab, here are samples that provide layer-wise statistics to the application: https://github.com/NVIDIA/Deep-Learning-Accelerator-SW/tree/main/samples/cuDLA Please check whether cudlaExternalEtbl.hpp is available on your platform; layer-wise profiling is a new feature that may not be supported on some older platforms.
@lynettez https://github.com/NVIDIA/Deep-Learning-Accelerator-SW/issues/27
How can I view the DLA utilization rate?
Hi, I tried to profile DLA according to this tutorial. https://github.com/NVIDIA-AI-IOT/jetson_dla_tutorial
But I got:
Error[1]: [runtime.cpp::parsePlan::314] Error Code 1: Serialization (Serialization assertion plan->header.magicTag == rt::kPLAN_MAGIC_TAG failed.)
It seems that TensorRT cannot deserialize the loadable for some reason. Some posts said this is caused by a mismatch of TensorRT versions, but I used the same TensorRT version for both building and inference.
Therefore, I was wondering whether there is a way to profile cuDLA. Thanks.
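For context, the magicTag assertion usually means the file being deserialized is not a TensorRT engine plan: a DLA standalone loadable has a different header, so the TensorRT runtime rejects it, and it must be loaded with cuDLA instead (as noted earlier in this thread). A sketch of the build step, assuming TensorRT 8.6+ trtexec and a placeholder model.onnx:

```shell
# Build a DLA standalone loadable (not a TensorRT engine plan).
# The resulting file can only be loaded via the cuDLA API,
# not via nvinfer1::IRuntime::deserializeCudaEngine.
trtexec --onnx=model.onnx \
        --useDLACore=0 \
        --buildDLAStandalone \
        --saveEngine=model.loadable
```

If you instead want a TensorRT-runnable engine that uses DLA (with GPU fallback), drop --buildDLAStandalone and add --allowGPUFallback; that plan file can be deserialized by TensorRT normally.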