NVIDIA-AI-IOT / cuDLA-samples

YOLOv5 on Orin DLA

How to profile cuDLA computation #31

Open angry-crab opened 4 months ago

angry-crab commented 4 months ago

Hi, I tried to profile DLA following this tutorial: https://github.com/NVIDIA-AI-IOT/jetson_dla_tutorial

But I got `Error[1]: [runtime.cpp::parsePlan::314] Error Code 1: Serialization (Serialization assertion plan->header.magicTag == rt::kPLAN_MAGIC_TAG failed.)`

It seems that TensorRT cannot deserialize the loadable somehow. Some posts said this was caused by a mismatch of TensorRT versions, but I was using the same TensorRT version for both building and inference.

Therefore, I was wondering if there is a way to profile cuDLA. Thanks.
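For reference, this is roughly the pattern that produces the assertion (reconstructed; `model.loadable` and the logger are placeholders, not my exact code):

```cpp
#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

int main() {
    // "model.loadable" is an assumed file name for the DLA loadable.
    std::ifstream f("model.loadable", std::ios::binary | std::ios::ate);
    std::vector<char> blob(static_cast<size_t>(f.tellg()));
    f.seekg(0);
    f.read(blob.data(), blob.size());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    // Fails with the magicTag assertion because the blob is a DLA
    // loadable, not a serialized TensorRT plan (see the reply below).
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    return engine == nullptr;  // returns 1: deserialization fails
}
```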

lynettez commented 4 months ago

Hi @angry-crab, TensorRT can only build the loadable; it cannot load it. We should use the cuDLA API to load and execute it. cuDLA samples can be found at https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/cuDLAHybridMode and https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/cuDLAStandaloneMode
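Roughly, the hybrid-mode flow in those samples looks like this (condensed sketch; the file name, buffer sizes, and single-input/single-output layout are placeholders, and the full samples query tensor sizes with cudlaModuleGetAttributes instead of hard-coding them):

```cpp
#include <cudla.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <vector>

#define CHECK_CUDLA(call)                                             \
    do {                                                              \
        cudlaStatus s_ = (call);                                      \
        if (s_ != cudlaSuccess) {                                     \
            std::fprintf(stderr, "cuDLA error %d at line %d\n",       \
                         static_cast<int>(s_), __LINE__);             \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

int main() {
    // Read the DLA loadable built by TensorRT.
    std::ifstream f("model.loadable", std::ios::binary | std::ios::ate);
    std::vector<uint8_t> loadable(static_cast<size_t>(f.tellg()));
    f.seekg(0);
    f.read(reinterpret_cast<char*>(loadable.data()), loadable.size());

    // Hybrid mode: DLA tasks are submitted on a CUDA stream.
    cudlaDevHandle dev;
    CHECK_CUDLA(cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA));

    cudlaModule module;
    CHECK_CUDLA(cudlaModuleLoadFromMemory(dev, loadable.data(),
                                          loadable.size(), &module, 0));

    // Sizes are hard-coded for brevity; the samples query them via
    // cudlaModuleGetAttributes(..., CUDLA_INPUT_TENSOR_DESCRIPTORS, ...).
    const size_t inSize = 3 * 224 * 224, outSize = 1000 * sizeof(uint16_t);
    void *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, inSize);
    cudaMalloc(&dOut, outSize);

    // CUDA allocations must be registered with cuDLA before use in a task.
    uint64_t *inReg = nullptr, *outReg = nullptr;
    CHECK_CUDLA(cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(dIn),
                                 inSize, &inReg, 0));
    CHECK_CUDLA(cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(dOut),
                                 outSize, &outReg, 0));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // One task per inference: module handle plus registered in/out pointers.
    cudlaTask task = {};
    task.moduleHandle = module;
    task.inputTensor = &inReg;
    task.numInputTensors = 1;
    task.outputTensor = &outReg;
    task.numOutputTensors = 1;

    CHECK_CUDLA(cudlaSubmitTask(dev, &task, 1, stream, 0));
    cudaStreamSynchronize(stream);

    cudlaMemUnregister(dev, inReg);
    cudlaMemUnregister(dev, outReg);
    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}
```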

angry-crab commented 3 months ago

> Hi @angry-crab, TensorRT can only build the loadable; it cannot load it. We should use the cuDLA API to load and execute it. cuDLA samples can be found at https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/cuDLAHybridMode and https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/cuDLAStandaloneMode

Hi, thank you for the info. However, I would like to profile cuDLA internal computations, such as matmul, conv, etc. Is there a way to do that?
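For now, the best I can do is time the whole submitted task with CUDA events in hybrid mode, which gives end-to-end latency but no per-layer breakdown (a minimal sketch; `dev` and `task` are assumed to be set up as in the hybrid-mode snippet above):

```cpp
#include <cudla.h>
#include <cuda_runtime.h>

// Times `iters` back-to-back task submissions with CUDA events. This
// measures the end-to-end DLA task only; individual layers such as conv
// or matmul are not visible this way.
float timeDlaTask(cudlaDevHandle dev, const cudlaTask* task, int iters) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    for (int i = 0; i < iters; ++i) {
        cudlaSubmitTask(dev, task, 1, stream, 0);  // status check omitted
    }
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    return ms / iters;  // mean latency per task, in milliseconds
}
```

Usage would be something like `printf("%.3f ms\n", timeDlaTask(dev, &task, 100));`, but averaging over many submissions still hides where the time goes inside the network.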