angry-crab opened this issue 8 months ago
Hi @angry-crab, TensorRT can only build the loadable; it cannot load it. You need the cuDLA API to load and execute it. cuDLA samples can be found at https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/cuDLAHybridMode and https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/cuDLAStandaloneMode
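Following the hybrid-mode sample linked above, loading and executing a DLA loadable looks roughly like this. This is a minimal sketch, assuming a Jetson-class platform where cudla.h is available; the file name model.loadable and the reduced error handling are illustrative, and buffer registration/task submission are elided (see the cuDLAHybridMode sample for the full flow):

```cpp
// Minimal cuDLA hybrid-mode sketch (assumes cudla.h on a DLA-capable platform).
#include <cudla.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

#define CHECK_CUDLA(call) do { cudlaStatus _s = (call); \
    if (_s != cudlaSuccess) { \
        std::fprintf(stderr, "cuDLA error %d at %s:%d\n", _s, __FILE__, __LINE__); \
        std::exit(1); } } while (0)

int main() {
    // Read the loadable that TensorRT produced (file name is an assumption).
    FILE* f = std::fopen("model.loadable", "rb");
    if (!f) { std::perror("model.loadable"); return 1; }
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<uint8_t> blob(static_cast<size_t>(size));
    std::fread(blob.data(), 1, blob.size(), f);
    std::fclose(f);

    // Create a cuDLA device in hybrid (CUDA + DLA) mode and load the module.
    cudlaDevHandle dev;
    CHECK_CUDLA(cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA));
    cudlaModule module;
    CHECK_CUDLA(cudlaModuleLoadFromMemory(dev, blob.data(), blob.size(), &module, 0));

    // ... register input/output buffers with cudlaMemRegister and submit work
    //     with cudlaSubmitTask on a CUDA stream, as in the cuDLAHybridMode sample.

    CHECK_CUDLA(cudlaModuleUnload(module, 0));
    CHECK_CUDLA(cudlaDestroyDevice(dev));
    return 0;
}
```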
Hi, thank you for the info. However, I would like to profile cuDLA's internal computations, such as matmul, conv, etc. Is there a way to do that?
Sorry for the late reply. @angry-crab, here are samples that provide layer-wise statistics to the application: https://github.com/NVIDIA/Deep-Learning-Accelerator-SW/tree/main/samples/cuDLA Please check whether cudlaExternalEtbl.hpp is available on your platform; layer-wise profiling is a new feature that may not be supported on some older platforms.
@lynettez https://github.com/NVIDIA/Deep-Learning-Accelerator-SW/issues/27
How can I view the DLA utilization rate?
Hi, I tried to profile DLA according to this tutorial. https://github.com/NVIDIA-AI-IOT/jetson_dla_tutorial
But I got:
Error[1]: [runtime.cpp::parsePlan::314] Error Code 1: Serialization (Serialization assertion plan->header.magicTag == rt::kPLAN_MAGIC_TAG failed.)
It seems that TensorRT cannot deserialize the loadable for some reason. Some posts said this is caused by a mismatch of TensorRT versions, but I used the same TensorRT version for both building and inference.
Therefore, I was wondering whether there is a way to profile cuDLA. Thanks.
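For context, the magicTag assertion usually means the file being deserialized is not a TensorRT engine plan: a DLA standalone loadable has a different header, so the TensorRT runtime rejects it, and it must be loaded with cuDLA instead (as noted earlier in this thread). A sketch of the build step, assuming TensorRT 8.6+ trtexec and a placeholder model.onnx:

```shell
# Build a DLA standalone loadable (not a TensorRT engine plan).
# The resulting file can only be loaded via the cuDLA API,
# not via nvinfer1::IRuntime::deserializeCudaEngine.
trtexec --onnx=model.onnx \
        --useDLACore=0 \
        --buildDLAStandalone \
        --saveEngine=model.loadable
```

If you instead want a TensorRT-runnable engine that uses DLA (with GPU fallback), drop --buildDLAStandalone and add --allowGPUFallback; that plan file can be deserialized by TensorRT normally.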