Closed bjajoh closed 2 years ago
Any info on that topic @mive93 @ceccocats ?
I believe tkDNN supports using the DLA as long as your TensorRT version is greater than 5 and less than 8; have a look at this line for more info.
@perseusdg but how do I serialize my network to use the DLAs? I can't find anything..
As in, you want your network to run completely on the DLA with no fallbacks to the GPU?
I mean the YOLO layer needs to run on the GPU, but I would like to know if there is a way to put all supported layers on the DLA.
If you go through the parts of the tkDNN code I linked above, you will notice that the default device type is set to DLA with GPU fallback enabled. According to NVIDIA's documentation this should be sufficient to run all supported layers on the DLA and unsupported ones on the GPU. I copied the relevant statement from the documentation below:
setDefaultDeviceType(DeviceType deviceType)
This function sets the default deviceType to be used by the builder. It ensures that all the layers that can run on DLA runs on DLA unless setDeviceType is used to override the deviceType for a layer.
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_topic
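For reference, a minimal sketch of what that builder configuration looks like with the plain TensorRT 7.x C++ API (network and builder creation omitted; the helper name `enableDLA` is just for illustration, not tkDNN's actual function):

```cpp
#include "NvInfer.h"

// Sketch: route every supported layer to the DLA by default,
// with GPU fallback for the rest (e.g. the YOLO layer).
void enableDLA(nvinfer1::IBuilderConfig* config, int dlaCore = 0) {
    // The DLA only supports FP16/INT8 precision, so one must be enabled.
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    // Layers the DLA cannot run fall back to the GPU instead of failing.
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
    // Default every layer to the DLA unless overridden per-layer
    // via setDeviceType().
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(dlaCore);
}
```

Setting the default device type back to `nvinfer1::DeviceType::kGPU` (and dropping the fallback flag) would give a GPU-only engine.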
Ok interesting, but can I switch between DLA and GPU-only when serializing? I mean, is there a way to explicitly choose between the two?
Changing the default device type to kGPU should make the network use the GPU only.
Can you link the code you're referring to? :)
Awesome, thank you very much!
@bjajoh did the trick work? It didn't work for me; I tried this too: https://github.com/ceccocats/tkDNN/issues/79
I got an error like this while running ./demo: NVMEDIA_DLA : 885, ERROR: runtime registerEvent failed. err: 0x4. NVMEDIA_DLA : 1849, ERROR: RequestSubmitEvents failed. status: 0x7.
But when running ./test_rtinference yolo4_fp16.rt 4, I see status active in /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status
I was able to serialize it with the DLA and FP16, however it has exactly the same runtime and GPU utilization. Therefore I doubt it's actually using the DLA at all.
It worked for me now. When running ./demo on the DLA, the batch size should be the same as the TKDNN_BATCHSIZE that was set during .rt file creation with ./test_yolo4. Yes, I also got similar performance: tegrastats was showing activity in GR3D_FREQ, and cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status shows active, so it should be using the DLA, with the GPU as fallback hardware.
But that's really weird, I would expect the power consumption to drop. Can you check how the GPU clock speed influences the FPS in DLA mode, for example by switching to the 10W desktop mode?
If we choose DLA, by default it goes to DLA0; how do we change this to DLA1?
Good question, maybe the NVIDIA documentation says something about it?
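In the plain TensorRT C++ API the core is an index passed to `setDLACore`, both at build time and at deserialization time. A hedged sketch (assumes `builder`, `config`, and `runtime` already exist; I haven't checked whether tkDNN exposes this setting directly):

```cpp
#include "NvInfer.h"

// Sketch: target DLA core 1 instead of the default core 0
// (TensorRT 7.x C++ API).
void selectDLACore1(nvinfer1::IBuilder* builder,
                    nvinfer1::IBuilderConfig* config,
                    nvinfer1::IRuntime* runtime) {
    // Xavier has two DLA cores; guard against single-core devices.
    if (builder->getNbDLACores() > 1) {
        config->setDLACore(1);  // build-time choice
    }
    // When loading a serialized engine, select the core on the
    // runtime before calling deserializeCudaEngine():
    runtime->setDLACore(1);
}
```

If the engine runs on DLA1, activity should show up under 15880000.nvdla1 in sysfs rather than nvdla0.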
Hi,
how can I utilise the deep learning accelerators in the Jetson Xavier (NX)?
Thanks for your help, Bjarne