ceccocats / tkDNN

Deep neural network library and toolkit to do high performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0

Using Xavier DLAs? #245

Closed bjajoh closed 2 years ago

bjajoh commented 3 years ago

Hi,

how can I utilise the deep learning accelerators in the Jetson Xavier (NX)?

Thanks for your help, Bjarne

bjajoh commented 3 years ago

Any info on that topic @mive93 @ceccocats ?

perseusdg commented 3 years ago

I believe tkDNN supports using the DLA as long as your TensorRT version is greater than 5 and less than 8; have a look at this line for more info.

bjajoh commented 3 years ago

@perseusdg but how do I serialize my network to use the DLAs? I can't find anything.

perseusdg commented 3 years ago

As in, you want your network to run completely on the DLA with no fallbacks to the GPU?

bjajoh commented 3 years ago

I mean the YOLO layer needs to run on the GPU. But I would like to know if there is a way to put all supported layers on the DLA.

perseusdg commented 3 years ago

If you go through the parts of the tkDNN code I linked above, you will notice that the default device type is set to DLA with GPU fallback enabled. According to NVIDIA's documentation, this should be sufficient to run all supported layers on the DLA and unsupported ones on the GPU. I copied the statement from the documentation below:

> `setDefaultDeviceType(DeviceType deviceType)`
> This function sets the default deviceType to be used by the builder. It ensures that all the layers that can run on DLA runs on DLA unless setDeviceType is used to override the deviceType for a layer.

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_topic
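In case it helps, the relevant builder calls look roughly like this. This is an illustrative fragment based on the TensorRT 5–7 `IBuilder` API (the API generation the linked tkDNN code targets), not a copy of tkDNN's exact code; the variable name `builder` is a placeholder:

```cpp
// Sketch (TensorRT 5-7 IBuilder API): prefer the DLA for every layer,
// falling back to the GPU for anything the DLA cannot run.
// DLA engines only accept reduced-precision networks, hence FP16 mode.
builder->setFp16Mode(true);                                // DLA requires FP16 (or INT8)
builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA); // try to place all layers on the DLA
builder->allowGPUFallback(true);                           // unsupported layers run on the GPU
builder->setDLACore(0);                                    // Xavier has two DLA engines: 0 and 1
```

The engine serialized after these calls bakes the DLA placement in, so the choice has to be made at `.rt` file creation time, not at inference time.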

bjajoh commented 3 years ago

Ok interesting, but can I choose between DLA and GPU-only when serializing? I mean, is there a way to explicitly select one or the other?

perseusdg commented 3 years ago

Changing your default device type to kGPU should make the network use the gpu only
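For completeness, the GPU-only variant would be something like the fragment below (same TensorRT 5–7 `IBuilder` API assumption; in tkDNN this amounts to editing the default-device-type line before rebuilding the `.rt` file):

```cpp
// Sketch: build the engine for the GPU only.
builder->setDefaultDeviceType(nvinfer1::DeviceType::kGPU); // place all layers on the GPU
// With kGPU as the default device type, allowGPUFallback() has no effect.
```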

bjajoh commented 3 years ago

Can you link the code you're referring to? :)

perseusdg commented 3 years ago

Here you go https://github.com/ceccocats/tkDNN/blob/a992c9feb5fb5c7a64da59f6c5a6f0c1c1a6cf2d/src/NetworkRT.cpp#L76

bjajoh commented 3 years ago

Awesome, thank you very much!

shridharkini commented 3 years ago

@bjajoh did the trick work? It didn't work for me. I tried this too: https://github.com/ceccocats/tkDNN/issues/79

While running ./demo I got an error like:

```
NVMEDIA_DLA : 885, ERROR: runtime registerEvent failed. err: 0x4.
NVMEDIA_DLA : 1849, ERROR: RequestSubmitEvents failed. status: 0x7.
```

But when running ./test_rtinference yolo4_fp16.rt 4, I see status active in /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status

bjajoh commented 3 years ago

I was able to serialize it with the DLA and FP16, however it has exactly the same runtime and GPU utilization. Therefore I doubt it's actually using the DLA at all.

shridharkini commented 3 years ago

It worked for me now. When running ./demo on the DLA, the batch size must be the same as the TKDNN_BATCHSIZE that was set during .rt file creation with ./test_yolo4. And yes, I got similar performance too: tegrastats shows activity in GR3D_FREQ, and cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status shows active, so it should be using the DLA with the GPU as fallback hardware.
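In command form, the batch-size matching described above would look something like this (illustrative only; `TKDNN_BATCHSIZE` and the binary names are the ones mentioned in this thread, and ./demo takes additional arguments omitted here):

```shell
# The batch size is baked into the engine at serialization time,
# so it must match between engine creation and inference.
export TKDNN_BATCHSIZE=4   # set BEFORE serializing
./test_yolo4               # creates yolo4_fp16.rt with batch size 4
./demo yolo4_fp16.rt ...   # run inference with the same batch size
```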

bjajoh commented 3 years ago

But that's really weird; I would expect the power consumption to drop. Can you check how the GPU clock speed influences the FPS in DLA mode, for example by switching to the 10 W desktop mode?
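For anyone trying this experiment: Jetson power modes can be switched with `nvpmodel`. A rough sketch (the mode number is board-specific; check /etc/nvpmodel.conf on your device for the mapping, so treat `-m 4` below as a placeholder):

```shell
sudo nvpmodel -q    # query the current power mode
sudo nvpmodel -m 4  # switch mode, e.g. to a 10 W profile on Xavier NX
sudo tegrastats     # watch GR3D_FREQ (GPU load) while the demo runs
```

If the FPS tracks the GPU clock even in DLA mode, that would support the suspicion that the layers are actually falling back to the GPU.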

shridharkini commented 3 years ago

If we choose DLA, it goes to DLA0 by default. How can we change this to DLA1?

bjajoh commented 3 years ago

Good question, maybe the NVIDIA documentation says something about it?
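For reference, TensorRT does expose a core selector. Something like the fragment below (assuming the same TensorRT 5–7 `IBuilder` API used around the linked tkDNN line; `builder` is a placeholder name) should target the second DLA engine:

```cpp
// Sketch: build the engine for the second DLA core on Xavier.
// Cores are indexed 0 and 1; getNbDLACores() reports how many exist.
builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
builder->allowGPUFallback(true);
builder->setDLACore(1);   // use DLA1 instead of the default DLA0
```

Since tkDNN hard-codes the device setup in NetworkRT.cpp, changing the core presumably means editing that code and re-serializing the `.rt` file.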