NVIDIA / Deep-Learning-Accelerator-SW

NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

Need more DLA samples #1

Closed ram-cherukuri closed 1 year ago

ram-cherukuri commented 1 year ago

Please use this as a forum to tell us what types of samples would be most useful to you for leveraging DLA effectively in your application development. We will try our best to address the requests.

DzAvril commented 1 year ago

This is great! I am learning to program DLA with cuDLA in standalone mode, using the CUDA sample cuDLAStandaloneMode. The loadable binary is created by TensorRT with this command:

trtexec --deploy=/usr/src/tensorrt/data/resnet50/ResNet50_N2.prototxt --model=/usr/src/tensorrt/data/resnet50/ResNet50_fp32.caffemodel --output=fc1000 --useDLACore=0 --int8 --memPoolSize=dlaSRAM:1 --inputIOFormats=int8:chw --outputIOFormats=int8:chw --saveEngine=./resnet_50_int8_chw.bin --buildOnly --safe

The inputs and outputs of the loadable are int8, while the original model's are fp32. The sample mentioned above has no code showing how to pre-process the fp32 input into int8 or post-process the int8 output back to fp32. Could you post a sample that demonstrates how to process fp32 input fed to a DLA int8 model?
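For readers with the same question, here is a minimal sketch of the usual symmetric int8 scheme: quantize with q = clamp(round(x / scale), -128, 127) and dequantize with x ≈ q * scale. It is not taken from the cuDLA sample or from this repository. The helper names and the dynamic-range value 2.54 are hypothetical; the real per-tensor scales have to come from whatever calibration or dynamic-range settings were used when the loadable was built.

```python
import struct

INT8_MIN, INT8_MAX = -128, 127


def scale_from_dynamic_range(abs_max):
    """Per-tensor scale for symmetric int8, assuming the range [-abs_max, abs_max]."""
    return abs_max / 127.0


def quantize(values, scale):
    """fp32 -> int8: divide by the scale, round to nearest, clamp to the int8 range."""
    out = []
    for v in values:
        q = round(v / scale)  # Python's round() resolves ties to the even integer
        out.append(max(INT8_MIN, min(INT8_MAX, q)))
    return out


def dequantize(qvalues, scale):
    """int8 -> fp32: multiply each quantized value by the output tensor's scale."""
    return [q * scale for q in qvalues]


def pack_int8(qvalues):
    """Pack quantized values as raw signed bytes, ready to copy into an input buffer."""
    return struct.pack(f"{len(qvalues)}b", *qvalues)


# Hypothetical dynamic range of the input tensor; replace with the calibrated value.
input_scale = scale_from_dynamic_range(2.54)

pixels = [0.0, 0.5, -1.0, 3.0]        # already mean/std-normalized fp32 input
q_in = quantize(pixels, input_scale)  # 3.0 lies outside the range and saturates
raw = pack_int8(q_in)                 # one byte per element
restored = dequantize(q_in, input_scale)
```

The same two functions cover both directions: apply `quantize` with the input tensor's scale before copying data to the DLA input buffer, and `dequantize` with the output tensor's scale after reading the result. Values outside the calibrated range saturate, so the scale should match the one the engine was built with. The memory layout expected by the loadable's I/O format is a separate concern that this sketch does not address.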