Depth To Space Support and Transformer Softmax Speedup

Hi, I am currently deploying a multitask vision model on GAP9 and have run into two main issues:

1. My model uses the DepthToSpace operation, which is not yet supported by the SDK. I believe it would be great to have, since it allows fast and efficient upsampling.
2. I am deploying an efficient attention module. It applies Softmax to a rather large tensor, which causes a massive slowdown (46% of the total operations of my network are spent calculating the softmax). This can be sped up significantly by using a LUT for the exponential calculations, according to this paper. Given the high interest in transformers lately, I think this is a problem worth solving :).
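To make the LUT idea in the second point concrete, here is a minimal NumPy sketch. It is only an illustration under my own assumptions, not the method from the paper and not GAP SDK code: the names `build_exp_lut` and `lut_softmax`, the table size of 256, and the step of 1/16 are all hypothetical choices. It relies on the fact that after subtracting the row maximum every exponent is non-positive, so `exp` only needs to be tabulated over a small, bounded range.

```python
import numpy as np

def build_exp_lut(size=256, step=1.0 / 16):
    # Hypothetical table: lut[i] = exp(-i * step), covering exponents in [-(size-1)*step, 0].
    return np.exp(-np.arange(size) * step).astype(np.float32)

def lut_softmax(x, lut, step=1.0 / 16, axis=-1):
    # Distance to the maximum is always >= 0, so exp(x - max) = exp(-d).
    d = np.max(x, axis=axis, keepdims=True) - x
    # Quantize d to the nearest table entry; clamp very negative exponents
    # to the last (smallest) entry instead of computing exp.
    idx = np.minimum(np.rint(d / step).astype(np.int64), len(lut) - 1)
    e = lut[idx]
    # The normalization is unchanged; only exp is replaced by a lookup.
    return e / np.sum(e, axis=axis, keepdims=True)

def reference_softmax(x, axis=-1):
    # Exact softmax, used only for comparison.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)
```

With this step size the lookup rounds each exponent by at most 1/32, so the result is an approximation; whether that accuracy is acceptable, and what table size and fixed-point format suit GAP9, would need to be checked on the real model.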

My model is written in PyTorch and exported to ONNX with opset version 16.
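For reference, the ONNX DepthToSpace operator is defined as a reshape, a transpose, and a second reshape. The NumPy sketch below follows that definition for an NCHW tensor in both modes; the function name `depth_to_space` is my own and is not part of any SDK. If the upsampling comes from `torch.nn.PixelShuffle` (an assumption, since that depends on the model), the exported node uses the `CRD` mode.

```python
import numpy as np

def depth_to_space(x, blocksize, mode="DCR"):
    # x has layout (batch, channels, height, width); channels must be
    # divisible by blocksize ** 2.
    b, c, h, w = x.shape
    c_out = c // (blocksize ** 2)
    if mode == "DCR":
        # Depth-column-row order: block offsets are the outer channel factor.
        tmp = np.reshape(x, (b, blocksize, blocksize, c_out, h, w))
        tmp = np.transpose(tmp, (0, 3, 4, 1, 5, 2))
    elif mode == "CRD":
        # Column-row-depth order: output channel is the outer channel factor.
        tmp = np.reshape(x, (b, c_out, blocksize, blocksize, h, w))
        tmp = np.transpose(tmp, (0, 1, 4, 2, 5, 3))
    else:
        raise ValueError("mode must be 'DCR' or 'CRD'")
    # Interleave the block offsets into the spatial dimensions.
    return np.reshape(tmp, (b, c_out, h * blocksize, w * blocksize))
```

Since the operation only moves data, a native kernel would essentially be a strided copy; I have not verified whether the same reshape and transpose sequence can already be expressed with the currently supported operators.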

Here are examples of the upscaling and attention layers: onnx_files.zip. (Unfortunately, the ONNX export of the attention head is quite messy; it is based on SegFormer and adapted so that it runs on GAP9.)

Hope this issue sparks some interest, and I am happy to provide more information if needed.
