Parsifal133 opened 3 months ago
It looks like a multi-task/multi-head model. A slightly more complex approach is to split the model into a backbone plus num_heads head modules, and dynamically select the head module at runtime.
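The split-and-select idea can be sketched in plain Python. Everything below is a stand-in for illustration (the function names and toy arithmetic are assumptions, not the actual model code):

```python
def backbone(x):
    # Shared feature extractor; a placeholder transform here.
    return x * 2

# One head module per task, selected at runtime by task_id.
heads = {
    1: lambda feat: feat + 1,   # stand-in head for task 1
    2: lambda feat: feat + 10,  # stand-in head for task 2
}

def infer(x, task_id):
    # Select the head outside the graph instead of branching
    # inside one monolithic network.
    feat = backbone(x)
    return heads[task_id](feat)
```

With this structure, the branching cost is a single dictionary lookup in host code rather than conditional layers inside the engine.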
Yes, this is a continual-learning multi-task model. In fact, each convolutional layer of the model has multiple branches, not just the head; only the head code with nested if-conditionals is shown here.
I could avoid introducing multiple branches in each convolutional layer, but that would reduce the effectiveness of continual learning.
Can you draw the data flow of the model?
Hello, @lix19937! I have illustrated a conventional convolutional layer and my multi-task convolutional layer in the figure. The conventional convolutional layer on the left consists of a convolution, a batch normalization (BN) layer, and an activation layer, whereas my convolutional layer incorporates N branches. During inference, only one branch is executed at a time: when task_id == 1, the first branch is executed, and its output is added to the main branch's output before being passed to the activation function.
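For concreteness, the layer described above can be modeled with scalar stand-ins (toy functions assumed for illustration only, not the real convolution/BN kernels):

```python
def relu(v):
    return max(0.0, v)

def main_branch(x):
    # Stands in for conv + BN on the main path.
    return 2 * x

# One side branch per task; only the selected one runs.
branches = [
    lambda x: x + 1,  # branch for task_id == 1
    lambda x: x - 1,  # branch for task_id == 2
]

def multi_task_conv(x, task_id):
    # The selected branch's output is added to the main branch's
    # output before the activation, as described in the figure.
    y = main_branch(x) + branches[task_id - 1](x)
    return relu(y)
```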
I implemented this logic with network.add_if_conditional() in the Python API, and the resulting engine produces correct inference results. The only issues are that the engine's inference is relatively slow and that the engine occupies a considerable amount of space. Additionally, the inference time grows as the number of branches increases. The specific experimental results are presented in the table below.
I suspect that the large number of if_conditional layers is causing this issue. I would therefore appreciate advice from the community, and from you, on whether there are better solutions to this problem.
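The nesting pattern can be illustrated in plain Python (a toy stand-in for the graph structure, not actual TensorRT API code): each if-conditional adds one true/false split, so an N-way branch selection becomes a chain of N-1 nested two-way conditionals that are checked one after another.

```python
def nested_select(task_id, branch_outputs):
    # Emulates nested if-conditionals: compare against task 1, else
    # descend into the next conditional, and so on down the chain.
    def select(i):
        if task_id == i + 1:
            return branch_outputs[i]
        if i + 1 == len(branch_outputs):
            # Innermost "else" falls through to the last branch.
            return branch_outputs[i]
        return select(i + 1)
    return select(0)
```

The chain makes the selection cost grow with the number of branches, which is consistent with the timing trend reported above.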
Hi @Parsifal133, could you try running your model on the latest version of TRT? We expect improved performance on the latest version.
Additionally, as the number of branches increases, the inference time further increases.
Yes, it means that multiple branches need to be computed, and they execute serially.
The only issue is that the engine’s inference is relatively slow and occupies a considerable amount of space.
What precision is used (fp32/fp16/int8)?
Hello everyone!
I am using TensorRT 8.2 and the Python API to build a YOLOv5 model with multiple branches.
Specifically, each convolutional layer has multiple branches (but only one branch is executed during each inference), so I am using nested network.add_if_conditional() calls.
Fortunately, I achieved the functionality I wanted, but the exported engine file is quite large (which is not the most important issue). More importantly, the actual inference time increases as the number of branches increases.
This is the code for using nested if_conditional() for the YOLOv5 output heads. Since the number of branches is often more than two, nested if_conditional() is needed.
I would like to know if there is a better way to avoid the increase in inference time.
Any possible suggestions would be greatly appreciated!
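One alternative that may be worth considering (an assumption on my part, not something verified on this model): since task_id is known before inference starts, the branching can be moved out of the engine entirely, e.g. by building one engine per task, or a shared backbone engine plus per-task head engines, and dispatching in host code. A toy sketch with callables standing in for engines:

```python
class TaskDispatcher:
    """Hypothetical dispatch layer; each 'engine' here is a plain
    callable, standing in for a separately built TensorRT engine
    whose graph contains no conditional layers at all."""

    def __init__(self, engines):
        self.engines = engines  # {task_id: callable}

    def infer(self, x, task_id):
        # Selection happens once, outside the engine, so inference
        # cost no longer grows with the number of tasks.
        return self.engines[task_id](x)

dispatcher = TaskDispatcher({
    1: lambda x: x * 2,  # stand-in for the task-1 engine
    2: lambda x: x * 3,  # stand-in for the task-2 engine
})
```

The trade-off is duplicated backbone weights on disk and in memory unless the backbone is factored out into its own shared engine.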