Closed · Jian-danai closed this 1 year ago
Softmax has been merged into the op `MultiScaleDeformableAttnTRT` of `CustomMSDeformableAttentionTRTP` for better quantization.
Thanks. I tested the latest code, and it looks like the TRTP PyTorch inference now gives normal outputs. I will close this issue.
Hi,

It looks like `CustomMSDeformableAttentionTRTP` gives incorrect outputs in PyTorch inference (different from `CustomMSDeformableAttentionTRT` and `CustomMSDeformableAttention`), although the PyTorch inference result is not that important. The differences seem to come from:

- shapes: `reference_points` and `attention_weights` are reshaped in TRTP; specifically, the shapes of the tensors fed into the forward function are different.
- softmax: there is no softmax applied to `attention_weights` in `CustomMSDeformableAttentionTRTP`.

Is there a reason for TRTP to give different outputs, or is it just a bug in the PyTorch inference path that does not matter? Thanks.
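For illustration, here is a minimal NumPy sketch of the softmax point above. The shapes and the `softmax` helper are toy assumptions of mine, not code from the repo; it only shows how using raw `attention_weights` logits (softmax deferred into the TensorRT op) instead of normalized weights changes the weighted sum in a plain PyTorch/NumPy forward pass:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the sampling-point axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# hypothetical toy shapes: 1 query, 2 heads, 4 sampling points per head
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 2, 4))   # raw attention_weights logits
values = rng.normal(size=(1, 2, 4))   # sampled values per point

# CustomMSDeformableAttention-style: normalize weights, then weighted sum
out_softmax = (softmax(logits) * values).sum(-1)

# TRTP-style PyTorch fallback: raw logits used directly (no softmax),
# since the softmax is assumed to happen inside the fused TensorRT op
out_raw = (logits * values).sum(-1)

print("identical:", np.allclose(out_softmax, out_raw))
```

The two results differ, which would explain a mismatch when running the TRTP module in plain PyTorch, even though the deployed TensorRT engine (with softmax fused into the op) behaves correctly.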