Liaoqing-up opened this issue 1 year ago
I used pre-norm inside each layer. https://github.com/TuSimple/centerformer/blob/96aa37503dc900d1aebeb7c1086c33bbd0c01d26/det3d/models/utils/transformer.py#L218-L238
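For readers of the thread, here is a minimal sketch of what a pre-norm residual block looks like. This is not the linked implementation; the class name, dimensions, and sub-layers are illustrative only:

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Pre-norm: the LayerNorm sits inside each residual branch, so the skip
    connection carries the raw (un-normalized) input forward."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):  # x: (batch, tokens, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual = raw x
        x = x + self.ff(self.norm2(x))                      # residual = raw x again
        return x
```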
I see, but I wonder whether you have tried Add&Norm after each layer, which would mean the residual skip-connection input is a feature that has already passed through the Norm. Is it possible that the results of these two structures do not differ much?
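To make the question concrete, here is a minimal Add&Norm (post-norm) sketch for contrast; again the names and shapes are illustrative, not taken from the repository:

```python
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Post-norm ("Add & Norm"): add and norm act as one unit, so the input to
    the next sub-layer (and to the next skip connection) is already normalized."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):  # x: (batch, tokens, dim)
        x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])  # Add & Norm
        x = self.norm2(x + self.ff(x))                                 # Add & Norm
        return x
```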
Sorry, I haven't tried Add&Norm after each layer. Do you have prior experience with this, and would the results be better with that implementation?
https://github.com/TuSimple/centerformer/blob/96aa37503dc900d1aebeb7c1086c33bbd0c01d26/det3d/models/utils/transformer.py#L267-L279 In this code, the residual path of the transformer only carries the result of the add and never passes through the norm layer, so add and norm are not taken as a single unit. This differs from the typical transformer structure, where the output of add-and-norm in series becomes the input to the next level. Is there any special consideration behind this design?
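To restate the two dataflows being compared, a schematic sketch with hypothetical callables (`attn`, `ff`, `norm1`, `norm2` stand in for arbitrary sub-layers and LayerNorms; this is not the repository code):

```python
def pre_norm_layer(x, attn, ff, norm1, norm2):
    x = x + attn(norm1(x))  # skip path carries the raw x
    x = x + ff(norm2(x))
    return x                # next layer's skip input never passed through a norm

def post_norm_layer(x, attn, ff, norm1, norm2):
    x = norm1(x + attn(x))  # add and norm taken as one unit
    x = norm2(x + ff(x))
    return x                # next layer's skip input has passed through a norm
```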