Dear author, I have read your paper as well as the sparse Transformer work you mentioned. I noticed that your proposed TopkTransformer is essentially the same as the sparse Transformer, yet the related work section of your paper neither cites nor discusses it. Could you clarify the differences between the two?