In the paper, the σ(x) is implemented as a linear layer followed by a non-linear activation function (e.g., sigmoid function) for all input x. How to understand the input x?the matrix of raw iamge, or the extracted features, even or cls_score?
Thank you!
In the paper, the σ(x) is implemented as a linear layer followed by a non-linear activation function (e.g., sigmoid function) for all input x. How to understand the input x?the matrix of raw iamge, or the extracted features, even or cls_score? Thank you!