easton-cau / SOTR

SOTR: Segmenting Objects with Transformers

Why is AP-s so poor? #1

Closed: niecongchong closed this issue 3 years ago

niecongchong commented 3 years ago

[image: screenshot of reported results]

I am confused about your results.

easton-cau commented 3 years ago

Thanks for your attention.

There are several potential reasons:

1. The feature maps we feed to the transformer are divided into a uniform N*N grid, and each grid cell predicts only one object instance, so some small objects may be missed (see the sketch below).
2. The transformer is good at building long-range dependencies and capturing global features, which leads to excellent performance on larger objects, but it neglects small objects and local information to a certain extent.
3. The relatively low-resolution feature map P5, which carries positional information from the transformer module, is combined with P2-P4 in the FPN to generate the final masks, making it harder for the model to segment small objects precisely.

We expect that future work will improve this aspect.
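To illustrate the first point, here is a minimal sketch (not the actual SOTR code; the grid size `N` and the center-based assignment rule are assumptions for illustration) showing how two small, nearby objects can fall into the same grid cell, so at most one of them can be predicted:

```python
# A minimal sketch (not the SOTR implementation): why a uniform N*N grid
# with one instance prediction per cell can miss small objects. The grid
# size N and the center-based assignment rule are illustrative assumptions.
from collections import defaultdict

def assign_to_grid(boxes, image_size, N=36):
    """Map each box (x1, y1, x2, y2) to the grid cell containing its center."""
    H, W = image_size
    cell_to_boxes = defaultdict(list)
    for box in boxes:
        cx = (box[0] + box[2]) / 2.0
        cy = (box[1] + box[3]) / 2.0
        col = min(int(cx / W * N), N - 1)
        row = min(int(cy / H * N), N - 1)
        cell_to_boxes[(row, col)].append(box)
    return cell_to_boxes

# Two small, nearby objects share one cell; the large object gets its own cell.
boxes = [
    (100, 100, 112, 112),   # small object A
    (96, 96, 108, 110),     # small object B, center lands in the same cell as A
    (400, 300, 700, 650),   # large object
]
cells = assign_to_grid(boxes, image_size=(800, 800), N=36)
collisions = {cell: b for cell, b in cells.items() if len(b) > 1}
# With one prediction per cell, at most one of A and B can be recovered.
print(collisions)   # the cell shared by A and B, e.g. {(4, 4): [boxA, boxB]}
```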

I believe I have answered your question, and as such I'm closing the issue, but let us know if you have further questions.