easton-cau / SOTR

SOTR: Segmenting Objects with Transformers
MIT License

Is there any analysis on why APs so low? #11

Closed lucasjinreal closed 2 years ago

lucasjinreal commented 2 years ago

Noticed that SOTR's APs is low (10.7 vs 20 for Mask R-CNN). Is there any reason for this? Just wondering, because if APs could be boosted, the overall AP could be even higher.

easton-cau commented 2 years ago

There are several potential reasons:

1. The feature maps we feed to the transformer are divided into a uniform N*N grid, and each grid cell predicts only one object instance, which may miss some small objects.
2. The transformer is better at building long-range dependencies and capturing global features, which leads to excellent performance on larger objects, but it neglects small objects and local information to a certain extent.
3. The relatively low-resolution P5 feature maps with positional information are obtained from the transformer module and combined with P2-P4 in the FPN to generate the final masks, making it harder for the model to segment small objects precisely.
4. Finally, APs will be poor without using bbox information.
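To make the first point concrete, here is a minimal, hypothetical sketch (not SOTR's actual code) of why a uniform N*N grid with one instance per cell can drop small objects: two nearby small objects whose centers land in the same cell collide, and only one survives.

```python
# Illustrative sketch (assumed toy setup, not the SOTR implementation):
# assign object centers to an N*N grid where each cell can hold at most
# one instance, so colliding small objects are lost.

def assign_to_grid(centers, image_size, n=5):
    """Map (x, y) object centers to N*N grid cells.

    A later object whose center falls in an already-occupied cell is
    recorded as lost, mimicking one-instance-per-cell prediction.
    """
    cell = image_size / n
    grid = {}   # cell index -> kept object center
    lost = []   # centers that collided with an occupied cell
    for cx, cy in centers:
        key = (int(cx // cell), int(cy // cell))
        if key in grid:
            lost.append((cx, cy))  # collision: this instance is missed
        else:
            grid[key] = (cx, cy)
    return grid, lost

# Two small objects ~10 px apart in a 500*500 image share one 100*100 cell:
kept, lost = assign_to_grid([(120, 130), (128, 136)], image_size=500, n=5)
print(len(kept), len(lost))  # -> 1 1: the second small object is dropped
```

Large objects are far less affected, since their centers rarely share a cell; that asymmetry is one source of the gap between APs and APl.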

We expect that future work will improve this aspect. I believe this answers your question, so I'm closing the issue, but let us know if you have any further questions.