SunzeY closed this issue 11 months ago.
Awesome work, congratulations! I have some questions about the experiment settings.

1. Zero-shot instance segmentation: you still use the ViTDet classification results. However, the TAP model can generate semantic tokens and perform classification. Have you tried treating ViTDet as a pure object-proposal network and using TAP's classification results for this task?
2. Zero-shot instance classification: cropping -> CLIP gives a strong baseline. I have tried this before but could not reach your AP. In my implementation, I test by center-cropping a square region scaled up 1.5x. Are there any other tricks to improve the classification accuracy?

Hi, @SunzeY

1. Using `eval_cls` for `eval_seg` gives higher COCO mask AP and similar LVIS mask AP. This means that TAP has achieved saturated classification performance for instance segmentation.
2. Use `ResizeLongestEdge` + `CenterPaste` instead of `ResizeShortestEdge` + `CenterCrop`. Cropping removes some foreground context and degrades performance.

Thanks for replying :), this is really wonderful work for the research community!