Some questions about CondInst.

lartpang commented 4 years ago

I'm interested in this work, but I don’t know much about the task of instance segmentation, and I don't have a very clear understanding of the following questions:

1) How is the actual number of instances determined during training and testing?
2) How are the generated parameters mapped to a specific instance?
3) How is the category given by the classification result matched with the single-instance prediction generated by the mask head?
4) How are multiple instances of the same category handled?
5) Must the total number of instances be fixed? Is it possible to handle instance segmentation without distinguishing categories, as in video object segmentation?

In addition, this code base is really great, but it lacks detailed comments. It would be better if you could add comments about the shapes and internal data structures of the variables.

tianzhi0549 commented 4 years ago

Thank you for your questions. 1) During testing, the number of instances is determined by the underlying object detector, as in other instance segmentation methods such as Mask R-CNN. The difference is that we produce a set of filters for each instance instead of a box.

During training, the number is the number of positive samples in FCOS (which might be greater than the actual number of instances, since several locations can be assigned to the same instance).

2) Please refer to the code from https://github.com/aim-uofa/AdelaiDet/blob/6c41f25ee7e279c4f8027a6016e00614994500c6/adet/modeling/condinst/dynamic_mask_head.py#L112.

3) They are naturally associated: the classification score and the filters for an instance are predicted at the same location. Please refer to the code for details.

4) Different instances (even with the same category) are handled by different filters.

5) No, the number of instances is not fixed. The method can also be used for video object segmentation.
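To make the answers above more concrete, here is a minimal NumPy sketch of the idea. It is not the AdelaiDet implementation: the function names (`split_params`, `mask_head`, `predict_instances`), the shapes, and the score threshold are assumptions for illustration, and the relative-coordinate inputs and NMS used in the real pipeline are left out. Each kept location contributes one flat parameter vector, which is split into the weights and biases of three 1x1 conv layers and applied to the shared mask features, so the number of output masks simply follows the number of kept locations.

```python
import numpy as np

def split_params(params, in_ch=8, hidden=8):
    """Split one flat parameter vector into (weight, bias) pairs for three 1x1 conv layers."""
    shapes = [(hidden, in_ch), (hidden, hidden), (1, hidden)]
    layers, offset = [], 0
    for out_c, in_c in shapes:
        w = params[offset:offset + out_c * in_c].reshape(out_c, in_c)
        offset += out_c * in_c
        b = params[offset:offset + out_c]
        offset += out_c
        layers.append((w, b))
    assert offset == params.size, "parameter vector has unexpected length"
    return layers

def mask_head(mask_feats, params):
    """Apply one instance's filters to shared features (C, H, W); returns an (H, W) mask."""
    c, h, w = mask_feats.shape
    x = mask_feats.reshape(c, h * w)          # a 1x1 conv is a matrix product per pixel
    layers = split_params(params, in_ch=c)
    for i, (wgt, b) in enumerate(layers):
        x = wgt @ x + b[:, None]
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)            # ReLU between layers
    return (1.0 / (1.0 + np.exp(-x))).reshape(h, w)

def predict_instances(cls_scores, controller_params, mask_feats, score_thresh=0.5):
    """cls_scores: (L, K) per-location class scores; controller_params: (L, P)."""
    best_cls = cls_scores.argmax(axis=1)
    best_score = cls_scores.max(axis=1)
    keep = np.flatnonzero(best_score > score_thresh)   # number of instances is not fixed
    masks = np.zeros((keep.size,) + mask_feats.shape[1:])
    for n, loc in enumerate(keep):
        # The label and the filters come from the same location, so they are already paired.
        masks[n] = mask_head(mask_feats, controller_params[loc])
    return best_cls[keep], best_score[keep], masks

# Toy example: 6 locations, 3 classes, 8-channel mask features of size 4x5.
rng = np.random.default_rng(0)
L, K, C, H, W = 6, 3, 8, 4, 5
P = (C * 8 + 8) + (8 * 8 + 8) + (8 * 1 + 1)   # 153 parameters per instance in this sketch
cls_scores = np.full((L, K), 0.1)
cls_scores[1, 2] = 0.9                         # two confident locations, same class
cls_scores[4, 2] = 0.8
controller_params = rng.normal(size=(L, P))
mask_feats = rng.normal(size=(C, H, W))

labels, scores, masks = predict_instances(cls_scores, controller_params, mask_feats)
print(labels, masks.shape)
```

In this toy run, two locations pass the threshold with the same class, and each gets its own mask because each has its own parameter vector. Ignoring `labels` gives a class-agnostic set of instance masks.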

We will try our best to add some comments. Thank you for pointing it out.

lartpang commented 4 years ago

@tianzhi0549 Thank you for your reply. I will read the code further. If I run into more problems, I will ask again.

aim-uofa / AdelaiDet

Some questions about CondInst. #158
