chenhaoxing / DiffusionInst

This repo contains the code for the paper "DiffusionInst: Diffusion Model for Instance Segmentation" (ICASSP'24).
Apache License 2.0

How to use only three features? #8

Closed leftthomas closed 1 year ago

leftthomas commented 1 year ago

I have tried to change these lines https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/configs/Base-DiffusionInst.yaml#L9-L13 to

```yaml
  OUT_FEATURES: ["res3", "res4", "res5"]
FPN:
  IN_FEATURES: ["res3", "res4", "res5"]
ROI_HEADS:
  IN_FEATURES: ["p3", "p4", "p5"]
```

and after training, the mask mAP crashes to only 0.02. How can I make sure that only three features are used correctly?
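
For reference, a quick way to check which levels the backbone emits, using plain detectron2 defaults rather than the repo's full config (a sketch, only a shape check):

```python
import torch
from detectron2.config import get_cfg
from detectron2.modeling import build_backbone

# Build only the ResNet-FPN backbone with the three-level setting and
# print which feature maps it produces. Everything else stays at
# detectron2 defaults (ResNet-50), so this checks shapes only.
cfg = get_cfg()
cfg.MODEL.BACKBONE.NAME = "build_resnet_fpn_backbone"
cfg.MODEL.RESNETS.OUT_FEATURES = ["res3", "res4", "res5"]
cfg.MODEL.FPN.IN_FEATURES = ["res3", "res4", "res5"]
backbone = build_backbone(cfg)

feats = backbone(torch.randn(1, 3, 256, 256))
print({k: tuple(v.shape) for k, v in feats.items()})
# expected levels: p3/p4/p5 (plus p6 from the default FPN top block)
```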

zhangxgu commented 1 year ago

@leftthomas Hi, if the detection metric is OK, then you also have to change the code for the mask features. See head.py, lines 141-163.

leftthomas commented 1 year ago

@zhangxgu According to the code https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L142-L163, `self.mask_head` contains 4 conv_blocks and 1 Conv2d layer, and these 5 modules are connected serially. According to the following code https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L236, the input `mask_feat` is processed by `self.mask_head` serially, so I don't understand why modifying L142-L163 would solve the problem of the number of features.
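
For reference, the structure I am describing is roughly the following (a minimal sketch; `d_model`, the channel widths, and the output width are assumptions, not the repo's exact code):

```python
import torch
from torch import nn

d_model = 256  # assumed hidden width

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

# 4 conv_blocks + 1 Conv2d, connected serially: mask_feat passes through
# all five modules one after another.
mask_head = nn.Sequential(
    conv_block(128, d_model),
    conv_block(d_model, d_model),
    conv_block(d_model, d_model),
    conv_block(d_model, d_model),
    nn.Conv2d(d_model, 8, kernel_size=1),  # output width assumed
)

mask_feat = torch.randn(1, 128, 64, 64)  # 128 channels, as produced by self.mask_refine
out = mask_head(mask_feat)
```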

leftthomas commented 1 year ago

Another problem: according to the following code https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L126-L140, `self.mask_refine` contains 3 conv_blocks after initialization. In the forward pass, according to the following code https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L223-L235:

- `self.mask_refine[0]` processes `features[0]`, which corresponds to p2;
- `self.mask_refine[1]` processes `features[1]`, which corresponds to p3;
- `self.mask_refine[2]` processes `features[2]`, which corresponds to p4.

I am very confused about why p5 is never processed.
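
To make my reading concrete, the loop behaves like the following self-contained sketch (`aligned_bilinear` is stubbed with plain interpolation; shapes are illustrative):

```python
import torch
from torch import nn
from torch.nn import functional as F

def aligned_bilinear(x, factor):
    # simplified stand-in for the repo's aligned_bilinear helper
    return F.interpolate(x, scale_factor=factor, mode="bilinear", align_corners=True)

d_model = 256
mask_refine = nn.ModuleList([
    nn.Sequential(
        nn.Conv2d(d_model, 128, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(128),
        nn.ReLU(),
    )
    for _ in range(3)  # one block each, intended for p2, p3, p4
])

# p2..p5 for a 256x256 input, i.e. strides 4, 8, 16, 32
features = [torch.randn(1, d_model, s, s) for s in (64, 32, 16, 8)]

mask_feat = None
for i, x in enumerate(features):
    if i == 0:
        mask_feat = mask_refine[i](x)  # p2 sets the base resolution
    elif i <= 2:                       # only i = 1, 2: p5 (i == 3) is never indexed
        x_p = mask_refine[i](x)
        factor = mask_feat.size(2) // x_p.size(2)
        mask_feat = mask_feat + aligned_bilinear(x_p, factor)

print(mask_feat.shape)  # torch.Size([1, 128, 64, 64]), i.e. stride-4 resolution
```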

zhangxgu commented 1 year ago

@leftthomas Yes, what I mean is that you have to change the mask head architecture when dropping one FPN output feature map. So you should both modify the architecture of the mask head (to account for one fewer feature map) and change the number of loop iterations in lines 223-235. As for why specific feature maps are used (and p5 is not), we follow the code of CondInst.

leftthomas commented 1 year ago

I have tried to change these lines https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/configs/Base-DiffusionInst.yaml#L9-L13 to

```yaml
  OUT_FEATURES: ["res3", "res4", "res5"]
FPN:
  IN_FEATURES: ["res3", "res4", "res5"]
ROI_HEADS:
  IN_FEATURES: ["p3", "p4", "p5"]
```

and modified https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L126-L140 to

```python
self.mask_refine = nn.ModuleList()
in_features = ['p4', 'p5']
for in_feature in in_features:
    conv_block = []
    conv_block.append(
        nn.Conv2d(d_model,
                  128,
                  kernel_size=3,
                  stride=1,
                  padding=1,
                  bias=False))
    conv_block.append(nn.BatchNorm2d(128))
    conv_block.append(nn.ReLU())
    conv_block = nn.Sequential(*conv_block)
    self.mask_refine.append(conv_block)
```

and modified https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L223-L235 to

```python
for i, x in enumerate(features):
    if i == 0:
        mask_feat = self.mask_refine[i](x)
    elif i <= 1:
        x_p = self.mask_refine[i](x)
        target_h, target_w = mask_feat.size()[2:]
        h, w = x_p.size()[2:]
        assert target_h % h == 0
        assert target_w % w == 0
        factor_h, factor_w = target_h // h, target_w // w
        assert factor_h == factor_w
        x_p = aligned_bilinear(x_p, factor_h)
        mask_feat = mask_feat + x_p
```

It is still not working; can you provide a feasible solution?

zhangxgu commented 1 year ago

@leftthomas Maybe it is the size of the mask feature? In the original code, the mask feature is at 1/4 resolution; with your setting, it is at 1/8. There are two parts of the code that may need careful checking. One is detector.py, lines 110-122, which maps the 1/4-resolution masks back to the original resolution. The other is loss.py, lines 368-373, which maps the GT masks down to 1/4 resolution. Hope this helps. By the way, you can use the visualization code in demo.py to visualize some masks for debugging; it has been helpful to me.
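
To illustrate the mismatch (a sketch with assumed shapes and channel counts, not the repo's code): if either of these places hard-codes a factor of 4, it silently breaks at 1/8:

```python
import torch
from torch.nn import functional as F

img_h = img_w = 256
mask_stride = 8  # 1/8 with the p3-p5 setting; it was 4 with p2-p5

# mask feature at 1/8 resolution (channel count assumed)
mask_feat = torch.randn(1, 8, img_h // mask_stride, img_w // mask_stride)

# detector.py-style step: upsample predicted masks to image resolution.
# The factor must follow mask_stride; a hard-coded 4 now yields half-size masks.
pred = F.interpolate(mask_feat, scale_factor=mask_stride,
                     mode="bilinear", align_corners=False)
assert pred.shape[-2:] == (img_h, img_w)

# loss.py-style step: downsample GT masks to the mask-feature resolution,
# again driven by mask_stride instead of a fixed 4.
gt = torch.randint(0, 2, (1, 1, img_h, img_w)).float()
gt_small = F.interpolate(gt, scale_factor=1.0 / mask_stride,
                         mode="bilinear", align_corners=False)
assert gt_small.shape[-2:] == mask_feat.shape[-2:]
```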

leftthomas commented 1 year ago

@zhangxgu Thank you for your kind answer.

leftthomas commented 1 year ago

There is another problem: a GPU memory leak. As training progresses, GPU memory usage keeps increasing.

zhangxgu commented 1 year ago

Hi, I have not encountered this problem before. If you use bs=32 as in the config, then you need an A100-80G for training.

leftthomas commented 1 year ago

The memory grows by around 100MB roughly every 10000 iterations; you could try to reproduce this.
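
For reference, I am tracking it with a small helper inside the training loop (a minimal sketch):

```python
import torch

def log_gpu_memory(iteration, every=1000):
    # print allocated/reserved CUDA memory every `every` iterations
    if iteration % every == 0:
        alloc = torch.cuda.memory_allocated() / 1024 ** 2
        reserved = torch.cuda.memory_reserved() / 1024 ** 2
        print(f"iter {iteration}: allocated {alloc:.0f} MB, reserved {reserved:.0f} MB")
```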