@leftthomas Hi, if the detection metric is OK, then you have to change the code for the mask features. See head.py, lines 141-163.
@zhangxgu According to the code:
https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L142-L163
`self.mask_head` contains 4 conv_blocks and 1 Conv2d layer, and these 5 modules are connected serially. And according to the following code:
https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L236
the input `mask_feat` is processed by `self.mask_head` serially, so I don't understand why modifying L142-L163 would solve the problem of the number of feature maps.
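For reference, a minimal sketch of the serial structure described above, assuming the common Conv-BN-ReLU conv_block pattern; the channel widths here (128 in, 8 out) are illustrative assumptions, not values from the repository:

```python
import torch
from torch import nn

# Minimal sketch of the serial mask head described above: 4 conv blocks
# followed by one plain Conv2d, all applied in sequence. Channel widths
# (128 -> 8) are illustrative assumptions, not the repository's values.
def build_mask_head(in_channels=128, num_blocks=4, out_channels=8):
    layers = []
    for _ in range(num_blocks):
        layers.append(nn.Conv2d(in_channels, in_channels, 3, padding=1, bias=False))
        layers.append(nn.BatchNorm2d(in_channels))
        layers.append(nn.ReLU())
    layers.append(nn.Conv2d(in_channels, out_channels, 1))
    return nn.Sequential(*layers)

mask_head = build_mask_head()
mask_feat = torch.randn(2, 128, 56, 56)  # dummy 1/4-resolution mask feature
print(mask_head(mask_feat).shape)        # torch.Size([2, 8, 56, 56])
```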
Another problem: according to the following code:
https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L126-L140
`self.mask_refine` contains 3 conv_blocks after initialization. In the forward pass, according to the following code:
https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L223-L235
`self.mask_refine[0]` processes `features[0]`, which corresponds to p2; `self.mask_refine[1]` processes `features[1]`, which corresponds to p3; and `self.mask_refine[2]` processes `features[2]`, which corresponds to p4. I am very confused about why p5 is not processed at all.
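For context, a minimal sketch of the three-level refine loop described above; `F.interpolate` stands in for the repository's `aligned_bilinear` so the snippet is self-contained, and all names are illustrative:

```python
import torch.nn.functional as F

# Sketch of the refine loop described above: mask_refine[0..2] process
# features[0..2] (p2, p3, p4); the p3/p4 outputs are upsampled to the p2
# resolution and summed. p5 never enters this fusion.
def fuse_mask_features(features, mask_refine):
    mask_feat = mask_refine[0](features[0])  # p2 sets the target scale
    target_h = mask_feat.shape[2]
    for i in range(1, 3):                    # p3 and p4
        x_p = mask_refine[i](features[i])
        factor = target_h // x_p.shape[2]    # 2 for p3, 4 for p4
        x_p = F.interpolate(x_p, scale_factor=factor,
                            mode="bilinear", align_corners=False)
        mask_feat = mask_feat + x_p
    return mask_feat
```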
@leftthomas Yes, what I mean is that you have to change the mask head architecture when removing one FPN output feature map. So you should both modify the architecture of the mask head (to account for one fewer feature map) and change the number of loop iterations in lines 223 to 235. As for the usage of specific feature maps (not using p5), we follow the code of CondInst.
I have tried to change these lines https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/configs/Base-DiffusionInst.yaml#L9-L13 to

```yaml
    OUT_FEATURES: ["res3", "res4", "res5"]
  FPN:
    IN_FEATURES: ["res3", "res4", "res5"]
  ROI_HEADS:
    IN_FEATURES: ["p3", "p4", "p5"]
```
and modified https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L126-L140 to

```python
self.mask_refine = nn.ModuleList()
in_features = ['p4', 'p5']
for in_feature in in_features:
    conv_block = []
    conv_block.append(
        nn.Conv2d(d_model,
                  128,
                  kernel_size=3,
                  stride=1,
                  padding=1,
                  bias=False))
    conv_block.append(nn.BatchNorm2d(128))
    conv_block.append(nn.ReLU())
    conv_block = nn.Sequential(*conv_block)
    self.mask_refine.append(conv_block)
```
and modified https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/diffusioninst/head.py#L223-L235 to

```python
for i, x in enumerate(features):
    if i == 0:
        mask_feat = self.mask_refine[i](x)
    elif i <= 1:
        x_p = self.mask_refine[i](x)
        target_h, target_w = mask_feat.size()[2:]
        h, w = x_p.size()[2:]
        assert target_h % h == 0
        assert target_w % w == 0
        factor_h, factor_w = target_h // h, target_w // w
        assert factor_h == factor_w
        x_p = aligned_bilinear(x_p, factor_h)
        mask_feat = mask_feat + x_p
```
It is still not working; can you provide a feasible solution?
@leftthomas Maybe it is the size of the mask feature? In the original code, the mask feature size is 1/4 of the input; with your setting, the size is 1/8. There are two parts of the code that may need careful checking. One is in detector.py, lines 110-122, which maps the 1/4 masks back to the original resolution. The other is in loss.py, lines 368-373, which maps the GT masks to 1/4 masks. Hope this helps. By the way, you can use the visualization code in demo.py to visualize some masks for debugging; it has been helpful to me.
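In other words, every place that hard-codes the 1/4 mask-feature scale has to become 1/8 under the reduced-FPN setting. A hedged sketch of the two adjustments (function and variable names here are illustrative, not the repository's):

```python
import torch.nn.functional as F

MASK_FEAT_STRIDE = 8  # was 4 in the original code; p3 is 1/8 resolution

# (1) detector.py-style step: upsample predicted masks back to image size.
def masks_to_image_size(pred_masks):
    # was scale_factor=4 when mask features were 1/4 resolution
    return F.interpolate(pred_masks, scale_factor=MASK_FEAT_STRIDE,
                         mode="bilinear", align_corners=False)

# (2) loss.py-style step: downsample GT masks to the mask-feature scale.
def gt_masks_to_feature_size(gt_masks):
    # was a stride-4 downsample when mask features were 1/4 resolution
    return gt_masks[:, :, ::MASK_FEAT_STRIDE, ::MASK_FEAT_STRIDE]
```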
@zhangxgu Thank you for your kind answer.
There is another problem: GPU memory leakage. As training proceeds, the GPU memory usage keeps increasing.
Hi, I have not encountered this problem before. If you use bs=32 as in the config, then you need an A100-80G for training.
The memory increases by around 100 MB roughly every 10000 iterations; you could try to look into this problem.
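For reference, a small sketch of how one could track that growth from a standard PyTorch training loop (where to call it is an assumption about your setup):

```python
import torch

# Log allocated/reserved CUDA memory every N iterations to see whether the
# ~100 MB / 10k-iter growth comes from live tensors (allocated) or the
# caching allocator (reserved). Call this from the training loop.
def log_cuda_memory(iteration, every=1000):
    if iteration % every == 0 and torch.cuda.is_available():
        alloc = torch.cuda.memory_allocated() / 1024 ** 2
        reserved = torch.cuda.memory_reserved() / 1024 ** 2
        print(f"iter {iteration}: allocated={alloc:.1f} MB, "
              f"reserved={reserved:.1f} MB")
```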
> I have tried to change these lines https://github.com/chenhaoxing/DiffusionInst/blob/00af4907fbb38d7c86ee83975b14dd843200d95e/configs/Base-DiffusionInst.yaml#L9-L13 to

and after training, the mask mAP crashes to only 0.02. How can I make sure that only the three features are used correctly?
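One way to sanity-check the feature plumbing before a full training run is to assert the count and strides of the maps that actually reach the mask branch; a debugging sketch, assuming `features` is the list of FPN maps in p3/p4/p5 order:

```python
# Debugging sketch: place a check like this where `features` enters the mask
# branch, to confirm exactly three maps (p3, p4, p5) arrive with roughly
# 1/8, 1/16, 1/32 strides. All names here are assumptions.
def check_features(features, image_h):
    assert len(features) == 3, f"expected 3 feature maps, got {len(features)}"
    for i, x in enumerate(features):
        stride = image_h // x.shape[2]
        print(f"features[{i}]: shape={tuple(x.shape)}, stride~{stride}")
        assert stride == 8 * 2 ** i, f"unexpected stride {stride} at level {i}"
```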