dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License

FPN #173

Open zhangbaoj opened 4 years ago

zhangbaoj commented 4 years ago

Hello, what is the meaning of selected_layers: list(range(1, 4)) in config.py? If I want to reduce the number of layers in the FPN, what should I do?

dbolya commented 4 years ago

Depends on which layers you want to remove:

If you wanted to remove the lower layers (P3, P4, ...), remove that index from selected_layers (P3 is 1, P4 is 2, and P5 is 3). Then decrease the 5 in pred_aspect_ratios, and remove the scale for that layer in pred_scales.

If you wanted to remove one of the extra layers (P6, P7), in fpn, decrease num_downsample and do the same for pred_aspect_ratios and pred_scales above (atm you can't remove P6 without removing P7).
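
As a rough illustration of the first case (removing the lowest layer), here is a minimal sketch on a plain dictionary. `backbone_cfg` and `drop_lowest_layer` are hypothetical names, and the values are assumptions modeled on the defaults discussed in this thread, so check them against your own config.py before copying anything.

```python
# Hypothetical stand-in for the backbone section of a YOLACT config.
# The numbers below are assumed defaults, not read from the repository.
backbone_cfg = {
    "selected_layers": [1, 2, 3],                 # P3, P4, P5 on ResNet
    "pred_aspect_ratios": [[[1, 1 / 2, 2]]] * 5,  # one entry per P layer
    "pred_scales": [[24], [48], [96], [192], [384]],
}


def drop_lowest_layer(backbone):
    """Return a copy of the settings without the lowest selected layer.

    Removes the first index from selected_layers and the matching first
    entry from pred_aspect_ratios and pred_scales.
    """
    return {
        "selected_layers": backbone["selected_layers"][1:],
        "pred_aspect_ratios": backbone["pred_aspect_ratios"][1:],
        "pred_scales": backbone["pred_scales"][1:],
    }


without_p3 = drop_lowest_layer(backbone_cfg)
print(without_p3["selected_layers"])  # indices that remain selected
print(without_p3["pred_scales"])      # one scale list per remaining layer
```

The point of the sketch is only that the three lists have to stay aligned: one aspect-ratio entry and one scale entry per P layer that is actually used.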

abhigoku10 commented 4 years ago

@dbolya If I remove P3, what changes have to be made to pred_aspect_ratios and pred_scales? Also, can I remove P6 and keep P7?

zhangbaoj commented 4 years ago

Thank you for your answer, but I still have a question. There are five layers in the FPN: P3(1), P4(2), P5(3), P6(4), and P7(5). In the code, both res101 and res50 use list(range(1, 4)), while darknet53 uses list(range(2, 5)). Doesn't this mean that the FPN layers selected for res101 and res50 are P3(1), P4(2), P5(3), P6(4), and that the ones selected for darknet53 are P4(2), P5(3), P6(4), and P7(5)? If I only need to remove P6(4) and P7(5) from the FPN, do I need to change selected_layers?

dbolya commented 4 years ago

@abhigoku10 To remove P3, remove the first element from every list you mentioned. You can't currently remove P6 and keep P7 as I've said--take a look at the code to see how you can fix that if you want.

@zhangbaoj Darknet has an extra layer, since conv1 for Darknet is a standard conv while for ResNet it is a max-pooled behemoth. Thus for Darknet the P3-equivalent is 2, the P4-equivalent is 3, etc. (i.e., both the Darknet and ResNet variants have the same number of anchors). The process is the same as for ResNet, just with that index shifted one over.

If you want to just remove P6 and P7, simply set num_downsample in fpn to 0. You don't need to make any other changes.

zhangbaoj commented 4 years ago

@dbolya Thank you for your reply. In other words, in your source code res101 uses P3 to P6 and does not include P7. Am I right?

dbolya commented 4 years ago

We do use P7. Note that the downsample layers are not included in selected_layers; they're added on afterward. We use P3-P7.

zhangbaoj commented 4 years ago

Thank you for your reply. I still have a question. I know that the aspect ratios of the anchors are set to 1:1, 1:2, and 2:1 in your code. What should I do if I want to change this?

dbolya commented 4 years ago

You can change the aspect ratios here: https://github.com/dbolya/yolact/blob/bd51ec9ef42389f60178fe729dfda3ad0f370f2f/data/config.py#L603 (should be fairly self explanatory, and ar = w / h).

To change the scales, modify the line below that (I don't think this is what you were asking for but just in case): https://github.com/dbolya/yolact/blob/bd51ec9ef42389f60178fe729dfda3ad0f370f2f/data/config.py#L604 These scales are in pixels (boxes will have area scale^2) and each corresponds to an individual P layer that gets used.
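
The two statements above (ar = w / h, and boxes have area scale^2) pin down the width and height of an anchor. The sketch below shows only that arithmetic; `anchor_size` is a hypothetical helper, not a function from the repository, and the actual prior-box code may apply further options on top of it.

```python
import math


def anchor_size(scale, aspect_ratio):
    """Width and height in pixels with w / h == aspect_ratio and w * h == scale ** 2."""
    root = math.sqrt(aspect_ratio)
    return scale * root, scale / root


# Example: the three aspect ratios mentioned in this thread at a 24 px scale.
for ar in (1, 1 / 2, 2):
    w, h = anchor_size(24, ar)
    print(f"ar={ar}: w={w:.1f}px, h={h:.1f}px, area={w * h:.0f}px^2")
```

Solving w / h = ar together with w * h = scale^2 gives w = scale * sqrt(ar) and h = scale / sqrt(ar), which is all the function does.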

zhangbaoj commented 4 years ago

This is my training result. Does this mean that my mAP is only 4.96%? [image]

dbolya commented 4 years ago

Yeah that means your mAP is 4.96%, which would indicate that something's wrong with your configuration or that you didn't train long enough.

zhangbaoj commented 4 years ago

I have solved the mAP problem above with your help, thank you. The aspect ratios set in your code are 1:1, 1:2, and 2:1. If I want to fit them to my own dataset, what method should I use to change them? Is it k-means?

dbolya commented 4 years ago

Yeah, you can do kmeans with k=3 on all the aspect ratios in your dataset.

I don't recommend doing that for the scales, since FPN kind of needs each scale to be 2x the last, but you can check the mean scale by getting the mean of sqrt(w * h) and then centering the scales around that mean.
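
A minimal sketch of both suggestions, using only the standard library. Everything here is an assumption made for illustration: `kmeans_1d` and `centered_scales` are hypothetical helpers, the boxes are made-up (w, h) pairs, and "centering" is interpreted geometrically (the middle layer gets the mean scale, each neighbor differs by a factor of 2).

```python
import math


def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means on scalars; returns the sorted cluster centers."""
    ordered = sorted(values)
    # Deterministic init: spread the initial centers over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)


def centered_scales(boxes, num_layers=5):
    """Scales that double per layer, geometrically centered on mean sqrt(w * h)."""
    mean_scale = sum(math.sqrt(w * h) for w, h in boxes) / len(boxes)
    middle = (num_layers - 1) / 2
    return [mean_scale * 2 ** (i - middle) for i in range(num_layers)]


# Made-up ground-truth boxes as (width, height) in pixels.
boxes = [(40, 40), (44, 40), (30, 60), (32, 60), (80, 40), (84, 40)]
aspect_ratios = [w / h for w, h in boxes]  # ar = w / h, as in the config
print(kmeans_1d(aspect_ratios, k=3))
print(centered_scales(boxes))
```

One design choice worth considering: clustering log(ar) instead of ar treats 1:2 and 2:1 symmetrically, but that is a variation on the suggestion above rather than what was described.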

zhangbaoj commented 4 years ago

Why do you choose to generate P6 and P7 by downsampling P5, instead of getting P7 directly from the backbone, as other algorithms do, and then generating P6, P5, P4, P3, P2, P1 by upsampling from P7?

dbolya commented 4 years ago

We follow the approach of RetinaNet, which does it this way. I think the way you're proposing would work too, but it's not clear which is better: C1-5 are pretrained, but if you add C6 and C7 (and then generate P7 from C7 the way P5 is generated from C5 right now), those layers would be untrained. Not to mention, it would be slower, since P3, for instance, would then depend on P6 and P7, which is not the case now.

It seems worth a try though. I'll see if I can test it when I get back from ICCV.

zhangbaoj commented 4 years ago

Thank you very much for your answer. I tried to directly generate C6 and C7 in the backbone, and then generate P6 and P5 through C7 upsampling. But it seems that the operation failed. Can you give me some help? How should I modify the code to achieve my purpose?

dbolya commented 4 years ago

I think you could do that by setting num_downsample to 0, then changing selected_layers to be list(range(1, 6)). Haven't tested that though.

zhangbaoj commented 4 years ago

Thank you very much. I use resnet101. To generate C6 and C7, should I first add two self._make_layer layers to backbone.py? The number of channels in the layers I added is still 512, because I think a larger number would increase the amount of computation. I plan to generate C7 (P7) directly and then use upsampling and concat to generate P6 and P5. In my program, I only want to keep P5, P6, and P7 in the FPN. So should I set num_downsample = 0 and then change selected_layers to list(range(5, 6))?

dbolya commented 4 years ago

Yeah my code automatically does that (since we used to use SSD). The indices start at C2=0, confusingly enough (since C0 [i.e., the image] and C1 are too big to output). So if you want P5, P6, and P7, you can go range(3, 6).
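
To keep the indexing straight, here is a small sketch of how selected_layers and num_downsample combine into pyramid levels, based only on what is said in this thread (index 0 is C2 on ResNet, and the downsample layers are added on top afterward). `fpn_levels` is a hypothetical helper and not part of the repository.

```python
def fpn_levels(selected_layers, num_downsample, first_level=2):
    """List the pyramid levels produced by a given configuration.

    first_level is the C-level of backbone index 0. It is 2 for ResNet per
    the comment above; 1 for Darknet is an assumption derived from the
    one-index shift mentioned earlier in the thread.
    """
    from_backbone = [first_level + i for i in selected_layers]
    top = from_backbone[-1]
    # Extra levels are stacked on top of the highest selected layer.
    extra = [top + n for n in range(1, num_downsample + 1)]
    return [f"P{level}" for level in from_backbone + extra]


print(fpn_levels(range(1, 4), num_downsample=2))  # the default ResNet setup
print(fpn_levels(range(3, 6), num_downsample=0))  # backbone extended to C7
print(fpn_levels(range(2, 5), num_downsample=2, first_level=1))  # Darknet
```

Under these assumptions, range(5, 6) would select only a single layer, which is why range(3, 6) is the one that yields P5, P6, and P7.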

zhangbaoj commented 4 years ago

If I want to add a measured distance to the label text drawn on the detected image, where should I add it? I haven't been able to find the place, so I'm asking for your help again. Thank you. For example: motorcycle:0.99(+distance:10m)

dbolya commented 4 years ago

A bit late, but here: https://github.com/dbolya/yolact/blob/c508c2560eac863da63f6306106e3e3375498bd8/eval.py#L247
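
Assuming the linked line is where the label string for each detection is assembled, a change along these lines would append the distance. `format_label` is a hypothetical helper written for this example, and the distance value is assumed to come from your own measurement code.

```python
def format_label(class_name, score, distance_m=None):
    """Build the text drawn next to a detection, optionally with a distance."""
    label = f"{class_name}: {score:.2f}"
    if distance_m is not None:
        # distance_m is supplied by the caller; it is not a model output here.
        label += f" (+distance: {distance_m:g}m)"
    return label


print(format_label("motorcycle", 0.99, distance_m=10))
print(format_label("motorcycle", 0.99))
```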

abhigoku10 commented 4 years ago

@zhangbaoj @dbolya Are we also getting the distance value of the detected object from yolact?