dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License

Can't do overfitting.... #270

Open sdimantsd opened 4 years ago

sdimantsd commented 4 years ago

Hi, I'm trying to overfit YOLACT, but some objects are not consistently detected. I've attached two images: the first shows my segmentations, and the second shows the network result. Can someone help me, please?

I'm using ResNet-101 as my backbone, with 700x700 input.

Thanks!

My segmentations: 5640

Network result: 5640_out

jasonkena commented 4 years ago

If I'm not mistaken, there's a maximum number of detections (the number of anchors) that can be made at each FPN layer, so there's an intrinsic limit on the total number of detections (the sum of the anchors over all FPN layers).

Hope that helps.

sdimantsd commented 4 years ago

Thanks, I think it works now. There were two problems:

  1. The number of anchors, as @jasonkena said.
  2. Because the frame is rectangular, the width was shrunk much more than the height, so the perpendicular cars ended up with a different shape than the horizontal cars (see image below: 5640_1). When I padded the bottom of the image with black (roughly as in the sketch below), the result was better, even though each vehicle then has fewer pixels! (haifa_rect_out)
abhigoku10 commented 4 years ago

@sdimantsd @jasonkena Can you please elaborate on the first point, i.e. the number of anchors? I am having this issue with person detection.

@sdimantsd Just to understand: was there any specific reason to choose YOLACT for these drone images? There is another architecture, https://github.com/CosmiQ/yolt, which is used for drone/satellite imaging.

sdimantsd commented 4 years ago

@abhigoku10, honestly, I just didn't know about that network. Thanks.

Now I need to run YOLACT with rectangular input. Is that possible, @dbolya?

jasonkena commented 4 years ago

@sdimantsd Glad it helped. @abhigoku10, see the network diagram in Fig. 2 of the YOLACT++ paper and the Prediction Head in Fig. 4.

The way the network makes predictions is by evaluating the Prediction Head on each layer of the Feature Pyramid (P3, P4, P5, P6, P7), and each of these evaluations produces a anchors, so the theoretical limit on the number of proposed detections is 5a. If I'm not mistaken, YOLACT++ uses a=9, for all possible combinations of 3 aspect ratios and 3 scales per FPN layer, effectively limiting the number of detections to 45.
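A rough sketch of that shared-head idea (not yolact's actual code; layer names and sizes are illustrative, and see the correction further down for the real anchor counts):

```python
import torch
import torch.nn as nn

# One prediction head is shared across all FPN levels; each spatial location
# on each level proposes `a` anchors (box regression + class scores).
class TinyPredictionHead(nn.Module):
    def __init__(self, channels=256, a=3, num_classes=81):
        super().__init__()
        self.box = nn.Conv2d(channels, a * 4, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(channels, a * num_classes, kernel_size=3, padding=1)

    def forward(self, fpn_levels):  # e.g. [P3, P4, P5, P6, P7]
        return [(self.box(p), self.cls(p)) for p in fpn_levels]
```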

jasonkena commented 4 years ago

@sdimantsd Can you please post an example evaluation of the network on the image you mentioned (the shrunk one)?

sdimantsd commented 4 years ago

@jasonkena I did post it; it's the second image.

sdimantsd commented 4 years ago

notice the black area at the bottom of the image

jasonkena commented 4 years ago

Sorry, I meant the one without the black pixels

sdimantsd commented 4 years ago

It's the second image in my first message.

jasonkena commented 4 years ago

How about with an increased number of anchors?

sdimantsd commented 4 years ago

The last image is with the increased number of anchors and the black padding; the first is without either. I don't have an image with the increased anchor count but without the black area.

dbolya commented 4 years ago

@jasonkena Haha, your analysis is almost correct, but there are 3 anchor aspect ratios and 1 anchor scale per layer, totaling 3 anchors for each layer. And that's the number of anchors per feature location per layer, not just per layer. For instance, since P3 is 69x69, there are 69*69*3 anchors on that layer, for a total of 14283 anchors from P3 alone. In total over all layers there are 19248 anchors, so that wouldn't be the issue here.
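To sanity-check that arithmetic (assuming the standard 550x550 input, which gives the P3..P7 grid sizes below):

```python
# 3 anchors per feature location (3 aspect ratios x 1 scale per layer).
grid_sizes = [69, 35, 18, 9, 5]  # P3..P7 feature map side lengths
anchors_per_loc = 3
total = sum(s * s * anchors_per_loc for s in grid_sizes)
print(total)        # 19248 anchors over all layers
print(69 * 69 * 3)  # 14283 of those come from P3 alone
```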

The actual possible bottlenecks are: NMS is set to output only 100 detections max (since COCO only allows 100 detections), the input to NMS uses only the top 200 detections, and when displaying you need to set --top_k to the number of detections you want.
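As a rough illustration of those three caps (this is not yolact's actual implementation; the thresholds are just the defaults mentioned above):

```python
import torch
import torchvision

def cap_detections(boxes, scores, pre_nms_k=200, max_dets=100, top_k=15):
    # 1) only the top 200 candidates by score go into NMS
    order = scores.argsort(descending=True)[:pre_nms_k]
    boxes, scores = boxes[order], scores[order]
    # 2) the NMS output is capped at 100 detections
    keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)[:max_dets]
    # 3) the visualizer then shows only the top `--top_k` of those
    return boxes[keep][:top_k], scores[keep][:top_k]
```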

And yeah @sdimantsd, the issue is the aspect ratio. I have non-square image support on my TODO list, but it will only work if all your images have the same aspect ratio (since training with varied aspect ratios doesn't work, probably because protonet internally relies on some odd properties of conv layers to do its dirty work, so changing the aspect ratio screws with it).

enhany commented 4 years ago

> And yeah @sdimantsd, the issue is the aspect ratio. I have non-square image support on my TODO list, but it will only work if all your images have the same aspect ratio (since training with varied aspect ratios doesn't work, probably because protonet internally relies on some odd properties of conv layers to do its dirty work, so changing the aspect ratio screws with it).

@dbolya The COCO dataset has non-square images with varying aspect ratios. Does this mean that if we crop all pictures to the same size (and the same aspect ratio), YOLACT's performance (mAP) will be much better?

dbolya commented 4 years ago

@enhany On the subset of cropped COCO, most likely, yeah. Currently the aspect ratio is stretched all over the place by resizing everything to 550x550, and like you said, it's not consistent because of the varying aspect ratios in COCO. Maybe it'd be better to take the most common aspect ratio in COCO and pad every image to that aspect ratio (accepting that you'll have fewer pixels to work with if an image's aspect ratio is different). I might try that out later.
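A sketch of that pad-to-a-common-aspect-ratio idea (the 4:3 target is just an assumption for illustration):

```python
import numpy as np

def pad_to_aspect(img, target=4 / 3, value=0):
    """Pad with black (never crop or stretch) until width/height == target."""
    h, w = img.shape[:2]
    if w / h < target:
        w = int(round(h * target))  # too narrow: pad on the right
    else:
        h = int(round(w / target))  # too wide: pad on the bottom
    out = np.full((h, w, img.shape[2]), value, dtype=img.dtype)
    out[:img.shape[0], :img.shape[1]] = img
    return out
```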

sdimantsd commented 4 years ago

@dbolya, thanks. Are you planning to implement non-square image support soon?

dbolya commented 4 years ago

@sdimantsd It's at the top of my TODO list, but I have other projects with deadlines coming soon so it might be a while.

sdimantsd commented 4 years ago

Hi @dbolya, I saw the preserve_aspect_ratio option in config.py. It looks like that option preserves the width/height ratio, right? Does it work? If so, it would be a very good solution to my original problem, since a lot of my image area is unnecessary...

dbolya commented 4 years ago

@sdimantsd preserve_aspect_ratio doesn't work very well, but I've implemented that fixed non-square aspect ratio thing I was talking about. I have to do a little more testing, then I'll push it on Monday. Tentatively it increases mAP on Cityscapes (2:1 aspect ratio) by ~1 mAP.

sdimantsd commented 4 years ago

Hi @dbolya, anything new with this feature?

Thanks!!!

sdimantsd commented 4 years ago

Hi @dbolya, thanks so much for all your work! Do you have an estimate of when this feature will be done?

sdimantsd commented 4 years ago

Hi @dbolya, Do you know when you will be able to push this update?

Ruwen14 commented 3 years ago

Hi @dbolya Any new updates on this topic?

danielcrane commented 3 years ago

@dbolya I'd also be very interested in any updates regarding training on non-square images, since 16:9 images are incredibly common these days. :smiley: