dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License

resize issue #124

Open xuchengggg opened 5 years ago

xuchengggg commented 5 years ago

Hi, I saw two ways of resizing in your code: one keeps the aspect ratio and the other just resizes to 550x550. I want to know if you have trained both ways. I have trained with both in TensorFlow and observed some interesting phenomena. I'll use "first way" for keeping the image aspect ratio as in Faster R-CNN, and "second way" for the other.

Firstly, during training, the first way is always faster than the second: 650ms/step versus 860ms/step, trained on an RTX 2080 Ti with a batch size of 8. Secondly, the mask loss of the first way is higher, while its loc loss and class loss are lower. At eval time, the second way produces better masks, but slightly lower object confidences. Do these observations match yours?

dbolya commented 5 years ago

We haven't actually tested that setting due to memory constraints (it's naively implemented). But when testing with Mask R-CNN to see whether it was worth spending a lot of time on, we found that training with 550x550 images was only 0.5 mAP worse than keeping the aspect ratio while taking the area to be 550^2 pixels. The 550x550 images were also faster to train and evaluate, so we decided not to bother maintaining the aspect ratio.
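For concreteness, here is a minimal sketch (not from the YOLACT codebase; the function names are illustrative) of the two resize policies being compared: a fixed 550x550 resize versus an aspect-preserving resize normalized to the same 550^2 pixel area:

```python
import math

def fixed_resize_dims(h, w, size=550):
    """Plain resize: ignore aspect ratio, always output size x size."""
    return size, size

def equal_area_dims(h, w, size=550):
    """Keep the aspect ratio but scale so the area matches size^2 pixels."""
    scale = math.sqrt((size * size) / (h * w))
    return round(h * scale), round(w * scale)

# e.g. a 480x640 image ends up with ~550^2 pixels either way
print(fixed_resize_dims(480, 640))  # (550, 550)
print(equal_area_dims(480, 640))    # (476, 635)
```

Equalizing the pixel area is what makes the comparison fair: both policies feed the network the same amount of data per image, so any mAP gap is attributable to the aspect-ratio distortion alone.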

I'm curious, what performance improvement did you observe?

xuchengggg commented 5 years ago

In my code, the keep-aspect-ratio version is faster to train, but I haven't tested the eval time yet. I'm not sure whether there's a problem with my code, since I found 550x550 without keeping the aspect ratio really slow to train when I didn't use random crop. I also found that on some test pictures, the keep-aspect-ratio version always has higher confidence but worse masks. I can send you a few pictures showing those results. But I use ResNet-50 and trained for 400,000 steps, so it may not be fully trained.

xuchengggg commented 5 years ago

[Four result images were attached here.]

The first two are results from your model with the weights yolact_base_54_800000.pth; the others are results from my code, which resizes inputs keeping the aspect ratio.

shoutOutYangJie commented 5 years ago

Hello, where is your TensorFlow code from?

xuchengggg commented 5 years ago

> Hello, where is your TensorFlow code from?

I wrote it myself, but I still haven't gotten good results.

dbolya commented 5 years ago

Oh sorry, I read your original comment wrong. Yeah, that's weird. Resizing without maintaining the aspect ratio should be faster.

Actually, one question: when you resize keeping aspect ratio, do you ensure that there's the same number of pixels as when you don't? If the keep aspect ratio one has fewer pixels, that might explain this.

xuchengggg commented 5 years ago

I use the same strategy as Mask R-CNN: set max_size to 550 and min_size to 400, then pad to 550x550. For the plain resize, I first load the image and masks:

    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)

then apply flip augmentation, get the boxes from the masks with bbox = extract_bboxes(mask), and finally use the SSDAugmentation from your code, but with only random_crop and resize. Is there any problem with this?
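The keep-aspect-ratio preprocessing described above can be sketched as follows (a hedged sketch, not the actual Mask R-CNN code; the function name is illustrative and a nearest-neighbour index lookup stands in for a real interpolator):

```python
import numpy as np

def resize_keep_aspect_and_pad(img, min_size=400, max_size=550):
    """Mask R-CNN style: scale so the short side reaches min_size without
    the long side exceeding max_size, then zero-pad to max_size x max_size."""
    h, w = img.shape[:2]
    scale = min(min_size / min(h, w), max_size / max(h, w))
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize via index arrays (stand-in for a real resizer)
    ys = (np.arange(nh) * h / nh).astype(int)
    xs = (np.arange(nw) * w / nw).astype(int)
    resized = img[ys][:, xs]
    # zero-pad the bottom/right to the fixed network input size
    padded = np.zeros((max_size, max_size) + img.shape[2:], dtype=img.dtype)
    padded[:nh, :nw] = resized
    return padded, (nh, nw)

img = np.ones((480, 640, 3), dtype=np.uint8)
out, (nh, nw) = resize_keep_aspect_and_pad(img)  # 400x533 content in a 550x550 canvas
```

Note the zero padding: a 480x640 image fills only 400x533 of the 550x550 canvas, which is the padding the later comments speculate about.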

And I suspect that the images produced by the keep-aspect-ratio method are padded with zeros, so for some kernels the computation is cheaper, effectively shrinking the input in disguise. This is just a guess.

dbolya commented 5 years ago

Hmm, that's interesting. Perhaps it's much faster to compute on a zero input than not, but that would be a weird optimization for cuDNN to make (though I'm not ruling it out). Maybe it just has to do with there being fewer detections?

Also, I would have thought that keeping the aspect ratio in that way would add far too many zeros and thus perform worse. Actually, do you have mAP numbers for both methods? You can use COCOEval so you don't have to implement it yourself. That would be a much better indicator of which method is better than just eyeballing the results.

xuchengggg commented 5 years ago

OK, I will test the mAP next. But as you can see in the pictures above, the results of my implementation have serious mask leakage. There may be these reasons: (1) I use ResNet-50; maybe ResNet-101 would get better results. (2) When resizing with the aspect ratio kept, at eval time the final 138x138 mask still needs to be cropped and then resized to 550x550; maybe this causes a loss of information? (3) There is something wrong in my code. I really don't know how to fix this problem.
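To make point (2) concrete, here is a hedged sketch (illustrative names, nearest-neighbour resize purely for demonstration; real code would use bilinear interpolation) of undoing the padding at eval time: crop the valid region of the 138x138 mask, then upsample to the original image size:

```python
import numpy as np

def mask_to_original(mask138, nh, nw, orig_h, orig_w, input_size=550):
    """Undo keep-aspect-ratio padding on a 138x138 mask: crop the region
    corresponding to the nh x nw image content, then upsample to the
    original size. Detail loss would come from interpolating this small crop."""
    s = mask138.shape[0]  # 138
    vh = int(round(s * nh / input_size))  # valid rows of the 138x138 map
    vw = int(round(s * nw / input_size))  # valid cols
    valid = mask138[:vh, :vw]
    ys = (np.arange(orig_h) * vh / orig_h).astype(int)
    xs = (np.arange(orig_w) * vw / orig_w).astype(int)
    return valid[ys][:, xs]

mask = np.random.rand(138, 138) > 0.5
full = mask_to_original(mask, nh=400, nw=533, orig_h=480, orig_w=640)
```

For a 480x640 image, only about 100x134 of the 138x138 map is valid, so each mask pixel covers roughly a 5x5 patch of the original image; that resolution budget, not the crop itself, is the likelier source of any lost detail.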

dbolya commented 5 years ago

Hmm, yeah, you're kind of right in that these masks look way too rough to be explained by using only 400k iterations or anything like that. Even our half-trained models had really great masks.

One thing I'm confused about: which of the images up there are you saying is the keep-aspect-ratio one?

For (1), ResNet-50 produces less fine masks, but they're not that bad. Though admittedly I haven't tried a half-trained ResNet-50.

Then for (2): depending on the aspect ratio of the original image, this could be a problem (if the original image was, say, too wide), but the artifacts it would cause would look much more obvious than this.

xuchengggg commented 5 years ago

The third and fourth. I use the pretrained ResNet-50 weights and then train all the layers. Do I need to adjust my training strategy, or are there some tricks I didn't notice? If you have time to look at my code, that would be really great.

dbolya commented 5 years ago

Sadly, I don't think I'll have time to look at your code (especially if this is just a small bug). I will say, though: you might want to try converting my PyTorch weights to TensorFlow, loading them in your code, and checking that you get the same results. Of course, that only covers the 550x550 version, but if you have any bugs in the forward pass, it should reveal them. (One thing to note: you'll also have to reproduce my square anchor bug; see the anchor generation code in the prediction head in yolact.py.)
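On converting the weights, one layout detail worth flagging (shown here as a sketch with a NumPy array standing in for a loaded tensor; the function name is illustrative): PyTorch stores conv kernels as (out_channels, in_channels, kH, kW), while TensorFlow expects (kH, kW, in_channels, out_channels), so each kernel needs a transpose:

```python
import numpy as np

def torch_conv_to_tf(kernel_oihw):
    """Transpose a conv kernel from PyTorch's OIHW layout to TF's HWIO."""
    return np.transpose(kernel_oihw, (2, 3, 1, 0))

# e.g. a 7x7 stem conv: 256 output channels, 3 input channels
k = np.zeros((256, 3, 7, 7), dtype=np.float32)
assert torch_conv_to_tf(k).shape == (7, 7, 3, 256)
```

Getting this transpose wrong tends to produce plausible-looking but degraded outputs rather than an obvious crash, which is exactly the kind of bug a weight-porting sanity check catches.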

And you mean you load the ResNet-50 pretrained weights and don't freeze any layers during training (including batch norm)? Because that's how I do it.

xuchengggg commented 5 years ago

Thank you, I will try converting the PyTorch weights to TensorFlow. And you mean I should freeze all the batch norm layers when loading the ResNet-50 pretrained weights?

dbolya commented 5 years ago

Oh sorry, I meant I don't freeze batch norm.

xuchengggg commented 5 years ago

Ok, thanks, I will check my code then.

sdimantsd commented 4 years ago

> Hello, where is your TensorFlow code from?
>
> I wrote it myself, but I still haven't gotten good results.

Sup? Anything new with YOLACT on TensorFlow?

Shubham3101 commented 4 years ago

@xuchengggg Hi, can you share your TensorFlow implementation of YOLACT?