breznak opened this issue 4 years ago
About non-square images: there are very interesting findings in https://github.com/dbolya/yolact/issues/270#issuecomment-574133928 which suggest:
> I've done the black padding before and it's not that desirable for images with very variable image size like COCO, since you lose a lot of pixels that way. In #270 he was specifically trying to overfit onto one image, so it's not like the network needed that many pixels to be able to classify anyway.
>
> For non square images right now, you can try adding that black pixel border, but the better implementation that I have on the TODO list is to just have everything a fixed non-square aspect ratio. Note that I can't change the size of the image arbitrarily while training because of the way the prototypes create masks (the features expect a consistent image size, so the size has to be fixed at the start).
>
> As for whether the changes to the scales should be done automatically, I don't think so. If you look at the im400 and im700 configs you can see what changes are necessary there and they're quite simple to extrapolate your config to those changes. I don't want to be touching the scales automatically because what scales you want depends on your dataset, since some datasets tend to have bigger objects and others tend to have smaller ones.
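For context, the "black padding" above just means pasting the image onto a square canvas instead of warping it; a minimal sketch (not YOLACT's actual preprocessing, just an illustration of the trade-off) would be something like:

```python
import numpy as np

def pad_to_square(img, fill=0):
    """Pad an HxWxC image with `fill` pixels so the result is square (top-left aligned)."""
    h, w, c = img.shape
    side = max(h, w)
    canvas = np.full((side, side, c), fill, dtype=img.dtype)
    canvas[:h, :w] = img
    return canvas

# e.g. a 480x640 frame becomes 640x640, with the bottom 160 rows left black
frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(pad_to_square(frame).shape)  # (640, 640, 3)
```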
> [...] the better implementation that I have on the TODO list is to just have everything a fixed non-square aspect ratio.
This would be great. I think typically the images come from one source (a camera, the same dataset), so they all have the same size & aspect ratio; just not square in most cases.
> As for whether the changes to the scales should be done automatically, [...] at the im400 and im700 configs you can see what changes are necessary there and they're quite simple to extrapolate
```python
'max_size': 400,
'backbone': yolact_base_config.backbone.copy({
    'pred_scales': [[int(x[0] / yolact_base_config.max_size * 400)] for x in yolact_base_config.backbone.pred_scales],
}),
```
Looking at the code above: yes, I think this is already "quite automated"; I just think this could also be the case for base/resnet(?).
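For example (a sketch only; the `yolact_im640_config` name is mine, but it mirrors how `yolact_im400_config` is built), extrapolating the same pattern to another resolution would look like:

```python
# Sketch: a custom config following the im400/im700 pattern; this would
# typically just be added to data/config.py next to the existing configs.
from data.config import yolact_base_config  # yolact_base is 550x550

yolact_im640_config = yolact_base_config.copy({
    'name': 'yolact_im640',
    'max_size': 640,
    # rescale every prediction scale by new_size / old_size, like im400 does
    'backbone': yolact_base_config.backbone.copy({
        'pred_scales': [[int(x[0] / yolact_base_config.max_size * 640)]
                        for x in yolact_base_config.backbone.pred_scales],
    }),
})
```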
Two naming suggestions:

- `max_size`: is this really the max input size? Rename it to `max_input_size`? (users can modify this)
- the `400`: is that the native backbone resolution? So add a `config.backbone.native_input_size = 400`?

> I don't want to be touching the scales automatically because what scales you want depends on your dataset, since some datasets tend to have bigger objects and others tend to have smaller ones.
I'd welcome some comment in the config explaining this topic. Maybe express it as bbox area vs. whole image size? A `size_of_objects` parameter (say 1.0 == normal, < 1.0 == smaller, > 1.0 == larger)? If we agree on something, I can make a PR based on your comments for this topic.
Just an update on tuning the sizes/scales. We used smaller images, 320x240, and adjusted `max_size` accordingly; the ResNet backbone was 550. `pred_scales` were updated accordingly to:

```python
'pred_scales': [[int(x[0] / yolact_base_config.max_size * 400)] for x in yolact_base_config.backbone.pred_scales],
```

Unfortunately, the results with the smaller `max_size` were terrible (the CNN was unable to learn properly); with `max_size=550` the results are good again. I don't know if the `pred_scales` tuning is broken, or if another param needs to be fitted (learning rate, batch size, ...?). `max_size` tuning might work eventually, but do we need bigger images? (Is 640x480 OK, or in general should the image be >= the backbone size?)

EDIT: don't pick on the 550/400 (we used the 550 backbone, but I copied code with 400 here as an example).
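For reference, here is what that rescaling line actually produces for a few target sizes (the base scales below are the yolact_base ResNet-101 values as listed in `data/config.py`; substitute your own if they differ):

```python
# Quick sanity check of the pred_scales rescaling for several max_size values.
base_max_size = 550
base_pred_scales = [[24], [48], [96], [192], [384]]  # yolact_base defaults

for new_size in (320, 400, 640):
    scaled = [[int(s[0] / base_max_size * new_size)] for s in base_pred_scales]
    print(new_size, scaled)
# 320 [[13], [27], [55], [111], [223]]
# 400 [[17], [34], [69], [139], [279]]
# 640 [[27], [55], [111], [223], [446]]
```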
Did you retrain with the smaller `max_size`? The model should be evaluated with the same scale as it was trained on, otherwise all hell breaks loose.
And if you retrained, we also found im400 to be a little lacking, so I'd expect 300 to be much more lacking. I mentioned this in the paper, saying that perhaps instance segmentation just needs more pixels to classify, so blowing up even 320x240 images to 550x550 is necessary.
> Did you retrain with the smaller `max_size`? The model should be evaluated with the same scale as it was trained on, otherwise all hell breaks loose.
We retrained the model (= YOLACT) with `max_size=320`, but did not retrain the backbone, if that's what you mean? The backbone still expects 550x550. So this was completely wrong?
Images for eval were also 320x240; I'd expect the config to rescale those to `max_size` as well? Or is a special action needed?
> I mentioned this in the paper, saying that perhaps instance segmentation just needs more pixels to classify, so blowing up even 320x240 images to 550x550 is necessary.
Yep, I remembered this. Now I'm testing whether scaling down from larger images (640x480, `max_size=640`) to 550x550 improves things, or works worse too.
> We retrained the model (= YOLACT) with `max_size=320`, but did not retrain the backbone, if that's what you mean? The backbone still expects 550x550. So this was completely wrong?
The backbone is trained as well while you train YOLACT, so that shouldn't be an issue. I guess the problem was just that 320x320 is too small.
> Images for eval were also 320x240; I'd expect the config to rescale those to `max_size` as well? Or is a special action needed?
Those will be rescaled to 320x320 automatically, yeah.
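In other words (roughly; this is just a stand-in for YOLACT's resize transform, not the actual code), eval images get warped to a `max_size` x `max_size` square, aspect ratio not preserved:

```python
import cv2
import numpy as np

max_size = 320                                    # the cfg.max_size used for training
frame = np.zeros((240, 320, 3), dtype=np.uint8)   # a 320x240 input (H x W x C)
warped = cv2.resize(frame, (max_size, max_size))  # stretched to a square
print(warped.shape)                               # (320, 320, 3)
```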
> Yep, I remembered this. Now I'm testing whether scaling down from larger images (640x480, `max_size=640`) to 550x550 improves things, or works worse too.
`max_size=640` should work better than 550. We tried 600 and it was better than 550, but we decided against it because it was too much slower for what little performance was gained.
> The backbone is trained as well while you train YOLACT, so that shouldn't be an issue.

> `max_size=640` should work better than 550. We tried 600 and it was better than 550, but we decided against it because it was too much slower for what little performance was gained.

Seems I've misunderstood the `max_size` param. What I understand you're saying(?): `max_size` specifies the backbone size. So with `max_size=640` the backbone is no longer 550x550, but 640x640. Is that correct?

> it was too much slower for what little performance was gained.

It would be interesting to have the improvement (depends on the dataset) vs. train time tradeoff.

> What I understand you're saying(?): `max_size` specifies the backbone size. So with `max_size=640` the backbone is no longer 550x550, but 640x640. Is that correct?
It is a "pre-processing" step, but the size of the input image determines the size of the backbone layers, since P2 for instance is input size // 2, P3 is input size // 4, etc. The issue with change after training is that the weights were trained expecting a certain image size, so they probably won't work on a different image size.
The comment https://github.com/dbolya/yolact/issues/242#issuecomment-562907739 mentions we should adjust `pred_scales` / set `max_size` if our image is not 550x550 (the backbone input size), mostly to avoid upscaling when `max_size` is set(?). Thank you