garrickbrazil / M3D-RPN

MIT License
261 stars 67 forks

How is the performance when using ResNet (18 or 50)? #11

Open kl456123 opened 5 years ago

kl456123 commented 5 years ago

I just want to know which backbone is good enough to train for a mono 3d prediction task.

garrickbrazil commented 5 years ago

We have not tried ResNet in a long time. If I recall correctly, the performance is slightly lower than with DenseNet. I plan to add a few more backbone models to the repository soon.

Otherwise, feel free to extend the code yourself using any backbone offered by torchvision; this primarily involves changing the final pooling layer and optionally dilating the network. SqueezeNet, ShuffleNet, and MnasNet would all be interesting in addition to ResNet, but may require a few more minor tweaks.
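
For instance (an untested sketch; recent torchvision versions support this option, and the input size here is arbitrary), a torchvision ResNet can trade the last stride for dilation directly:

import torch
from torchvision import models

# dilate layer4 instead of striding it, keeping the overall output stride at 16
base = models.resnet50(pretrained=True,
                       replace_stride_with_dilation=[False, False, True])

# drop the classifier head (avgpool + fc) to use it as a feature extractor
backbone = torch.nn.Sequential(*list(base.children())[:-2])

feats = backbone(torch.randn(1, 3, 512, 1760))  # example KITTI-like input
print(feats.shape)  # torch.Size([1, 2048, 32, 110]) -> stride 16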

qaazii commented 5 years ago

I just want to know which backbone is good enough to train for a mono 3d prediction task.

I tried ResNet18, 34, 50, and 101. The best one is 101, but the results are still lower than with DenseNet. Also, you cannot use the whole network: since the required network stride is 16, your feature channels cannot exceed 1024 (i.e., you have to stop after layer3).

garrickbrazil commented 5 years ago

I do not currently have time to test the models, but feel free to adjust the stride of the ResNet backbones such that the final stride is still 16. Then you can use all the layers and feed the proper number of input channels into "prop_feats". I typically change the last few ResNet blocks for this. For example:

Put this in the init:

# reuse the torchvision ResNet stem and stages
self.conv1 = base.conv1
self.bn1 = base.bn1
self.relu = base.relu
self.maxpool = base.maxpool
self.layer1 = base.layer1
self.layer2 = base.layer2
self.layer3 = base.layer3
self.layer4 = base.layer4

# remove the downsampling in layer4 so the overall network stride stays 16
# (for Bottleneck ResNets the strided 3x3 conv may be conv2 rather than conv1,
# depending on the torchvision version)
self.layer4[0].downsample[0].stride = (1, 1)
self.layer4[0].conv1.stride = (1, 1)

# prop_feats consumes the backbone output; note that the final block of a
# Bottleneck ResNet (50/101) ends in conv3, while a BasicBlock ResNet (18/34)
# ends in conv2, so adjust the channel lookup accordingly
self.prop_feats = nn.Sequential(
    nn.Conv2d(self.layer4[-1].conv3.out_channels, 512, 3, padding=1),
    nn.ReLU(inplace=True)
)

Put this in the forward:

# standard ResNet forward pass up through layer4 (output stride 16)
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)

Put this in the build:

# assumes: from torchvision import models
resnet18 = models.resnet18(pretrained=train)
rpn_net = RPN(phase, resnet18, conf)

I have not tested this, but something similar should work at least for ResNet18 and ResNet50.
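
As a quick sanity check (an untested sketch; the input resolution is just an example), you can push a dummy tensor through the modified stages and confirm the final stride is 16 rather than 32:

import torch
from torchvision import models

base = models.resnet18(pretrained=False)
base.layer4[0].downsample[0].stride = (1, 1)
base.layer4[0].conv1.stride = (1, 1)  # conv2 instead for Bottleneck ResNets

x = torch.randn(1, 3, 512, 1760)  # example KITTI-like input
for m in [base.conv1, base.bn1, base.relu, base.maxpool,
          base.layer1, base.layer2, base.layer3, base.layer4]:
    x = m(x)
print(x.shape)  # expect torch.Size([1, 512, 32, 110]), i.e. stride 16;
                # with layer4 left strided you would get 16x55 (stride 32)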

garrickbrazil commented 5 years ago

I've attached examples for both ResNet18 and ResNet50 here. However, I have NOT tested them :)

qaazii commented 5 years ago

Thank you so much, Garrick Brazil, for commenting and writing the code. I am still getting the same error I faced when trying to use layer4 of the ResNet50 model:

RuntimeError: The size of tensor a (55) must match the size of tensor b (110) at non-singleton dimension 3

qaazii commented 5 years ago

Well, after training with layer3 and the code you provided, the results are as follows:

test_iter 100000 2d car --> easy: 0.0004, mod: 0.0008, hard: 0.0013
test_iter 100000 gr car --> easy: 0.0002, mod: 0.0001, hard: 0.0001
test_iter 100000 3d car --> easy: 0.0002, mod: 0.0001, hard: 0.0001
test_iter 100000 2d pedestrian --> easy: 0.0070, mod: 0.0070, hard: 0.0070
test_iter 100000 gr pedestrian --> easy: 0.0003, mod: 0.0003, hard: 0.0003
test_iter 100000 3d pedestrian --> easy: 0.0003, mod: 0.0003, hard: 0.0003
test_iter 100000 2d cyclist --> easy: 0.0000, mod: 0.0000, hard: 0.0000
test_iter 100000 gr cyclist --> easy: 0.0000, mod: 0.0000, hard: 0.0000
test_iter 100000 3d cyclist --> easy: 0.0000, mod: 0.0000, hard: 0.0000

Now I have changed the code and used it in the dilate model file; using the layer there gives reasonable results.

garrickbrazil commented 5 years ago

Attached are fixed model files (I briefly checked that they begin training on my machine).

resnet_models_fixed.tar.gz

Regarding the performance you posted, that is very alarming. If the above models converge to a low training loss but validate extremely poorly, then it may be necessary to adjust the batch norm momentum: either slow it or halt it entirely.

You can accomplish this using either of the functions below. I do NOT recommend doing this unless you observe a major generalization gap between training loss and validation; freezing these layers is not ideal unless the batches are too unstable.

import torch

def freeze_bn(network):
    # put all BatchNorm layers in eval mode so their running statistics stop updating
    for name, module in network.named_modules():
        if isinstance(module, torch.nn.BatchNorm2d):
            module.eval()

def slow_bn(network, val=0.01):
    # slow the running-statistics update (PyTorch's default momentum is 0.1)
    for name, module in network.named_modules():
        if isinstance(module, torch.nn.BatchNorm2d):
            module.momentum = val

Then you must add the function call (e.g., "slow_bn(rpn_net)") in train_rpn_3d.py at line 104, before training begins, AND again around line 192, after validation finishes each cycle.
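
Roughly, the placement would look like this (a sketch against the line numbers mentioned above, which may differ in your copy of train_rpn_3d.py):

# around line 104, just before the training loop starts
rpn_net.train()
slow_bn(rpn_net)   # or freeze_bn(rpn_net)

# around line 192, after validation finishes each cycle; rpn_net.train() puts
# BatchNorm layers back into training mode, so freeze_bn in particular must be
# re-applied here
rpn_net.train()
slow_bn(rpn_net)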

qaazii commented 5 years ago

When I use the ResNet101 model with the depth-aware network, I get the following results, which are still reasonable:

test_iter 50000 2d car --> easy: 0.8680, mod: 0.8214, hard: 0.6647
test_iter 50000 gr car --> easy: 0.1937, mod: 0.1634, hard: 0.1385
test_iter 50000 3d car --> easy: 0.1265, mod: 0.1138, hard: 0.1003
test_iter 50000 2d pedestrian --> easy: 0.6363, mod: 0.5651, hard: 0.4853
test_iter 50000 gr pedestrian --> easy: 0.0533, mod: 0.0521, hard: 0.0460
test_iter 50000 3d pedestrian --> easy: 0.0467, mod: 0.0461, hard: 0.0430
test_iter 50000 2d cyclist --> easy: 0.6447, mod: 0.4094, hard: 0.4014
test_iter 50000 gr cyclist --> easy: 0.0422, mod: 0.0258, hard: 0.0261
test_iter 50000 3d cyclist --> easy: 0.0265, mod: 0.0215, hard: 0.0132

Let me try with your code and changing the batch normalization.

qaazii commented 5 years ago

Attached are fixed model files (I briefly checked that they begin training on my machine). [...]

I am very thankful to you for helping. You did a great job.

ziniuwang commented 2 years ago

I've attached examples for both ResNet18 and ResNet50 here. However, I have NOT tested them :)

Thanks for your great work! I have some problems changing densenet121 and hope you can give me some advice. I changed the backbone from densenet121.features to resnet50 (up through layer4), changed self.base[-1].num_features in prop_feats to 2048, changed the lr from 0.004 to 0.0005, and changed the batch size from 2 to 8 (otherwise the training can't continue). But I got results like this:

[INFO]: 2021-12-10 22:11:39,653 test_iter 50000 2d car --> easy: 0.0007, mod: 0.0009, hard: 0.0012
[INFO]: 2021-12-10 22:11:39,654 test_iter 50000 gr car --> easy: 0.0002, mod: 0.0005, hard: 0.0005
[INFO]: 2021-12-10 22:11:39,655 test_iter 50000 3d car --> easy: 0.0001, mod: 0.0005, hard: 0.0005
[INFO]: 2021-12-10 22:11:39,656 test_iter 50000 2d pedestrian --> easy: 0.0152, mod: 0.0455, hard: 0.0455
[INFO]: 2021-12-10 22:11:39,656 test_iter 50000 gr pedestrian --> easy: 0.0012, mod: 0.0012, hard: 0.0012
[INFO]: 2021-12-10 22:11:39,657 test_iter 50000 3d pedestrian --> easy: 0.0012, mod: 0.0012, hard: 0.0012
[INFO]: 2021-12-10 22:11:39,658 test_iter 50000 2d cyclist --> easy: 0.0000, mod: 0.0000, hard: 0.0000
[INFO]: 2021-12-10 22:11:39,658 test_iter 50000 gr cyclist --> easy: 0.0000, mod: 0.0000, hard: 0.0000
[INFO]: 2021-12-10 22:11:39,659 test_iter 50000 3d cyclist --> easy: 0.0000, mod: 0.0000, hard: 0.0000

Could you please give me some help?

ziniuwang commented 2 years ago

I've attached examples for both ResNet18 and ResNet50 here. However, I have NOT tested them :)

I used the code you provided, but when I run train_rpn_3d.py it shows the following:

iter: 250, acc (bg: 0.99, fg: 0.01, iou: nan), loss (bbox_3d: nan, cls: nan, iou: nan), misc (ry: nan, z: nan), dt: 0.34, eta: 4.7h
iter: 500, acc (bg: 1.00, fg: 0.00, iou: nan), loss (bbox_3d: nan, cls: nan, iou: nan), misc (ry: nan, z: nan), dt: 0.31, eta: 4.3h
iter: 750, acc (bg: 1.00, fg: 0.00, iou: nan), loss (bbox_3d: nan, cls: nan, iou: nan), misc (ry: nan, z: nan), dt: 0.30, eta: 4.1h
iter: 1000, acc (bg: 1.00, fg: 0.00, iou: nan), loss (bbox_3d: nan, cls: nan, iou: nan), misc (ry: nan, z: nan), dt: 0.29, eta: 4.0h
iter: 1250, acc (bg: 1.00, fg: 0.00, iou: nan), loss (bbox_3d: nan, cls: nan, iou: nan), misc (ry: nan, z: nan), dt: 0.28, eta: 3.8h
iter: 1500, acc (bg: 1.00, fg: 0.00, iou: nan), loss (bbox_3d: nan, cls: nan, iou: nan), misc (ry: nan, z: nan), dt: 0.30, eta: 4.0h
iter: 1750, acc (bg: 1.00, fg: 0.00, iou: nan), loss (bbox_3d: nan, cls: nan, iou: nan), misc (ry: nan, z: nan), dt: 0.29, eta: 3.9h

Could you please give me some help?