TorchEnsemble-Community / Ensemble-Pytorch

A unified ensemble framework for PyTorch to improve the performance and robustness of your deep learning model.
https://ensemble-pytorch.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Problems encountered when applying gradient boosting to ResNet-18 on CIFAR-10. #77

Closed · know-nothing8 closed this issue 3 years ago

know-nothing8 commented 3 years ago

Hello, I am learning deep ensemble methods. Your work is very good and has helped me a lot, but I have encountered a problem when applying gradient boosting to ResNet-18 on CIFAR-10.

Ensemble algorithms such as Bagging and Fast Geometric work normally when applied to ResNet-18 on the CIFAR-10 dataset. However, GradientBoostingClassifier and SoftGradientBoostingClassifier fail to improve performance through training.

After checking the code, I found that the latter two fit a pseudo-residual instead of minimizing the cross-entropy loss directly for classification. Is this the reason training fails?
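
For context, my understanding is that the pseudo-residual for multi-class classification is just the negative gradient of the cross-entropy loss with respect to the current ensemble logits, i.e. the one-hot targets minus the softmax probabilities. A minimal sketch of that computation (my own illustration, not the library's actual code):

```python
import torch
import torch.nn.functional as F

def pseudo_residual(output: torch.Tensor, target: torch.Tensor, n_classes: int) -> torch.Tensor:
    """Negative gradient of cross-entropy w.r.t. the logits `output`."""
    # One-hot encode the integer class labels.
    target_onehot = F.one_hot(target, num_classes=n_classes).float()
    # The pseudo-residual that each new base estimator is fitted on.
    return target_onehot - F.softmax(output, dim=1)
```

If that is right, fitting pseudo-residuals is just gradient descent on the cross-entropy loss in function space, so it should not by itself prevent training, which makes me think I am missing something else.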

I believe I must have overlooked some detail, since the file ./docs/plotting/resnet_cifar10.py already includes results for Gradient Boosting.

Below is the code snippet where I apply Soft Gradient Boosting to ResNet-18 on CIFAR-10:

###code START###
from torchensemble import SoftGradientBoostingClassifier

# ResNet and BasicBlock are the standard ResNet-18 building blocks
# (defined elsewhere in my script).
model = SoftGradientBoostingClassifier(
    estimator=ResNet,
    estimator_args={"block": BasicBlock, "num_blocks": [2, 2, 2, 2]},
    n_estimators=n_estimators,
    cuda=True,
)

# Set the optimizer
model.set_optimizer("SGD", lr=lr, weight_decay=weight_decay, momentum=momentum)

# Set the learning rate scheduler
model.set_scheduler("CosineAnnealingLR", T_max=epochs)

# Train the ensemble (fit trains the model in place and
# evaluates on test_loader after each epoch)
model.fit(train_loader, epochs=epochs, test_loader=test_loader)
###code END###

This is the result returned by the console: the training loss turns into nan.

How should I modify this code? Thank you so much for your patience ^_^

xuyxu commented 3 years ago

Hi @know-nothing8, try the following, and let me know if you run into any problems:

A training loss of nan typically means that the current optimizer configuration is not well suited to the problem. Therefore, my first suggestion is to use a smaller learning rate ;-)
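
For example, something along these lines (1e-3 is only a starting point; you may need to tune it):

```python
# Same configuration as before, but with a much smaller learning rate.
model.set_optimizer("SGD", lr=1e-3,
                    weight_decay=weight_decay,
                    momentum=momentum)
```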

know-nothing8 commented 3 years ago

Thank you very much for your help. After I changed the learning rate from 1e-1 to 1e-3, it still didn't work, but after I changed the optimizer from "SGD" to "Adam", the model trains normally! 👍
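
For reference, this is the change that made it work for me (Adam's other hyper-parameters are left at their defaults):

```python
# Replacing SGD with Adam fixed the nan training loss.
model.set_optimizer("Adam", lr=1e-3, weight_decay=weight_decay)
```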

xuyxu commented 3 years ago

If I remember correctly, the performance of ResNet should be slightly better with the SGD optimizer. You could also try a smaller momentum factor.
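
For example (0.9 is the common default; 0.5 here is only an illustrative smaller value):

```python
# Keep SGD, but use a smaller momentum factor.
model.set_optimizer("SGD", lr=1e-3, weight_decay=weight_decay, momentum=0.5)
```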

I am going to close this issue since it is more of a question about how to optimize a specific deep learning model. Thanks for reporting. 😄