TorchEnsemble-Community / Ensemble-Pytorch

A unified ensemble framework for PyTorch to improve the performance and robustness of your deep learning model.
https://ensemble-pytorch.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Problems encountered when applying gradient boosting to ResNet-18 on CIFAR-10. #77

Closed · know-nothing8 closed this issue 3 years ago

know-nothing8 commented 3 years ago

Hello, I am learning deep ensemble methods. Your work is very good and has helped me a lot, but I have encountered a problem when applying gradient boosting to ResNet-18 on CIFAR-10.

Ensemble algorithms such as Bagging and Fast Geometric work normally when applied to ResNet-18 on the CIFAR-10 dataset. However, GradientBoostingClassifier and SoftGradientBoostingClassifier fail to improve performance through training.

After checking the code, I found that the latter two fit a pseudo-residual instead of minimizing the cross-entropy loss directly for classification. Is this the reason training fails?
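
For context, my understanding is that the pseudo-residual for multi-class classification is just the negative gradient of the cross-entropy loss with respect to the current ensemble logits, i.e. the one-hot targets minus the softmax probabilities. A minimal sketch of that computation (my own illustration, not the library's actual code):

```python
import torch
import torch.nn.functional as F

def pseudo_residual(output: torch.Tensor, target: torch.Tensor, n_classes: int) -> torch.Tensor:
    """Negative gradient of cross-entropy w.r.t. the logits `output`."""
    # One-hot encode the integer class labels.
    target_onehot = F.one_hot(target, num_classes=n_classes).float()
    # The pseudo-residual that each new base estimator is fitted on.
    return target_onehot - F.softmax(output, dim=1)
```

If that is right, fitting pseudo-residuals is just gradient descent on the cross-entropy loss in function space, so it should not by itself prevent training, which makes me think I am missing something else.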

I believe I must have overlooked some detail, since the file ./docs/plotting/resnet_cifar10.py already includes results for Gradient Boosting.

Below is the code snippet where I apply Soft Gradient Boosting to ResNet-18 on CIFAR-10:

###code START###
from torchensemble import SoftGradientBoostingClassifier

# ResNet and BasicBlock are the standard ResNet-18 building blocks
# (defined elsewhere in my script).
model = SoftGradientBoostingClassifier(
    estimator=ResNet,
    estimator_args={"block": BasicBlock, "num_blocks": [2, 2, 2, 2]},
    n_estimators=n_estimators,
    cuda=True,
)

# Set the optimizer
model.set_optimizer("SGD", lr=lr, weight_decay=weight_decay, momentum=momentum)

# Set the learning rate scheduler
model.set_scheduler("CosineAnnealingLR", T_max=epochs)

# Train the ensemble (fit trains the model in place and
# evaluates on test_loader after each epoch)
model.fit(train_loader, epochs=epochs, test_loader=test_loader)
###code END###

This is the result returned by the console: the training loss turns into nan.

How should I modify this code? Thank you so much for your patience ^_^

xuyxu commented 3 years ago

Hi @know-nothing8, try the following, and let me know if you run into any problems:

A training loss of nan typically means that the current optimizer configuration is not well suited to the problem. Therefore, my first suggestion is to use a smaller learning rate ;-)
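
For example, something along these lines (1e-3 is only a starting point; you may need to tune it):

```python
# Same configuration as before, but with a much smaller learning rate.
model.set_optimizer("SGD", lr=1e-3,
                    weight_decay=weight_decay,
                    momentum=momentum)
```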

know-nothing8 commented 3 years ago

Thank you very much for your help. After I changed the learning rate from 1e-1 to 1e-3, it still didn't work, but after I changed the optimizer from "SGD" to "Adam", the model trains normally! 👍
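
For reference, this is the change that made it work for me (Adam's other hyper-parameters are left at their defaults):

```python
# Replacing SGD with Adam fixed the nan training loss.
model.set_optimizer("Adam", lr=1e-3, weight_decay=weight_decay)
```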

xuyxu commented 3 years ago

If I remember correctly, the performance of ResNet should be slightly better with the SGD optimizer. You could also try a smaller momentum factor.
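
For example (0.9 is the common default; 0.5 here is only an illustrative smaller value):

```python
# Keep SGD, but use a smaller momentum factor.
model.set_optimizer("SGD", lr=1e-3, weight_decay=weight_decay, momentum=0.5)
```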

I am going to close this issue since it is more of a question about how to optimize a specific deep learning model. Thanks for reporting. 😄