TorchEnsemble-Community / Ensemble-Pytorch

A unified ensemble framework for PyTorch to improve the performance and robustness of your deep learning model.
https://ensemble-pytorch.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Added LBFGS optimizer for Fusion #81

Open e-eight opened 3 years ago

e-eight commented 3 years ago

resolves #79

xuyxu commented 3 years ago

Hi @e-eight, once again, thanks for the PR :-)

It looks like the CI failed, could you check what went wrong according to the CI details? Besides, you can run pytest locally with the command pytest ./ in the root dir of Ensemble-PyTorch.

xuyxu commented 3 years ago

Thanks for your prompt fix, I will validate the performance of fusion based on your code ;-)

xuyxu commented 3 years ago

Hi @e-eight, the result looks good: the performance of fusion matches the master branch when using optimizers like Adam and SGD. Since I am not quite familiar with the LBFGS optimizer, could you provide a scenario where it works better, including the definition of the model, the dataset, and the optimizer's hyper-parameters? Thanks!

e-eight commented 3 years ago

Hi! I have used the LBFGS optimizer for small datasets, but always with a single model, never with an ensemble (which is why I wanted to use this package in the first place 🙂). Here is a simple example of polynomial regression where the LBFGS optimizer gives much faster convergence than Adam or SGD.

Model:

import torch
from torch import nn

class PolynomialModel(nn.Module):
    def __init__(self, degree):
        super().__init__()
        self._degree = degree
        self.linear = nn.Linear(self._degree, 1)

    def forward(self, x):
        return self.linear(self._polynomial_features(x))

    def _polynomial_features(self, x):
        # Map a 1-D input to the feature matrix [x, x^2, ..., x^degree].
        x = x.unsqueeze(1)
        return torch.cat([x ** i for i in range(1, self._degree + 1)], 1)

model = PolynomialModel(degree=3)

Data:

import math

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.float
x = torch.linspace(-math.pi, math.pi, steps=20, dtype=dtype, device=device)
y = x ** 3 + 2 * x ** 2 - 3 * x + 5

Optimizer:

from torch.optim import LBFGS

optimizer = LBFGS(model.parameters(), history_size=10, max_iter=4)
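
One note on usage: unlike Adam or SGD, LBFGS re-evaluates the objective several times per step, so optimizer.step() must be passed a closure. A minimal fitting loop for the snippets above (the MSE objective and epoch count are just illustrative choices, not part of the original example) looks like:

criterion = nn.MSELoss()
model.to(device)

for epoch in range(50):

    def closure():
        # LBFGS calls this repeatedly to recompute the loss and gradients.
        optimizer.zero_grad()
        loss = criterion(model(x), y.unsqueeze(1))
        loss.backward()
        return loss

    optimizer.step(closure)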

Hope this helps!

xuyxu commented 3 years ago

According to your experience, does the fusion ensemble help to achieve better fitting results on this task?

xuyxu commented 3 years ago

Hi @e-eight, just wondering if there is any problem, since you have not replied in 5 days ;-)

Feel free to tell me if you need any help.

e-eight commented 3 years ago

Sorry @xuyxu. Been a little busy these last few days. Will try the fusion regression and get back to you soon. :)

xuyxu commented 3 years ago

Never mind, simply contribute to this PR at your own pace.

e-eight commented 3 years ago

Hi @xuyxu. Finally got some time to try it out. The fusion ensemble gets much better results; in fact, with 20 estimators the testing mean squared error was zero up to two decimal places. Please let me know how you would like me to proceed from here.
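
For reference, a fusion run of this kind can be set up roughly as follows (a sketch reusing the snippets above; the batch size and epoch count are illustrative assumptions, not the exact values from my run):

from torch.utils.data import DataLoader, TensorDataset
from torchensemble import FusionRegressor

# Reuse the polynomial data from the earlier snippets.
train_loader = DataLoader(TensorDataset(x, y.unsqueeze(1)), batch_size=20)

ensemble = FusionRegressor(
    estimator=PolynomialModel,
    estimator_args={"degree": 3},
    n_estimators=20,
    cuda=torch.cuda.is_available(),
)
ensemble.set_optimizer("LBFGS", history_size=10, max_iter=4)
ensemble.fit(train_loader, epochs=50)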

xuyxu commented 3 years ago

Nice work @e-eight!

Sure, could you further check if there is any problem when modifying voting in the same way?

e-eight commented 3 years ago

@xuyxu I tried it for voting and bagging. No problems. The results improved, and are generally better than what one would get with the Adam optimizer. I will try the other methods soon.
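
For anyone reading along, switching methods only changes the wrapper class; here is a sketch for voting, with the same hypothetical hyper-parameters as in the fusion sketch above:

from torchensemble import VotingRegressor

ensemble = VotingRegressor(
    estimator=PolynomialModel,
    estimator_args={"degree": 3},
    n_estimators=20,
)
ensemble.set_optimizer("LBFGS", history_size=10, max_iter=4)
ensemble.fit(train_loader, epochs=50)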