ENSTA-U2IS-AI / torch-uncertainty

Open-source framework for uncertainty and deep learning models in PyTorch :seedling:
https://torch-uncertainty.github.io
Apache License 2.0

:bug: Training with WideResNetBaseline #113

Closed by Autochthonal 2 weeks ago

Autochthonal commented 2 weeks ago

hey!

First of all, I really want to thank you for creating this amazing library!

When I try to use WideResNetBaseline for an image classification task, the test accuracy is really low. This happens with different datasets, including MNIST and CIFAR. The code and results are provided below:

from pathlib import Path
from torch_uncertainty.utils import TUTrainer
from torch_uncertainty.datamodules import MNISTDataModule, CIFAR10DataModule
from torch_uncertainty.baselines.classification import WideResNetBaseline
import torch

trainer = TUTrainer(accelerator="cuda", max_epochs=10, enable_progress_bar=True)

root = Path("/home/torch_uncertainty/data")
datamodule = MNISTDataModule(root=root, batch_size=128, eval_ood=False)
loss_fn = torch.nn.CrossEntropyLoss()
model = WideResNetBaseline(datamodule.num_classes,
                           datamodule.num_channels,
                           loss_fn,
                           style="cifar",
                           version="std")

trainer.fit(model=model, datamodule=datamodule)
trainer.test(model=model, datamodule=datamodule)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│       test/cal/ECE        │          0.02958          │
│       test/cal/aECE       │          0.03167          │
│       test/cls/Acc        │          0.10580          │
│      test/cls/Brier       │          0.90634          │
│       test/cls/NLL        │          2.33843          │
│     test/cls/entropy      │          2.27538          │
│       test/sc/AUGRC       │          0.44700          │
│       test/sc/AURC        │          0.89636          │
│    test/sc/CovAt5Risk     │            nan            │
│    test/sc/RiskAt80Cov    │          0.89113          │
└───────────────────────────┴───────────────────────────┘

Since the tutorials in the documentation do not cover the usage of "torch_uncertainty.baselines" yet, I wonder if I have made some mistake in my code?

Besides, I think it would be easier to monitor the training process if trainer.fit() had a parameter for displaying metrics.

Looking forward to your guidance and thanks again!

o-laurent commented 2 weeks ago

Hey @Autochthonal,

Thank you for your message and the great question. It seems that this problem appeared on our side when we switched to the Lightning CLI. We wanted to keep the possibility of instantiating routines and baselines (which are routines linked to a specific model) without the CLI, but we forgot to keep an optional optim_recipe parameter in the baselines. In your code, there is no optimizer whatsoever, so the weights stay the same, and there's no training.

Here is a "simple" workaround (I've set the number of epochs to 1): just set the value of optim_recipe manually:

from pathlib import Path
from torch_uncertainty.utils import TUTrainer
from torch_uncertainty.datamodules import MNISTDataModule
from torch_uncertainty.baselines.classification import WideResNetBaseline
import torch
from torch.optim import SGD

trainer = TUTrainer(accelerator="cuda", max_epochs=1, enable_progress_bar=True)

root = Path("/home/torch_uncertainty/data")
datamodule = MNISTDataModule(root=root, batch_size=128, eval_ood=False)
loss_fn = torch.nn.CrossEntropyLoss()
model = WideResNetBaseline(datamodule.num_classes,
                           datamodule.num_channels,
                           loss_fn,
                           style="cifar",
                           version="std")
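# Workaround: attach the optimization recipe manually, since the baseline
# currently lacks an optional optim_recipe parameter.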
model.optim_recipe = {"optimizer": SGD(model.model.parameters(), lr=1e-2), "scheduler": None}
trainer.fit(model=model, datamodule=datamodule)
trainer.test(model=model, datamodule=datamodule)

I got 97.6% accuracy (not tuned, of course).

Besides, I think it would be easier to monitor the training process if trainer.fit() had a parameter for displaying metrics.

Can you describe your idea a bit more? On the metrics side, I've already implemented printing the loss at each step in the progress bar, as well as some important metrics at validation, which I'll push to dev soon. We also advise using TensorBoard, which is natively supported by TU.
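For instance, here is a minimal sketch of the TensorBoard route. It assumes that TUTrainer forwards the standard Lightning Trainer arguments (such as logger); the save directory and run name are just placeholders:

from lightning.pytorch.loggers import TensorBoardLogger
from torch_uncertainty.utils import TUTrainer

# Route the metrics logged during training/validation to TensorBoard;
# inspect them afterwards with `tensorboard --logdir logs`.
logger = TensorBoardLogger(save_dir="logs", name="wideresnet_mnist")
trainer = TUTrainer(accelerator="cuda", max_epochs=10, logger=logger)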

Again, thanks for your message! We will fix this problem in the next release (or push on the main branch if you install TU from source).

Autochthonal commented 2 weeks ago

Hi there! I have applied your adjustment and it does work in my projects. I have tried different settings to make sure it behaves consistently.

As for the suggestion on trainer.fit(), I mean exactly the presentation of the loss and accuracy computed during evaluation. Hoping to see your newest release!

Moreover, I was trying to implement a BNN-based WideResNet in the project, and it seems like I need to rewrite the wideresnet28x10() function in torch_uncertainty/models/wideresnet/std.py following the implementation in torch_uncertainty/models/lenet.py, and then adjust the project code as in the tutorial "Train a Bayesian Neural Network in Three Minutes".

I wonder if there is an easier solution than the one above? I mean, it would be better if we could use the Baseline framework for the BNN implementation.

Thanks a lot!

o-laurent commented 2 weeks ago

Hi again, @Autochthonal!

First, let me warn you that training WideResNet BNNs may take a lot of work (and maybe some magic...)! If you want to do it, you have to re-implement the WideResNet models for now, as their structure is not modular enough (compared to ResNets!). I've been a bit lazy about this, to be honest. If you would like more modular WideResNets, I can implement them in the coming days.

As for the suggestion on trainer.fit(), I mean exactly the presentation of the loss and accuracy computed during evaluation. Hoping to see your newest release!

Is it that you dislike the table? I still struggle to fully understand your suggestion, although I am very interested in improving the library on this side.

Feel free to re-open this issue or open another if you feel my answer is not yet entirely satisfactory!

Autochthonal commented 2 weeks ago

Forgive my poor phrasing... Actually, your implementation of printing the loss at each step in the progress bar, as well as some important metrics at validation, is exactly what I meant, and I think any presentation method, including TensorBoard, is cool! I am looking forward to the latest version!

As for the BNN-based WideResNet, what I want to do is compare against the experimental results in the TPAMI paper Encoding the Latent Posterior of Bayesian Neural Networks for Uncertainty Quantification, where the researchers used the WRN-28-10 architecture for CIFAR-10/100 classification.

You mentioned that I would have to re-implement the WideResNet models for now, as their structure is not modular enough (compared to ResNets!). However, I only find the "lpbnn" option in the ResNetBaseline class in torch_uncertainty/baselines/classification/resnet.py. Does that mean I need to adapt it for the original BNN implementation (perhaps with variational inference for the BNN optimization)? Or is there an easier way to use the ResNet baseline for a BNN-based ResNet? (As a matter of fact, I think both WideResNet and ResNet are fine for the experiments, so you don't need to hurry! hahaha)

Thank you * 3

o-laurent commented 2 weeks ago

No worries, it isn't very easy to communicate on GitHub for everyone :sweat_smile:

I'm happy if what I've pushed (now deployed in 0.2.2.post0, thanks to @alafage) is already an improvement on this side for you!

So when you say BNN, you mean LP-BNN, right? In that case, I have indeed only implemented LP-BNN for ResNets. Yet, just changing the layers as in torch_uncertainty/models/resnet/lpbnn.py and taking the correct loss should be sufficient. Unfortunately, the LP-BNN implementation has not been thoroughly tested. It's a cleaned version of the original repository, but I can't promise that it will provide very similar results, although I've done my best to avoid mistakes.
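If you go for a standard variational BNN instead, the core of the layer swap is small. Here is a minimal, untested sketch, assuming the BayesConv2d/BayesLinear layers and the ELBOLoss used in the BNN tutorial; TinyBayesNet is just a toy module for illustration, not a TU model:

import torch
from torch import nn
from torch_uncertainty.layers.bayesian import BayesConv2d, BayesLinear
from torch_uncertainty.losses import ELBOLoss

# Toy module in which each deterministic layer is replaced by its Bayesian
# counterpart -- the same substitution lpbnn.py applies to the ResNet blocks.
class TinyBayesNet(nn.Module):
    def __init__(self, in_channels: int, num_classes: int) -> None:
        super().__init__()
        self.conv = BayesConv2d(in_channels, 16, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = BayesLinear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(torch.relu(self.conv(x))).flatten(1)
        return self.fc(x)

model = TinyBayesNet(in_channels=3, num_classes=10)
# The ELBO adds the KL divergence of the Bayesian layers to the inner loss.
loss = ELBOLoss(
    model=model,
    inner_loss=nn.CrossEntropyLoss(),
    kl_weight=1 / 50000,  # often set to roughly 1 / number of training samples
    num_samples=3,
)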

You're welcome! Let me know if you face any issues on that side.

Autochthonal commented 2 weeks ago

Hi there! As you say, I think it is practical for me to implement the original BNN (not LP-BNN) on ResNet following the structure in torch_uncertainty/models/resnet/lpbnn.py, and I will try to do so in the coming days. I will open a new issue if I run into any confusion, and I want to express my appreciation to you guys again!

Autochthonal commented 1 week ago

Hey there, sorry to bother you guys again. I have been working on a semantic segmentation task these days with the SegmentationRoutine. However, the test results such as the mIoU are quite low even though I added an optim_recipe to the SegmentationRoutine. The main function code and test results are listed below; would you please check them for any potential issues? @o-laurent

from pathlib import Path
from torch import nn
from torch_uncertainty.utils import TUTrainer
import torch
from torch.optim import SGD
from torch_uncertainty.datamodules.segmentation import CamVidDataModule
from torch_uncertainty.models.segmentation.deeplab import deep_lab_v3_resnet50
from torch_uncertainty.routines.segmentation import SegmentationRoutine

def optim(model: nn.Module):
    optimizer = SGD(
        model.parameters(),
        lr=1e-3,
    )
    return optimizer

trainer = TUTrainer(max_epochs=100, enable_progress_bar=True)

# datamodule
root = Path("/home/torch_uncertainty/data")
datamodule = CamVidDataModule(root=root, batch_size=128)
# model
model = deep_lab_v3_resnet50(num_classes=32, style="v3+")

loss = torch.nn.CrossEntropyLoss()
routine = SegmentationRoutine(
    model=model,
    num_classes=32,
    loss=loss,
    optim_recipe=optim(model)
)

trainer.fit(model=routine, datamodule=datamodule)
trainer.test(model=routine, datamodule=datamodule)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│       test/cal/ECE        │          0.08996          │
│       test/cal/aECE       │          0.08924          │
│       test/sc/AUGRC       │          0.03775          │
│       test/sc/AURC        │          0.05829          │
│      test/seg/Brier       │          0.25242          │
│       test/seg/NLL        │          0.62267          │
│       test/seg/mAcc       │          0.52180          │
│       test/seg/mIoU       │          0.13649          │
│      test/seg/pixAcc      │          0.84269          │
└───────────────────────────┴───────────────────────────┘

I have tried the SGD optimizer and Adam optimizer, and their training results are similar.

alafage commented 1 week ago

Hi @Autochthonal,

I have looked at your issue. For the CamVid dataset, the number of classes is 12, not 32. I think that's why your mIoU is so low; the other metrics are not affected by it.

Here's what I get with batch_size=16:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│       test/cal/ECE        │          0.05449          │
│       test/cal/aECE       │          0.05499          │
│       test/sc/AUGRC       │          0.05439          │
│       test/sc/AURC        │          0.07545          │
│      test/seg/Brier       │          0.31001          │
│       test/seg/NLL        │          0.67216          │
│       test/seg/mAcc       │          0.37725          │
│       test/seg/mIoU       │          0.26113          │
│      test/seg/pixAcc      │          0.78461          │
└───────────────────────────┴───────────────────────────┘

I am not sure what the performance of a DeepLabv3 model on CamVid should be with your optimization procedure, but by increasing the learning rate as shown in the following code:

from pathlib import Path

import torch
from torch import nn
from torch.optim import SGD

from torch_uncertainty.datamodules.segmentation import CamVidDataModule
from torch_uncertainty.models.segmentation.deeplab import deep_lab_v3_resnet50
from torch_uncertainty.routines.segmentation import SegmentationRoutine
from torch_uncertainty.utils import TUTrainer

def optim_recipe(model: nn.Module):
    return SGD(
        model.parameters(),
        lr=1e-2,
    )

trainer = TUTrainer(max_epochs=100, enable_progress_bar=True)

# datamodule
root = Path("./data")
datamodule = CamVidDataModule(root=root, batch_size=16)
# model
model = deep_lab_v3_resnet50(num_classes=12, style="v3+")

loss = torch.nn.CrossEntropyLoss()
routine = SegmentationRoutine(
    model=model,
    num_classes=12,
    loss=loss,
    optim_recipe=optim_recipe(model)
)

trainer.fit(model=routine, datamodule=datamodule)
trainer.test(model=routine, datamodule=datamodule)

I get better results:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│       test/cal/ECE        │          0.03045          │
│       test/cal/aECE       │          0.02972          │
│       test/sc/AUGRC       │          0.03739          │
│       test/sc/AURC        │          0.04818          │
│      test/seg/Brier       │          0.24977          │
│       test/seg/NLL        │          0.50273          │
│       test/seg/mAcc       │          0.51431          │
│       test/seg/mIoU       │          0.36155          │
│      test/seg/pixAcc      │          0.82736          │
└───────────────────────────┴───────────────────────────┘

Finding better hyperparameters might be necessary to achieve better results, but I don't think there is anything wrong with the SegmentationRoutine. A learning-rate schedule could help, for instance; see the sketch below.
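Here is a minimal, untested sketch of adding a scheduler, assuming the dict form {"optimizer": ..., "scheduler": ...} from the classification workaround above is also accepted as an optim_recipe here; the milestones and gamma are placeholder values:

from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def optim_recipe(model: nn.Module):
    optimizer = SGD(model.parameters(), lr=1e-2, momentum=0.9)
    # Decay the learning rate by 10x at epochs 60 and 80 (placeholder values).
    scheduler = MultiStepLR(optimizer, milestones=[60, 80], gamma=0.1)
    return {"optimizer": optimizer, "scheduler": scheduler}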

I hope it helps!

Autochthonal commented 1 week ago

Hey there @alafage, thanks for your correction; it solves my problem. As for the segmentation results, I have tried different architectures and hyperparameters these days, but unfortunately none of them achieve a significant improvement in mIoU. After checking camvid.py, I found that the data pre-processing does not contain any data augmentation, and I wonder whether this restricts the final prediction performance?

o-laurent commented 1 week ago

Hi @Autochthonal,

Unfortunately, we had implemented CamVid following a notebook that we had in class, which was, in many regards, incorrect or imprecise. With @alafage, we have updated the dataset and datamodule to better reflect the literature. Indeed, researchers often regroup the classes of CamVid into 11 superclasses. This is now the default behavior of our CamVid implementation.

I've also added a better configuration file for DeepLabv3+. I get 65% mIoU after 32 epochs (out of 120). I tried to follow the implementation of this paper; we now have the same data augmentations (except Gaussian blur) and optimization recipe. You are right to expect that augmentations would help, given CamVid's size. We don't have Cityscapes pre-training yet; again, this could help a lot because of the dataset's small size. We could work on it with you, if you want. Another idea to improve the mIoU would be to add weights to the different classes to fix the poorer performance on less-represented classes; a sketch follows below.
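A minimal sketch of that weighting, with placeholder pixel counts (in practice they would be computed from the training masks):

import torch
from torch import nn

# Placeholder per-class pixel counts; compute these from the training labels.
counts = torch.tensor([9.0e6, 4.2e6, 1.1e6, 3.0e5])
# Inverse-frequency weights, normalized so that they average to 1.
weights = counts.sum() / (len(counts) * counts)
loss = nn.CrossEntropyLoss(weight=weights)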

You will find all these changes in the last push on dev. And many thanks for your help in finding bugs and inconsistencies!!