bdzyubak / torch-control

A top-level repo for evaluating natively available models
MIT License

Implement Image Classification on dermaMNIST #20

Closed bdzyubak closed 2 months ago

bdzyubak commented 3 months ago

As I have mostly done image segmentation in the past, let's do a classification project!

dermaMNIST is a publicly available dataset of small RGB images of skin lesions with multi-class disease labels (7 classes). Unlike the handwritten-digit MNIST dataset, which is easy, the benchmark accuracy published in Nature with ResNet-50 is only 0.73: https://www.nature.com/articles/s41597-022-01721-8/tables/4

Let's see if we can do better. `pip install medmnist`, then `from medmnist import DermaMNIST`.
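To get started without the medmnist package installed, here is a minimal sketch of the DataLoader setup; the random TensorDataset is a stand-in with the same shapes DermaMNIST produces (28x28 RGB images, 7 class labels), and the real dataset would be dropped in via the commented line:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for DermaMNIST: 28x28 RGB images with integer labels for 7 classes.
# In the real pipeline, replace train_ds with something like:
#   DermaMNIST(split="train", download=True, transform=...)
images = torch.rand(100, 3, 28, 28)   # fake normalized RGB images
labels = torch.randint(0, 7, (100,))  # fake class indices 0..6
train_ds = TensorDataset(images, labels)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)

xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)  # torch.Size([32, 3, 28, 28]) torch.Size([32])
```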

bdzyubak commented 3 months ago

A basic tutorial-style CNN + FCN reaches 0.69 train accuracy in one epoch and then plateaus.
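For reference, a tutorial-style CNN + FCN of the kind described above might look like the following sketch; the layer sizes are illustrative assumptions, not the repo's exact model:

```python
import torch
import torch.nn as nn

# Minimal tutorial-style CNN + FCN for 28x28 RGB inputs and 7 classes.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
    nn.Linear(128, 7),  # raw logits, to be fed to CrossEntropyLoss
)

logits = model(torch.rand(8, 3, 28, 28))
print(logits.shape)  # torch.Size([8, 7])
```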

ResNet50 from torchvision gets:

Fresh weights: [98/100] train_loss: 0.333 - train_acc: 0.884 - eval_loss: 1.127 - eval_acc: 0.704

Pretrained weights: [100/100] train_loss: 0.207 - train_acc: 0.932 - eval_loss: 1.095 - eval_acc: 0.754

Using pretrained rather than random weights as a starting point is much better for optimization and generalizability, as always. Another way to help generalizability would be to train only some of the layers (e.g. freeze the backbone and add additional heads).

bdzyubak commented 2 months ago

Some fun with basics: `D:\Source\torch-control\projects\ComputerVision\dermMNIST\train_basic_network.py`

A) A network bottlenecked to 1x1 by CNN + max-pooling, i.e. torch.Size([100, 4096, 1, 1]) going into the dense layers, will still train OK.

B) But a network that adds another conv layer (no maxpool) after this bottleneck won't. Likely because a 3x3 Conv2d on a 1x1 feature map sees almost nothing but zero padding.
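The geometry behind B) can be demonstrated directly: without padding, a 3x3 conv cannot run on a 1x1 map at all, and with padding it runs but the receptive field is mostly zeros. A small sketch (channel counts are illustrative):

```python
import torch
import torch.nn as nn

x = torch.rand(100, 4096, 1, 1)  # the 1x1 bottleneck described above

# Without padding, a 3x3 kernel cannot produce any output from a 1x1 map:
pad0_failed = False
try:
    nn.Conv2d(4096, 64, kernel_size=3, padding=0)(x)
except RuntimeError:
    pad0_failed = True
print("no-padding 3x3 conv fails:", pad0_failed)

# With padding=1 it runs, but 8 of the 9 values in every receptive field
# are zero padding, which plausibly explains the poor training observed.
y = nn.Conv2d(4096, 64, kernel_size=3, padding=1)(x)
print(y.shape)  # torch.Size([100, 64, 1, 1])
```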

C) Maxpooling/increasing channels vs raw CNN layers helps optimization but has little impact on the final accuracy. The latter may not hold on a more difficult dataset.

Maxpool:
[22/100] train_loss: 0.129 - train_acc: 0.956 - eval_loss: 1.831 - eval_acc: 0.699

No maxpool:
[22/100] train_loss: 0.510 - train_acc: 0.809 - eval_loss: 1.175 - eval_acc: 0.620
[52/100] train_loss: 0.012 - train_acc: 0.996 - eval_loss: 2.166 - eval_acc: 0.737
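One way to see why the pooled variant in C) is easier to optimize: with the same conv parameters, pooling shrinks the activation volume that the dense layers must consume. A sketch comparing two assumed stacks:

```python
import torch
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Variant with maxpool + growing channels (downsamples spatially):
with_pool = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)

# Raw CNN variant: same convs, no pooling, constant resolution:
no_pool = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)

x = torch.rand(4, 3, 28, 28)
print(with_pool(x).shape, no_pool(x).shape)
# torch.Size([4, 32, 7, 7]) torch.Size([4, 32, 28, 28])
# Identical conv parameter counts, but the pooled variant hands the dense
# layers 16x fewer activations (7*7 vs 28*28 per channel).
```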

D) Without activation layers, the network takes longer to optimize. It still fits the train data okay but fails to generalize, even with dropout:

[22/100] train_loss: 0.609 - train_acc: 0.769 - eval_loss: 0.725 - eval_acc: 0.727
[70/100] train_loss: 0.143 - train_acc: 0.936 - eval_loss: 2.066 - eval_acc: 0.686
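The reason D) behaves this way: without nonlinearities, any stack of linear/conv layers collapses to a single affine map, so depth adds no capacity. A quick demonstration that two stacked Linear layers equal one merged Linear layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Two Linear layers with no activation between them...
stacked = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 5))

# ...are exactly one Linear layer with W = W2 @ W1 and b = W2 @ b1 + b2.
w1, b1 = stacked[0].weight, stacked[0].bias
w2, b2 = stacked[1].weight, stacked[1].bias
merged = nn.Linear(10, 5)
with torch.no_grad():
    merged.weight.copy_(w2 @ w1)
    merged.bias.copy_(w2 @ b1 + b2)

x = torch.rand(3, 10)
print(torch.allclose(stacked(x), merged(x), atol=1e-5))  # True
```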

bdzyubak commented 2 months ago

The issue of the network underfitting this data was caused by a bug in the basic implementation. PyTorch's CrossEntropyLoss applies log-softmax internally and needs to be passed raw logits. Passing it nn.LogSoftmax output still works fine, but passing it nn.Softmax output really hurts optimization.

With nn.Softmax() activation - stuck at 0.67 train acc:
[22/100] train_loss: 1.499 - train_acc: 0.670 - eval_loss: 1.467 - eval_acc: 0.669

No nn.Softmax() layer - just pass logits:
[22/100] train_loss: 0.129 - train_acc: 0.956 - eval_loss: 1.831 - eval_acc: 0.699
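The bug is easy to reproduce in isolation: softmax squashes the inputs into [0, 1], and CrossEntropyLoss then applies log-softmax on top of those probabilities, so for 3 classes the loss can never fall much below ~0.55 no matter how confident the model is:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, -1.0, 0.5]])  # confident, correct prediction
target = torch.tensor([0])

# Correct usage: pass raw logits.
loss_logits = loss_fn(logits, target)

# Bug: applying Softmax first; CrossEntropyLoss then log-softmaxes the
# probabilities, capping how low the loss can go.
loss_softmaxed = loss_fn(torch.softmax(logits, dim=1), target)

print(loss_logits.item(), loss_softmaxed.item())
```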

In the end, fitting the train data in dermaMNIST turned out to be very easy. There is still a class-imbalance and generalizability problem on the val data, which will be addressed in a future issue.