Hyperparameter setting for training from scratch on CIFAR-10

Yuancheng-Xu commented 1 year ago

Hi,

I am trying to train a ConvNeXt on CIFAR-10 for a research project that doesn't allow using BN. I use the following configuration:

python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --data_set image_folder --data_path ./CIFAR-10-images/train --eval_data_path ./CIFAR-10-images/test \
  --nb_classes 10 --num_workers 8 --warmup_epochs 0 \
  --save_ckpt false \
  --cutmix 0 --mixup 0 \
  --model_ema_eval true \
  --model convnext_tiny \
  --epochs 100 --lr 4e-4 --weight_decay 5e-2 --opt 'sgd' --input_size 32 \
  --output_dir results/100epochs_lr_4e-4_wd_5e-2_sgd_inputsize_32

And the accuracy is only 75% (a standard ResNet18 reaches about 93%). If I change the optimizer from AdamW to SGD, the best accuracy actually drops to below 50%. If I use the default input size of 224, the accuracy is 84%, which is still significantly lower.

Can ConvNeXt work on CIFAR-10 without fine-tuning from a pretrained model? Could you provide a recommended set of hyperparameters for CIFAR-10 (one that is robust to different optimizers and works without mixup and cutmix)?

Also, I have another question about fine-tuning on CIFAR-10: it seems that in the Colab file the input_size is the default 224. However, CIFAR-10 images are 32x32. Does this mean that in the data preparation stage the images will be padded to 224x224?

Thank you!

slerman12 commented 1 year ago

I was also wondering about this. It seems the 32x32 size of CIFAR-10 is incompatible with this model due to the down-sampling layers.

shamikbose commented 1 year ago

@Yuancheng-Xu It seems like it can. The downsampling layers should be set to a smaller kernel and stride size (2 and 2 respectively). Without this, the output of the downsampling layers is effectively the same size as the kernel. In addition, you might want to choose a smaller kernel and padding size for the Block convolutional layers. Here's a notebook showing the training progress: https://juliusruseckas.github.io/ml/convnext-cifar10.html
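To make the size argument concrete, here is a minimal sketch in plain Python that applies the standard convolution output-size formula stage by stage. It assumes the layout in the upstream models/convnext.py (a 4x4, stride-4 stem followed by three 2x2, stride-2 downsampling layers); the helper names `conv_out_size` and `stage_sizes` are made up for this illustration.

```python
def conv_out_size(n, kernel, stride, padding=0):
    """Spatial output size of a convolution (standard floor formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def stage_sizes(input_size, stem_kernel, stem_stride, num_downsamples=3):
    """Feature-map side length after the stem and after each 2x2/stride-2 downsampling layer."""
    sizes = [conv_out_size(input_size, stem_kernel, stem_stride)]
    for _ in range(num_downsamples):
        sizes.append(conv_out_size(sizes[-1], kernel=2, stride=2))
    return sizes


# Assumed default layout: 4x4/stride-4 stem, then three 2x2/stride-2 downsampling layers.
print(stage_sizes(224, stem_kernel=4, stem_stride=4))  # [56, 28, 14, 7]
print(stage_sizes(32, stem_kernel=4, stem_stride=4))   # [8, 4, 2, 1]
# Suggested change for 32x32 inputs: shrink the stem to 2x2/stride-2.
print(stage_sizes(32, stem_kernel=2, stem_stride=2))   # [16, 8, 4, 2]
```

Under these assumptions a 32x32 input is already down to 8x8 after the stem and 1x1 by the last stage, whereas the smaller stem keeps one extra factor of two of resolution at every stage.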

shamikbose commented 1 year ago

@Yuancheng-Xu I managed to get accuracy to 87% by making a few changes to the code in the link above. The basic changes are described in this repository: https://github.com/shamikbose/Fujitsu_Assessment. The main changes were as follows:

1. The downsampling convolutional layers were modified (4x4 -> 2x2) for the smaller image size in the dataset.
   - This improved accuracy from 70% to 80%.
2. Keeping CIFAR-10 training recipes in mind, the architecture was modified to be a 3-block architecture instead of a 4-block one.
   - This improved accuracy from 80% to 85%.
3. Kernel size was changed (7 -> 3).
   - This improved accuracy from 85% to 87%.
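One rough way to look at the kernel-size and depth changes (an interpretation, not an analysis from the thread or the linked repository) is to ask how much of a "same"-padded depthwise kernel actually overlaps real pixels rather than zero padding on the final feature map. The sketch below assumes padding of kernel // 2, a final map of 1x1 for the unmodified network on 32x32 inputs, and a final map of 4x4 for a 3-stage variant with a 2x2/stride-2 stem; `real_tap_fraction` is a hypothetical helper name.

```python
def real_tap_fraction(n, kernel):
    """Average fraction of kernel taps that land on real pixels (not zero padding)
    for a k x k convolution with padding k // 2 over an n x n feature map."""
    pad = kernel // 2
    # For each output position along one axis, count taps that fall inside [0, n).
    per_axis = [
        sum(1 for t in range(-pad, pad + 1) if 0 <= i + t < n)
        for i in range(n)
    ]
    mean_1d = sum(per_axis) / (n * kernel)
    # The 2D tap count factorizes across the two axes, so the 2D mean is the 1D mean squared.
    return mean_1d ** 2


# (label, assumed final feature-map side, depthwise kernel size)
cases = [
    ("4x4 stem, 4 stages, 7x7 kernel", 1, 7),
    ("2x2 stem, 3 stages, 3x3 kernel", 4, 3),
]
for label, n, k in cases:
    print(f"{label}: {real_tap_fraction(n, k):.1%} of taps overlap real pixels")
```

With these assumed sizes, a 7x7 kernel on a 1x1 map has a single useful tap out of 49, while a 3x3 kernel on a 4x4 map keeps roughly two thirds of its taps on real data. This is only a heuristic for why the changes might help; the accuracy figures above come from the commenter's own runs.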

Yuancheng-Xu commented 1 year ago

Thanks a lot!

iamsh4shank commented 1 year ago

Hey @shamikbose, I tried training on the ImageNet100 dataset with a custom input_size = 32, but the accuracy I am getting is too low. What could I change in the architecture (I tried making the kernel and stride smaller)? Is there any other approach that might help me get good accuracy?

shamikbose commented 1 year ago

@iamsh4shank The parameters used for ImageNet100 are mentioned in the paper. You should be able to reproduce it using those values.

iamsh4shank commented 1 year ago

Actually, I guess those parameters were for input_size 224; after changing it to 32, the accuracy I get is really low.

shamikbose commented 1 year ago

With image size 32, try the parameters mentioned here https://github.com/facebookresearch/ConvNeXt/issues/134#issuecomment-1534986992

iamsh4shank commented 1 year ago

I did try changing the Conv layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L28) to use kernel size 3 and padding 1. I also changed the downsampling layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L74) to use kernel size 2 and stride 2. It did not change the accuracy much; I am getting a test accuracy of about 4-5 percent.
