Is it difficult to train/finetune ConvNeXtv2 compared with ConvNeXtv1?

facebookresearch / ConvNeXt-V2

Code release for ConvNeXt V2 model

Is it difficult to train/finetune ConvNeXtv2 compared with ConvNeXtv1? #15

Open linhduongtuan opened 1 year ago

linhduongtuan commented 1 year ago

Dear authors, I have experimented with both ConvNeXt V1 and your V2 models using the TIMM codebase on my own datasets. With V1, I have no trouble training or fine-tuning on my datasets, and I am pleased with the overall performance of TIMM's variants. However, I cannot achieve comparable results (in overall accuracy or, of course, in computational cost) with your V2 variants, regardless of which pretrained weights I use.

Can you give me any tips or tricks for setting the hyperparameters?

Thanks in advance. Linh

shwoo93 commented 1 year ago

Thanks for noting this issue. One suggestion is to not weight decay the gamma/beta values in GRN during training (now updated as default behavior).
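For readers building their own optimizer, the suggestion above could be applied roughly as in the sketch below. It is a minimal illustration, not the repository's actual code: the function name `split_weight_decay_groups` is hypothetical, and it assumes the GRN parameters are registered under names ending in `.gamma` and `.beta` (other codebases may name them differently, so check `model.named_parameters()` first). It also excludes biases and 1-D tensors from decay, which is a common convention rather than something stated in this thread.

```python
def split_weight_decay_groups(named_params, weight_decay,
                              no_decay_suffixes=(".bias", ".gamma", ".beta")):
    """Split (name, param) pairs into decay / no-decay optimizer groups.

    Hypothetical helper. `named_params` is any iterable of (name, param)
    pairs, e.g. `model.named_parameters()` from a PyTorch module.
    Assumption: GRN scale/shift parameters have names ending in
    ".gamma" / ".beta".
    """
    decay, no_decay = [], []
    for name, param in named_params:
        # Frozen parameters are not optimized at all.
        if not getattr(param, "requires_grad", True):
            continue
        # 1-D tensors (norm weights, biases) and anything matching the
        # suffix list, including GRN gamma/beta, get no weight decay.
        if len(param.shape) <= 1 or name.endswith(no_decay_suffixes):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

The returned list has the per-parameter-group shape that PyTorch optimizers accept, so it could be passed as the first argument to `torch.optim.AdamW` together with a learning rate. The weight decay value itself is left as a required argument because this thread does not specify one.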

linhduongtuan commented 1 year ago

Thanks for your explanation. I will try these models again.

linhduongtuan commented 1 year ago

I have tried training ConvNeXt-V2-Tiny again with your new optimization setup. However, overall accuracy does not improve, the model needs much more GPU memory than V1, and my results are still much lower than those with ConvNeXt-Tiny. Could you double-check the optimization recipe on CIFAR, MNIST, etc., for instance? Linh

Metal079 commented 1 year ago

Can confirm it's difficult to fine-tune. ConvNextV1-base gets me 86%-88% on my dataset within 5 epochs while ConvNextV2-Base can't seem to get over 81% no matter how I tweak the hyperparameters.

hbellafkir commented 6 months ago

Any updates on this issue? I'm having the same problem.

blackpearl1022 commented 2 months ago

@Metal079 Any updates? I have the same issue on my side.

Metal079 commented 2 months ago

> @Metal079 Any updates? I have the same issue on my side.