Alibaba-MIIL / ImageNet21K

Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS, 2021)
MIT License

About Training Setting Parameters #35

Closed: SJLeo closed this issue 3 years ago

SJLeo commented 3 years ago

Could you provide the training parameters you used for 8 x V100 with DDP? When I run the following command:

```
python3 -u -m torch.distributed.launch --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_port 2221 train_semantic_softmax.py --data_path /data/imagenet22kp_fall/ --model_name mobilenetv3_large_100 --epochs 80 --weight_decay 1e-4 --batch_size 1024 --lr 3e-4 --num_classes 11221 --tree_path ./data/imagenet21k_miil_tree.pth --model_path=./mobilenetv3_large_100.pth
```

the accuracy is only 71.366%, which is far lower than the 73.1% reported in the paper.

mrT23 commented 3 years ago

Following your params, make sure you are indeed using the fall11 version and not the winter21 version.

Anyway, the 73.1% reported in the article is with KD (it is clearly stated there). Once you add KD from a stronger model, your accuracy will improve (for tresnet_m, KD increased the semantic accuracy by more than 1%).

In addition, your batch size is larger than the one used in the article. Try a higher learning rate and see if it improves results.
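
For reference, a common heuristic (not specific to this repo) is to scale the learning rate linearly with the global batch size. A minimal sketch, where the base values are illustrative assumptions rather than the paper's actual settings:

```python
# Linear LR scaling heuristic: scale lr proportionally to the global batch size.
# The base values below are illustrative assumptions, not the paper's settings.
base_lr = 3e-4          # learning rate known to work at the reference batch size
base_batch_size = 512   # reference global batch size (assumed)
batch_size = 1024       # actual global batch size across the 8 GPUs

scaled_lr = base_lr * (batch_size / base_batch_size)
print(f"suggested lr: {scaled_lr:.1e}")  # -> 6.0e-04
```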

SJLeo commented 3 years ago

Thank you very much.

SJLeo commented 3 years ago

Could you tell me what model to use as a teacher for distillation?

mrT23 commented 3 years ago

TResNet-L-V2 and ViT-B-16 are good candidates: https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/MODEL_ZOO.md

I plan to publish full KD training code in the future, but you can definitely write one of your own. Use the helper functions in https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/src_files/semantic/semantics.py (line 104) as a template.
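
Until the official KD code is published, here is a minimal sketch of generic soft-target distillation in PyTorch. It is not the repo's semantic-softmax KD, and the names (`kd_loss`, `train_step`, `student`, `teacher`, `alpha`, `temperature`) are placeholders for illustration; the loss term would need to be adapted to the semantics.py helpers linked above.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=3.0):
    # Soft-target distillation loss: KL divergence between the
    # temperature-softened teacher and student distributions.
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

def train_step(student, teacher, images, targets, optimizer, alpha=0.5):
    # One distillation step; `alpha` balances the hard-label CE term
    # against the soft-target KD term.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)      # frozen teacher forward pass
    student_logits = student(images)
    loss = (1 - alpha) * F.cross_entropy(student_logits, targets) \
           + alpha * kd_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the teacher (e.g. TResNet-L-V2 from the model zoo) is loaded with pretrained weights and kept frozen; only the student is passed to the optimizer.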

SJLeo commented 3 years ago

Nice, thank you.