Low performance of Supervised training

Expected behavior:

If there is no obvious error in "what you observed" provided above, please tell us the expected behavior.

Problem statement: We are using the pretrained ImageNet model weights to perform supervised learning on our own dataset, which consists of ~60000 train images and ~14000 test images across a total of 1139 classes. I have changed the MLP head in the yaml file to reflect 1139 classes.

Expected: Stable training.

What's happening? Train accuracy increases too quickly, reaching almost 90% in ~170 epochs, but the test accuracy doesn't improve at all, remaining close to 0 for the most part. When performing supervised training in PyTorch directly, we are able to get 70% accuracy.

Attached log: log (3).txt

Any insights on why this might be happening? Suggestions to effectively utilize the VISSL pipelines will be appreciated.
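One sanity check that may be worth running for a setup like this is whether the train and test folders expose the same class subfolders in the same order. The sketch below is only an illustration: it assumes that labels are derived from the sorted names of the class subfolders (as torchvision's ImageFolder does), which I have not confirmed for the `disk_folder` source used here. The helper names `class_folders` and `compare_splits` are hypothetical and not part of VISSL.

```python
from pathlib import Path


def class_folders(root):
    """Return the sorted names of the immediate subdirectories of `root`."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


def compare_splits(train_root, test_root):
    """Compare the class folders of two dataset splits.

    ASSUMPTION: a class's label index is its position in the sorted list of
    folder names. Under that assumption, a class missing from one split
    shifts the indices of every class that sorts after it.
    """
    train = class_folders(train_root)
    test = class_folders(test_root)
    train_idx = {name: i for i, name in enumerate(train)}
    test_idx = {name: i for i, name in enumerate(test)}
    shared = train_idx.keys() & test_idx.keys()
    return {
        "num_train_classes": len(train),
        "num_test_classes": len(test),
        "only_in_train": sorted(set(train) - set(test)),
        "only_in_test": sorted(set(test) - set(train)),
        # Classes present in both splits whose assumed label index differs.
        "index_mismatch": sorted(n for n in shared if train_idx[n] != test_idx[n]),
    }
```

With the paths from this report, the call would be `compare_splits("/home/images/train", "/home/images/test")`. If both class counts are 1139 and the three lists are empty, this particular explanation can be ruled out.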

Command:

python3 tools/run_distributed_engines.py \
  hydra.verbose=true \
  config=benchmark/fulltune/imagenet1k/train.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=16 \
  config.DATA.TRAIN.DATA_PATHS=["/home/images/train"] \
  config.DATA.TEST.DATA_SOURCES=[disk_folder] \
  config.DATA.TEST.LABEL_SOURCES=[disk_folder] \
  config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] \
  config.DATA.TEST.BATCHSIZE_PER_REPLICA=16 \
  config.DATA.TEST.DATA_PATHS=["/home/images/test"] \
  config.OPTIMIZER.num_epochs=250 \
  config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] \
  config.OPTIMIZER.param_schedulers.lr.milestones=[1] \
  config.DISTRIBUTED.NUM_NODES=1 \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true \
  config.HOOKS.MEMORY_SUMMARY.PRINT_MEMORY_SUMMARY=false \
  config.CHECKPOINT.DIR="/home/new_exp/checkpoint_supervised_2" \
  config.MODEL.WEIGHTS_INIT.PARAMS_FILE="/home/resnet50-19c8e357.pth" \
  config.MODEL.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=""
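To make the learning-rate overrides in the command concrete, here is a minimal sketch of how a piecewise-constant (multistep) schedule would interpret `values=[0.01,0.001]` with `milestones=[1]`. It assumes the base config uses a multistep scheduler with milestones expressed in epochs; the command itself does not state the scheduler type, and `multistep_lr` is a hypothetical helper written for illustration, not a VISSL function.

```python
from bisect import bisect_right


def multistep_lr(epoch, values, milestones):
    """Piecewise-constant schedule: values[i] applies until milestones[i] is reached."""
    if len(values) != len(milestones) + 1:
        raise ValueError("need exactly one more value than milestones")
    # Count how many milestones have been reached at this epoch (0-indexed).
    return values[bisect_right(milestones, epoch)]


# Values taken from the command above.
values = [0.01, 0.001]
milestones = [1]

schedule = [multistep_lr(epoch, values, milestones) for epoch in range(4)]
print(schedule)  # [0.01, 0.001, 0.001, 0.001]
```

Under these assumptions, only the first epoch runs at 0.01 and the remaining 249 epochs run at 0.001.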

Environment:

Provide your environment information using the following command:

wget -nc -q https://github.com/facebookresearch/vissl/raw/main/vissl/utils/collect_env.py && python collect_env.py

sys.platform          linux
Python                3.6.9 (default, Jun 29 2022, 11:45:57) [GCC 8.4.0]
numpy                 1.19.5
Pillow                8.4.0
vissl                 0.1.6 @/home/vissl/vissl
GPU available         True
GPU 0                 Quadro GV100
CUDA_HOME             /usr
torchvision           0.9.0+cu101 @/home/.local/lib/python3.6/site-packages/torchvision
hydra                 1.0.7 @/home/.local/lib/python3.6/site-packages/hydra
classy_vision         0.7.0.dev @/home/.local/lib/python3.6/site-packages/classy_vision
tensorboard           2.9.1
apex                  0.1 @/home/.local/lib/python3.6/site-packages/apex
cv2                   4.6.0
PyTorch               1.8.0+cu101 @/home/.local/lib/python3.6/site-packages/torch
PyTorch debug build   False

PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
CuDNN 7.6.3
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.3, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

CPU info:

Architecture          x86_64
CPU op-mode(s)        32-bit, 64-bit
Byte Order            Little Endian
CPU(s)                12
On-line CPU(s) list   0-11
Thread(s) per core    2
Core(s) per socket    6
Socket(s)             1
NUMA node(s)          1
Vendor ID             GenuineIntel
CPU family            6
Model                 85
Model name            Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
Stepping              4
CPU MHz               3999.959
CPU max MHz           4000.0000
CPU min MHz           1200.0000
BogoMIPS              6999.82
Virtualization        VT-x
L1d cache             32K
L1i cache             32K
L2 cache              1024K
L3 cache              8448K
NUMA node0 CPU(s)     0-11
