facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License

Bug in the architecture of ResNet-50-w2 #469

Open · CharlieCheckpt opened this issue 2 years ago

CharlieCheckpt commented 2 years ago

Hi VISSL team! Thank you for the great package.

I got a dimension error when running the example from the documentation to train MoCo with ResNet-50-w2 (2x wider ResNet-50).

This error seems to be due to a bug in the architecture of ResNet-50-w2. Indeed, I compared it with the architecture of torchvision.models.wide_resnet50_2 and the two architectures differ.

Looking at 3. below, one can see that in vissl the first layer of ResNet-50-w2 is: (conv1): Conv2d(3, 128, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

In torchvision:

from torchvision.models import wide_resnet50_2
wide_resnet50_2(pretrained=False)

prints:

(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) ...
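
To make the comparison concrete, here is a small check (only torchvision is needed; resnet50 is used as the non-wide baseline for reference). As far as I can tell, torchvision builds wide_resnet50_2 by doubling width_per_group only, so the stem and the per-stage output widths stay the same as ResNet-50 and the trunk still ends at 2048 features:

# Sketch of a side-by-side check, assuming only torchvision is installed.
from torchvision.models import resnet50, wide_resnet50_2

narrow = resnet50(pretrained=False)
wide = wide_resnet50_2(pretrained=False)

print(wide.conv1)              # Conv2d(3, 64, ...): same 64-channel stem as resnet50
print(narrow.layer1[0].conv2)  # Conv2d(64, 64, kernel_size=(3, 3), ...)
print(wide.layer1[0].conv2)    # Conv2d(128, 128, ...): only the inner 3x3 width doubles
print(wide.fc.in_features)     # 2048: final feature dimension unchanged

By contrast, in the VISSL model printed in 3. below every width is doubled (conv1 outputs 128 channels and the trunk ends at 4096 features), while the MoCo MLP head still expects 2048 inputs, which would explain the mat1/mat2 shape mismatch at the end of the log.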

Instructions To Reproduce the 🐛 Bug:

  1. what changes you made (git diff) or what code you wrote: None

  2. what exact command you run: I ran the command proposed in the documentation.

python tools/run_distributed_engines.py config=pretrain/moco/moco_1node_resnet \
    config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.RESNETS.DEPTH=50 \
    config.MODEL.TRUNK.RESNETS.WIDTH_MULTIPLIER=2
  3. what you observed (including full logs):
** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath

####### overrides: ['hydra.verbose=True', 'config=moco_1node_resnet.yaml', 'config.MODEL.TRUNK.RESNETS.WIDTH_MULTIPLIER=2']
INFO 2021-11-13 18:02:39,774 distributed_launcher.py: 183: Spawning process for node_id: 0, local_rank: 0, dist_rank: 0, dist_run_id: localhost:60803
INFO 2021-11-13 18:02:39,774 train.py:  94: Env set for rank: 0, dist_rank: 0
INFO 2021-11-13 18:02:39,775 env.py:  50: CONDA_DEFAULT_ENV:    vissl
INFO 2021-11-13 18:02:39,775 env.py:  50: CONDA_EXE:    /opt/anaconda/bin/conda
INFO 2021-11-13 18:02:39,775 env.py:  50: CONDA_PREFIX: /home/xxx/.conda/envs/vissl
INFO 2021-11-13 18:02:39,775 env.py:  50: CONDA_PREFIX_1:       /opt/anaconda
INFO 2021-11-13 18:02:39,775 env.py:  50: CONDA_PROMPT_MODIFIER:        (vissl)
INFO 2021-11-13 18:02:39,775 env.py:  50: CONDA_PYTHON_EXE:     /opt/anaconda/bin/python
INFO 2021-11-13 18:02:39,775 env.py:  50: CONDA_SHLVL:  2
INFO 2021-11-13 18:02:39,775 env.py:  50: HOGRPATH:     /DATA/data/hogrc:/STORAGE/data/hogrc
INFO 2021-11-13 18:02:39,775 env.py:  50: HOME: /home/xxx
INFO 2021-11-13 18:02:39,775 env.py:  50: LANG: en_US.UTF-8
INFO 2021-11-13 18:02:39,775 env.py:  50: LC_TERMINAL:  iTerm2
INFO 2021-11-13 18:02:39,775 env.py:  50: LC_TERMINAL_VERSION:  3.4.10
INFO 2021-11-13 18:02:39,775 env.py:  50: LD_LIBRARY_PATH:      /usr/local/cuda-10.0/lib64:/usr/local/lib:/usr/local/cuda-10.0/lib64:/usr/local/lib:
INFO 2021-11-13 18:02:39,775 env.py:  50: LESSCLOSE:    /usr/bin/lesspipe %s %s
INFO 2021-11-13 18:02:39,775 env.py:  50: LESSOPEN:     | /usr/bin/lesspipe %s
INFO 2021-11-13 18:02:39,775 env.py:  50: LOCAL_RANK:   0
INFO 2021-11-13 18:02:39,775 env.py:  50: LOGNAME:      xxx
INFO 2021-11-13 18:02:39,775 env.py:  50: LS_COLORS:    rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
INFO 2021-11-13 18:02:39,775 env.py:  50: MAIL: /var/mail/xxx
INFO 2021-11-13 18:02:39,775 env.py:  50: OLDPWD:       /home/xxx/workspace/ssl
INFO 2021-11-13 18:02:39,775 env.py:  50: PATH: /home/xxx/.local/bin:/usr/local/cuda-10.0/bin:/home/xxx/.conda/envs/vissl/bin:/home/xxx/workspace/ssl/venv/bin:/opt/anaconda/bin:/opt/anaconda/condabin:/usr/local/cuda-10.0/bin:/opt/anaconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
INFO 2021-11-13 18:02:39,776 env.py:  50: PS1:  (vissl) ${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$
INFO 2021-11-13 18:02:39,776 env.py:  50: PWD:  /home/c
INFO 2021-11-13 18:02:39,776 env.py:  50: WORLD_SIZE:   1
INFO 2021-11-13 18:02:39,776 env.py:  50: XDG_DATA_DIRS:        /usr/local/share:/usr/share:/var/lib/snapd/desktop
INFO 2021-11-13 18:02:39,776 env.py:  50: XDG_RUNTIME_DIR:      /run/user/10030
INFO 2021-11-13 18:02:39,776 env.py:  50: XDG_SESSION_ID:       15823
INFO 2021-11-13 18:02:39,776 env.py:  50: _:    /home/xxx/.conda/envs/vissl/bin/python
INFO 2021-11-13 18:02:39,776 misc.py: 161: Set start method of multiprocessing to forkserver
INFO 2021-11-13 18:02:39,776 train.py: 105: Setting seed....
INFO 2021-11-13 18:02:39,776 misc.py: 173: MACHINE SEED: 0
INFO 2021-11-13 18:02:39,780 hydra_config.py: 132: Training with config:
INFO 2021-11-13 18:02:39,786 hydra_config.py: 141: {'CHECKPOINT': {'APPEND_DISTR_RUN_ID': False,
                'AUTO_RESUME': True,
                'BACKEND': 'disk',
                'CHECKPOINT_FREQUENCY': 5,
                'CHECKPOINT_ITER_FREQUENCY': 100,
                'DIR': '.',
                'LATEST_CHECKPOINT_RESUME_FILE_NUM': 1,
                'OVERWRITE_EXISTING': False,
                'USE_SYMLINK_CHECKPOINT_FOR_RESUME': False},
 'CLUSTERFIT': {'CLUSTER_BACKEND': 'faiss',
                'DATA_LIMIT': -1,
                'DATA_LIMIT_SAMPLING': {'SEED': 0},
                'FEATURES': {'DATASET_NAME': '',
                             'DATA_PARTITION': 'TRAIN',
                             'DIMENSIONALITY_REDUCTION': 0,
                             'EXTRACT': False,
                             'LAYER_NAME': '',
                             'PATH': '.',
                             'TEST_PARTITION': 'TEST'},
                'NUM_CLUSTERS': 16000,
                'NUM_ITER': 50,
                'OUTPUT_DIR': '.'},
 'DATA': {'DDP_BUCKET_CAP_MB': 25,
          'ENABLE_ASYNC_GPU_COPY': True,
          'NUM_DATALOADER_WORKERS': 5,
          'PIN_MEMORY': True,
          'TEST': {'BASE_DATASET': 'generic_ssl',
                   'BATCHSIZE_PER_REPLICA': 256,
                   'COLLATE_FUNCTION': 'default_collate',
                   'COLLATE_FUNCTION_PARAMS': {},
                   'COPY_DESTINATION_DIR': '',
                   'COPY_TO_LOCAL_DISK': False,
                   'DATASET_NAMES': ['imagenet1k_folder'],
                   'DATA_LIMIT': -1,
                   'DATA_LIMIT_SAMPLING': {'IS_BALANCED': False,
                                           'SEED': 0,
                                           'SKIP_NUM_SAMPLES': 0},
                   'DATA_PATHS': [],
                   'DATA_SOURCES': [],
                   'DEFAULT_GRAY_IMG_SIZE': 224,
                   'DROP_LAST': False,
                   'ENABLE_QUEUE_DATASET': False,
                   'INPUT_KEY_NAMES': ['data'],
                   'LABEL_PATHS': [],
                   'LABEL_SOURCES': [],
                   'LABEL_TYPE': 'sample_index',
                   'MMAP_MODE': True,
                   'NEW_IMG_PATH_PREFIX': '',
                   'RANDOM_SYNTHETIC_IMAGES': False,
                   'REMOVE_IMG_PATH_PREFIX': '',
                   'TARGET_KEY_NAMES': ['label'],
                   'TRANSFORMS': [],
                   'USE_DEBUGGING_SAMPLER': False,
                   'USE_STATEFU
                    'DROP_LAST': True,
                    'ENABLE_QUEUE_DATASET': False,
                    'INPUT_KEY_NAMES': ['data'],
                    'LABEL_PATHS': [],
                    'LABEL_SOURCES': [],
                    'LABEL_TYPE': 'sample_index',
                    'MMAP_MODE': True,
                    'NEW_IMG_PATH_PREFIX': '',
                    'RANDOM_SYNTHETIC_IMAGES': False,
                    'REMOVE_IMG_PATH_PREFIX': '',
                    'TARGET_KEY_NAMES': ['label'],
                    'TRANSFORMS': [{'name': 'ImgReplicatePil', 'num_times': 2},
                                   {'name': 'RandomResizedCrop', 'size': 224},
                                   {'name': 'ImgPilColorDistortion',
                                    'strength': 0.5},
                                   {'name': 'ImgPilGaussianBlur',
                                    'p': 0.5,
                                    'radius_max': 2.0,
                                    'radius_min': 0.1},
                                   {'name': 'RandomHorizontalFlip', 'p': 0.5},
                                   {'name': 'ToTensor'},
                                   {'mean': [0.485, 0.456, 0.406],
                                    'name': 'Normalize',
                                    'std': [0.229, 0.224, 0.225]}],
                    'USE_DEBUGGING_SAMPLER': False,
                    'USE_STATEFUL_DISTRIBUTED_SAMPLER': False}},
 'DISTRIBUTED': {'BACKEND': 'nccl',
                 'BROADCAST_BUFFERS': True,
                 'INIT_METHOD': 'tcp',
                 'MANUAL_GRADIENT_REDUCTION': False,
                 'NCCL_DEBUG': False,
                 'NCCL_SOCKET_NTHREADS': '',
                 'NUM_NODES': 1,
                 'NUM_PROC_PER_NODE': 1,
                 'RUN_ID': 'auto'},
 'EXTRACT_FEATURES': {'CHUNK_THRESHOLD': 0, 'OUTPUT_DIR': ''},
 'HOOKS': {'CHECK_NAN': True,
           'LOG_GPU_STATS': True,
           'MEMORY_SUMMARY': {'DUMP_MEMORY_ON_EXCEPTION': False,
                              'LOG_ITERATION_NUM': 0,
                              'PRINT_MEMORY_SUMMARY': True},
           'MODEL_COMPLEXITY': {'COMPUTE_COMPLEXITY': False,
                                'INPUT_SHAPE': [3, 224, 224]},
           'PERF_STATS': {'MONITOR_PERF_STATS': True,
                          'PERF_STAT_FREQUENCY': -1,
                          'ROLLING_BTIME_FREQ': 313},
           'TENSORBOARD_SETUP': {'EXPERIMENT_LOG_DIR': 'moco_v2_reference',
                                 'FLUSH_EVERY_N_MIN': 20,
                                 'LOG_DIR': '.',
                                 'LOG_PARAMS': False,
                                 'LOG_PARAMS_EVERY_N_ITERS': 310,
                                 'LOG_PARAMS_GRADIENTS': True,
                                 'USE_TENSORBOARD': True}},
 'IMG_RETRIEVAL': {'CROP_QUERY_ROI': False,
                   'DATASET_PATH': '',
                   'DEBUG_MODE': False,
                   'EVAL_BINARY_PATH': '',
                   'EVAL_DATASET_NAME': 'Paris',
                   'FEATS_PROCESSING_TYPE': '',
                   'GEM_POOL_POWER': 4.0,
                   'IMG_SCALINGS': [1],
                   'NORMALIZE_FEATURES': True,

                   'SAVE_RETRIEVAL_RANKINGS_SCORES': True,
                   'SIMILARITY_MEASURE': 'cosine_similarity',
                   'SPATIAL_LEVELS': 3,
                   'TRAIN_DATASET_NAME': 'Oxford',
                   'TRAIN_PCA_WHITENING': True,
                   'USE_DISTRACTORS': False,
                   'WHITEN_IMG_LIST': ''},
 'LOG_FREQUENCY': 200,
 'LOSS': {'CrossEntropyLoss': {'ignore_index': -1},
          'barlow_twins_loss': {'embedding_dim': 8192,
                                'lambda_': 0.0051,
                                'scale_loss': 0.024},
          'bce_logits_multiple_output_single_target': {'normalize_output': False,
                                                       'reduction': 'none',
                                                       'world_size': 1},
          'cross_entropy_multiple_output_single_target': {'ignore_index': -1,
                                                          'normalize_output': False,
                                                          'reduction': 'mean',
                                                          'temperature': 1.0,
                                                          'weight': None},
          'deepclusterv2_loss': {'BATCHSIZE_PER_REPLICA': 256,
                                 'DROP_LAST': True,
                                 'kmeans_iters': 10,
                                 'memory_params': {'crops_for_mb': [0],
                                                   'embedding_dim': 128},
                                 'num_clusters': [3000, 3000, 3000],
                                 'num_crops': 2,
                                 'num_train_samples': -1,
                                 'temperature': 0.1},
          'dino_loss': {'crops_for_teacher': [0, 1],
                        'ema_center': 0.9,
                        'momentum': 0.996,
                        'normalize_last_layer': True,
                        'output_dim': 65536,
                        'student_temp': 0.1,
                        'teacher_temp_max': 0.07,
                        'teacher_temp_min': 0.04,
                        'teacher_temp_warmup_iters': 37500},
          'moco_loss': {'embedding_dim': 128,
                        'momentum': 0.999,
                        'queue_size': 65536,
                        'temperature': 0.2},
          'multicrop_simclr_info_nce_loss': {'buffer_params': {'effective_batch_size': 4096,
                                                               'embedding_dim': 128,
                                                               'world_size': 64},
                                             'num_crops': 2,
                                             'temperature': 0.1},
          'name': 'moco_loss',
          'nce_loss_with_memory': {'loss_type': 'nce',
                                   'loss_weights': [1.0],
                                   'memory_params': {'embedding_dim': 128,
                                                     'memory_size': -1,
                                                     'momentum': 0.5,
                                                     'norm_init': True,
                                                     'update_mem_on_forward': True},
                                   'negative_sampling_params': {'num_negatives': 16000,
                                                                'type': 'random'},
                                   'norm_constant': -1,
                                   'norm_embedding': True,
                                   'num_train_samples': -1,
                                   'temperature': 0.07,
                                   'update_mem_with_emb_index': -100},
          'simclr_info
                        'num_prototypes': [3000],
                        'output_dir': '.',
                        'queue': {'local_queue_length': 0,
                                  'queue_length': 0,
                                  'start_iter': 0},
                        'temp_hard_assignment_iters': 0,
                        'temperature': 0.1,
                        'use_double_precision': False},
          'swav_momentum_loss': {'crops_for_assign': [0, 1],
                                 'embedding_dim': 128,
                                 'epsilon': 0.05,
                                 'momentum': 0.99,
                                 'momentum_eval_mode_iter_start': 0,
                                 'normalize_last_layer': True,
                                 'num_crops': 2,
                                 'num_iters': 3,
                                 'num_prototypes': [3000],
                                 'queue': {'local_queue_length': 0,
                                           'queue_length': 0,
                                           'start_iter': 0},
                                 'temperature': 0.1,
                                 'use_double_precision': False}},
 'MACHINE': {'DEVICE': 'gpu'},
 'METERS': {'accuracy_list_meter': {'meter_names': [],
                                    'num_meters': 1,
                                    'topk_values': [1]},
            'enable_training_meter': True,
            'mean_ap_list_meter': {'max_cpu_capacity': -1,
                                   'meter_names': [],
                                   'num_classes': 9605,
                                   'num_meters': 1},
            'model_output_mask': False,
            'name': '',
            'names': [],
            'precision_at_k_list_meter': {'meter_names': [],
                                          'num_meters': 1,
                                          'topk_values': [1]},
            'recall_at_k_list_meter': {'meter_names': [],
                                       'num_meters': 1,
                                       'topk_values': [1]}},
 'MODEL': {'ACTIVATION_CHECKPOINTING': {'NUM_ACTIVATION_CHECKPOINTING_SPLITS': 2,
                                        'USE_ACTIVATION_CHECKPOINTING': False},
           'AMP_PARAMS': {'AMP_ARGS': {'opt_level': 'O1'},
                          'AMP_TYPE': 'apex',
                          'USE_AMP': False},
           'BASE_MODEL_NAME': 'multi_input_output_model',
           'CUDA_CACHE': {'CLEAR_CUDA_CACHE': False, 'CLEAR_FREQ': 100},
           'FEATURE_EVAL_SETTINGS': {'EVAL_MODE_ON': False,
                                     'EVAL_TRUNK_AND_HEAD': False,
                                     'EXTRACT_TRUNK_FEATURES_ONLY': False,
                                     'FREEZE_TRUNK_AND_HEAD': False,
                                     'FREEZE_TRUNK_ONLY': False,
                                     'LINEAR_EVAL_FEAT_POOL_OPS_MAP': [],
                                     'SHOULD_FLATTEN_FEATS': True},
           'FSDP_CONFIG': {'AUTO_WRAP_THRESHOLD': 0,
                           'bucket_cap_mb': 0,
                           'clear_autocast_cache': True,
                           'compute_dtype': torch.float32,
                           'flatten_parameters': True,
                           'fp32_reduce_scatter': False,
                           'mixed_precision': True,
                           'verbose': True},

           'NON_TRAINABLE_PARAMS': [],
           'SHARDED_DDP_SETUP': {'USE_SDP': False, 'reduce_buffer_size': -1},
           'SINGLE_PASS_EVERY_CROP': False,
           'SYNC_BN_CONFIG': {'CONVERT_BN_TO_SYNC_BN': False,
                              'GROUP_SIZE': -1,
                              'SYNC_BN_TYPE': 'pytorch'},
           'TEMP_FROZEN_PARAMS_ITER_MAP': [],
           'TRUNK': {'CONVIT': {'CLASS_TOKEN_IN_LOCAL_LAYERS': False,
                                'LOCALITY_DIM': 10,
                                'LOCALITY_STRENGTH': 1.0,
                                'N_GPSA_LAYERS': 10,
                                'USE_LOCAL_INIT': True},
                     'EFFICIENT_NETS': {},
                     'NAME': 'resnet',
                     'REGNET': {},
                     'RESNETS': {'DEPTH': 50,
                                 'GROUPNORM_GROUPS': 32,
                                 'GROUPS': 1,
                                 'LAYER4_STRIDE': 2,
                                 'NORM': 'BatchNorm',
                                 'STANDARDIZE_CONVOLUTIONS': False,
                                 'WIDTH_MULTIPLIER': 2,
                                 'WIDTH_PER_GROUP': 64,
                                 'ZERO_INIT_RESIDUAL': True},
                     'VISION_TRANSFORMERS': {'ATTENTION_DROPOUT_RATE': 0,
                                             'CLASSIFIER': 'token',
                                             'DROPOUT_RATE': 0,
                                             'DROP_PATH_RATE': 0,
                                             'HIDDEN_DIM': 768,
                                             'IMAGE_SIZE': 224,
                                             'MLP_DIM': 3072,
                                             'NUM_HEADS': 12,
                                             'NUM_LAYERS': 12,
                                             'PATCH_SIZE': 16,
                                             'QKV_BIAS': False,
                                             'QK_SCALE': False,
                                             'name': None},
                     'XCIT': {'ATTENTION_DROPOUT_RATE': 0,
                              'DROPOUT_RATE': 0,
                              'DROP_PATH_RATE': 0.05,
                              'ETA': 1,
                              'HIDDEN_DIM': 384,
                              'IMAGE_SIZE': 224,
                              'NUM_HEADS': 8,
                              'NUM_LAYERS': 12,
                              'PATCH_SIZE': 16,
                              'QKV_BIAS': True,
                              'QK_SCALE': False,
                              'TOKENS_NORM': True,
                              'name': None}},
           'WEIGHTS_INIT': {'APPEND_PREFIX': '',
                            'PARAMS_FILE': '',
                            'REMOVE_PREFIX': '',
                            'SKIP_LAYERS': ['num_batches_tracked'],
                            'STATE_DICT_KEY_NAME': 'classy_state_dict'},
           '_MODEL_INIT_SEED': 0},
 'MONITORING': {'MONITOR_ACTIVATION_STATISTICS': 0},
 'MULTI_PROCESSING_METHOD': 'forkserver',
 'NEAREST_NEIGHBOR': {'L2_NORM_FEATS': False, 'SIGMA': 0.1, 'TOPK': 200},
 'OPTIMIZER': {'betas': [0.9, 0.999],
               'construct_single_param_group_only': False,
               'head_optimizer_params': {'use_different_lr': False,
                                         'u
                                           'interval_scaling': [],
                                           'lengths': [],
                                           'milestones': [120, 160],
                                           'name': 'multistep',
                                           'schedulers': [],
                                           'start_value': 0.1,
                                           'update_interval': 'epoch',
                                           'value': 0.1,
                                           'values': [0.03, 0.003, 0.0003]},
                                    'lr_head': {'auto_lr_scaling': {'auto_scale': False,
                                                                    'base_lr_batch_size': 256,
                                                                    'base_value': 0.1,
                                                                    'scaling_type': 'linear'},
                                                'end_value': 0.0,
                                                'interval_scaling': [],
                                                'lengths': [],
                                                'milestones': [120, 160],
                                                'name': 'multistep',
                                                'schedulers': [],
                                                'start_value': 0.1,
                                                'update_interval': 'epoch',
                                                'value': 0.1,
                                                'values': [0.03,
                                                           0.003,
                                                           0.0003]}},
               'regularize_bias': True,
               'regularize_bn': True,
               'use_larc': False,
               'use_zero': False,
               'weight_decay': 0.0001},
 'PROFILING': {'MEMORY_PROFILING': {'TRACK_BY_LAYER_MEMORY': False},
               'NUM_ITERATIONS': 10,
               'OUTPUT_FOLDER': '.',
               'PROFILED_RANKS': [0, 1],
               'RUNTIME_PROFILING': {'LEGACY_PROFILER': False,
                                     'PROFILE_CPU': True,
                                     'PROFILE_GPU': True,
                                     'USE_PROFILER': False},
               'START_ITERATION': 0,
               'STOP_TRAINING_AFTER_PROFILING': False,
               'WARMUP_ITERATIONS': 0},
 'REPRODUCIBILITY': {'CUDDN_DETERMINISTIC': False},
 'SEED_VALUE': 0,
 'SLURM': {'ADDITIONAL_PARAMETERS': {},
           'COMMENT': 'vissl job',
           'CONSTRAINT': '',
           'LOG_FOLDER': '.',
           'MEM_GB': 250,
           'NAME': 'vissl',
           'NUM_CPU_PER_PROC': 8,
           'PARTITION': '',
           'PORT_ID': 40050,
           'TIME_HOURS': 72,
           'TIME_MINUTES': 0,
           'USE_SLURM': False},
 'SVM': {'cls_list': [],
         'costs': {'base': -1.0,
                   'costs_list': [0.1, 0.01],
                   'power_range': [4, 20]},
         'cross_val_folds': 3,
         'dual': True,
         'force_retrain': False,

 'TRAINER': {'TASK_NAME': 'self_supervision_task',
             'TRAIN_STEP_NAME': 'standard_train_step'},
 'VERBOSE': False}
INFO 2021-11-13 18:02:40,372 train.py: 117: System config:
-------------------  --------------------------------------------------------------------------------------
sys.platform         linux
Python               3.8.2 (default, Mar 26 2020, 15:53:00) [GCC 7.3.0]
numpy                1.19.5
Pillow               7.2.0
vissl                0.1.6 @/home/xxxx/vissl/vissl
GPU available        True
GPU 0,1,2            GeForce RTX 2080 Ti
CUDA_HOME            /usr/local/cuda-10.0
torchvision          0.8.2 @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torchvision
hydra                1.0.7 @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/hydra
classy_vision        0.7.0.dev @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/classy_vision
tensorboard          2.7.0
apex                 0.1 @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/apex
cv2                  4.5.4-dev
PyTorch              1.7.1 @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch
PyTorch debug build  False
-------------------  --------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

CPU info:
-------------------  ----------------------------------------
Architecture         x86_64
CPU op-mode(s)       32-bit, 64-bit
Byte Order           Little Endian
CPU(s)               72
On-line CPU(s) list  0-71
Thread(s) per core   2
Core(s) per socket   18
Socket(s)            2
NUMA node(s)         2
Vendor ID            GenuineIntel
CPU family           6
Model                85
Model name           Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Stepping             4
CPU MHz              1000.037
CPU max MHz          3700.0000
CPU min MHz          1000.0000
BogoMIPS             4600.00
Virtualization       VT-x
L1d cache            32K
L1i cache            32K
L2 cache             1024K
L3 cache             25344K
NUMA node0 CPU(s)    0-17,36-53
NUMA node1 CPU(s)    18-35,54-71
-------------------  ----------------------------------------
WARNING 2021-11-13 18:02:40,373 moco_hooks.py:  45: Batch shuffling: True
INFO 2021-11-13 18:02:40,374 tensorboard.py:  49: Tensorboard dir: ./tb_logs
INFO 2021-11-13 18:02:40,380 tensorboard_hook.py:  90: Setting up SSL Tensorboard Hook...
INFO 2021-11-13 18:02:40,380 tensorboard_hook.py: 102: Tensorboard config: log_params: False, log_params_freq: 310, log_params_gradients: True, log_activation_statistics: 0
INFO 2021-11-13 18:02:40,382 trainer_main.py: 112: Using Distributed init method: tcp://localhost:60803, world_size: 1, rank: 0
INFO 2021-11-13 18:02:42,764 trainer_main.py: 130: | initialized host tasty as rank 0 (0)
INFO 2021-11-13 18:02:42,765 train_task.py: 181: Not using Automatic Mixed Precision
INFO 2021-11-13 18:02:42,766 train_task.py: 455: Building model....
INFO 2021-11-13 18:02:42,766 resnext.py:  64: ResNeXT trunk, supports activation checkpointing. Deactivated
INFO 2021-11-13 18:02:42,767 resnext.py:  87: Building model: ResNeXt50-1x64d-w2-BatchNorm2d
INFO 2021-11-13 18:02:43,809 train_task.py: 656: Broadcast model BN buffers from primary on every forward pass
INFO 2021-11-13 18:02:43,809 classification_task.py: 387: Synchronized Batch Normalization is disabled
INFO 2021-11-13 18:02:43,929 optimizer_helper.py: 293:
Trainable params: 163,
Non-Trainable params: 0,
Trunk Regularized Parameters: 159,
Trunk Unregularized Parameters 0,
Head Regularized Parameters: 4,
Head Unregularized Parameters: 0
Remaining Regularized Parameters: 0
Remaining Unregularized Parameters: 0
INFO 2021-11-13 18:02:43,933 img_replicate_pil.py:  52: ImgReplicatePil | Using num_times: 2
INFO 2021-11-13 18:02:43,934 img_pil_color_distortion.py:  56: ImgPilColorDistortion | Using strength: 0.5
INFO 2021-11-13 18:02:43,935 ssl_dataset.py: 156: Rank: 0 split: TRAIN Data files:
['/DATA/imagenet/ILSVRC/Data/CLS-LOC/train']
INFO 2021-11-13 18:02:43,935 ssl_dataset.py: 159: Rank: 0 split: TRAIN Label files:
['/DATA/imagenet/ILSVRC/Data/CLS-LOC/train']
INFO 2021-11-13 18:02:47,936 disk_dataset.py:  86: Loaded 1281167 samples from folder /DATA/imagenet/ILSVRC/Data/CLS-LOC/train
INFO 2021-11-13 18:02:47,937 misc.py: 161: Set start method of multiprocessing to forkserver
INFO 2021-11-13 18:02:47,951 __init__.py: 126: Created the Distributed Sampler....
INFO 2021-11-13 18:02:47,951 __init__.py: 101: Distributed Sampler config:
{'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 1281167, 'total_size': 1281167, 'shuffle': True, 'seed': 0}
INFO 2021-11-13 18:02:47,952 __init__.py: 215: Wrapping the dataloader to async device copies
INFO 2021-11-13 18:02:47,953 train_task.py: 384: Building loss...
INFO 2021-11-13 18:02:48,085 trainer_main.py: 268: Training 200 epochs
INFO 2021-11-13 18:02:48,085 trainer_main.py: 269: One epoch = 40036 iterations.
INFO 2021-11-13 18:02:48,085 trainer_main.py: 270: Total 1281167 samples in one epoch
INFO 2021-11-13 18:02:48,085 trainer_main.py: 276: Total 8007200 iterations for training
INFO 2021-11-13 18:02:48,253 logger.py:  84: Sat Nov 13 18:02:48 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 27%   35C    P2    55W / 260W |   1282MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:5E:00.0 Off |                  N/A |
| 27%   26C    P8     1W / 260W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 27%   32C    P8    16W / 260W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     30064      C   python                           1279MiB |
+-----------------------------------------------------------------------------+

INFO 2021-11-13 18:02:48,260 trainer_main.py: 173: Model is:
 Classy <class 'vissl.models.base_ssl_model.BaseSSLMultiInputOutputModel'>:
BaseSSLMultiInputOutputModel(
  (_heads): ModuleDict()
  (trunk): ResNeXt(
    (_feature_blocks): ModuleDict(
      (conv1): Conv2d(3, 128, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1_relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
      )
      (layer2): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
      )
      (layer3): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (4): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (5): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
      )
      (layer4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(2048, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(<SUPPORTED_L4_STRIDE.two: 2>, <SUPPORTED_L4_STRIDE.two: 2>), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(1024, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(2048, 4096, kernel_size=(1, 1), stride=(<SUPPORTED_L4_STRIDE.two: 2>, <SUPPORTED_L4_STRIDE.two: 2>), bias=False)
            (1): BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(4096, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (bn3): BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
      )
      (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
      (flatten): Flatten()
    )
  )
  (heads): ModuleList(
    (0): MLP(
      (clf): Sequential(
        (0): Linear(in_features=2048, out_features=2048, bias=True)
        (1): ReLU(inplace=True)
      )
    )
    (1): MLP(
      (clf): Sequential(
        (0): Linear(in_features=2048, out_features=128, bias=True)
      )
    )
  )
)
INFO 2021-11-13 18:02:48,260 trainer_main.py: 174: Loss is: {'name': 'MoCoLoss'}
INFO 2021-11-13 18:02:48,261 trainer_main.py: 175: Starting training....
INFO 2021-11-13 18:02:48,261 __init__.py: 101: Distributed Sampler config:
{'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 1281167, 'total_size': 1281167, 'shuffle': True, 'seed': 0}
** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath

** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath

** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath

** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath

** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath

INFO 2021-11-13 18:02:59,768 trainer_main.py: 333: Phase advanced. Rank: 0
INFO 2021-11-13 18:02:59,769 log_hooks.py:  76: ========= Memory Summary at on_phase_start =======
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  455084 KB |  455084 KB |  455085 KB |    1024 B  |
|       from large pool |  442112 KB |  442112 KB |  442112 KB |       0 B  |
|       from small pool |   12972 KB |   12972 KB |   12973 KB |    1024 B  |
|---------------------------------------------------------------------------|
| Active memory         |  455084 KB |  455084 KB |  455085 KB |    1024 B  |
|       from large pool |  442112 KB |  442112 KB |  442112 KB |       0 B  |
|       from small pool |   12972 KB |   12972 KB |   12973 KB |    1024 B  |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  495616 KB |  495616 KB |  495616 KB |       0 B  |
|       from large pool |  479232 KB |  479232 KB |  479232 KB |       0 B  |
|       from small pool |   16384 KB |   16384 KB |   16384 KB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |   40531 KB |   40532 KB |  141685 KB |  101153 KB |
|       from large pool |   37120 KB |   37120 KB |  126976 KB |   89856 KB |
|       from small pool |    3411 KB |    3412 KB |   14709 KB |   11297 KB |
|---------------------------------------------------------------------------|
| Allocations           |     329    |     329    |     331    |       2    |
|       from large pool |      38    |      38    |      38    |       0    |
|       from small pool |     291    |     291    |     293    |       2    |
|---------------------------------------------------------------------------|
| Active allocs         |     329    |     329    |     331    |       2    |
|       from large pool |      38    |      38    |      38    |       0    |
|       from small pool |     291    |     291    |     293    |       2    |
|---------------------------------------------------------------------------|
| GPU reserved segments |      29    |      29    |      29    |       0    |
|       from large pool |      21    |      21    |      21    |       0    |
|       from small pool |       8    |       8    |       8    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |      13    |      13    |      20    |       7    |
|       from large pool |      10    |      10    |      10    |       0    |
|       from small pool |       3    |       3    |      10    |       7    |
|===========================================================================|

INFO 2021-11-13 18:02:59,769 state_update_hooks.py: 115: Starting phase 0 [train]
--- Logging error ---
Traceback (most recent call last):
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 150, in launch_distributed
    _distributed_worker(
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 192, in _distributed_worker
    run_engine(
  File "/home/xxx/vissl/vissl/engines/engine_registry.py", line 86, in run_engine
    engine.run_engine(
  File "/home/xxx/vissl/vissl/engines/train.py", line 39, in run_engine
    train_main(
  File "/home/xxx/vissl/vissl/engines/train.py", line 130, in train_main
    trainer.train()
  File "/home/xxx/vissl/vissl/trainer/trainer_main.py", line 211, in train
    raise e
  File "/home/xxx/vissl/vissl/trainer/trainer_main.py", line 193, in train
    task = train_step_fn(task)
  File "/home/xxx/vissl/vissl/trainer/train_steps/standard_train_step.py", line 143, in standard_train_step
    model_output = task.model(sample["input"])
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/classy_vision/models/classy_model.py", line 97, in __call__
    return self.forward(*args, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/classy_vision/models/classy_model.py", line 111, in forward
    out = self.classy_model(*args, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 180, in forward
    return self.single_input_forward(batch, self._output_feature_names, self.heads)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 138, in single_input_forward
    return self.heads_forward(feats, heads)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 159, in heads_forward
    output = head(output)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
    result = self.forward(*input, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/logging/__init__.py", line 1081, in emit
    msg = self.format(record)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/logging/__init__.py", line 925, in format
    return fmt.format(record)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/logging/__init__.py", line 664, in format
    record.message = record.getMessage()
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/logging/__init__.py", line 369, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "run_distributed_engines.py", line 54, in <module>
    hydra_main(overrides=overrides)
  File "run_distributed_engines.py", line 40, in hydra_main
    launch_distributed(
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 162, in launch_distributed
    logging.error("Wrapping up, caught exception: ", e)
Message: 'Wrapping up, caught exception: '
Arguments: (RuntimeError('mat1 dim 1 must match mat2 dim 0'),)
--- Logging error ---
Traceback (most recent call last):
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 150, in launch_distributed
    _distributed_worker(
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 192, in _distributed_worker
    run_engine(
  File "/home/xxx/vissl/vissl/engines/engine_registry.py", line 86, in run_engine
    engine.run_engine(
  File "/home/xxx/vissl/vissl/engines/train.py", line 39, in run_engine
    train_main(
  File "/home/xxx/vissl/vissl/engines/train.py", line 130, in train_main
    trainer.train()
  File "/home/xxx/vissl/vissl/trainer/trainer_main.py", line 211, in train
    raise e
  File "/home/xxx/vissl/vissl/trainer/trainer_main.py", line 193, in train
    task = train_step_fn(task)
  File "/home/xxx/vissl/vissl/trainer/train_steps/standard_train_step.py", line 143, in standard_train_step
    model_output = task.model(sample["input"])
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/classy_vision/models/classy_model.py", line 97, in __call__
    return self.forward(*args, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/classy_vision/models/classy_model.py", line 111, in forward
    out = self.classy_model(*args, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 180, in forward
    return self.single_input_forward(batch, self._output_feature_names, self.heads)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 138, in single_input_forward
    return self.heads_forward(feats, heads)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 159, in heads_forward
    output = head(output)
  F
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/logging/__init__.py", line 1081, in emit
    msg = self.format(record)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/logging/__init__.py", line 925, in format
    return fmt.format(record)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/logging/__init__.py", line 664, in format
    record.message = record.getMessage()
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/logging/__init__.py", line 369, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "run_distributed_engines.py", line 54, in <module>
    hydra_main(overrides=overrides)
  File "run_distributed_engines.py", line 40, in hydra_main
    launch_distributed(
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 162, in launch_distributed
    logging.error("Wrapping up, caught exception: ", e)
Message: 'Wrapping up, caught exception: '
Arguments: (RuntimeError('mat1 dim 1 must match mat2 dim 0'),)
Traceback (most recent call last):
  File "run_distributed_engines.py", line 54, in <module>
    hydra_main(overrides=overrides)
  File "run_distributed_engines.py", line 40, in hydra_main
    launch_distributed(
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 164, in launch_distributed
    raise e
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 150, in launch_distributed
    _distributed_worker(
  File "/home/xxx/vissl/vissl/utils/distributed_launcher.py", line 192, in _distributed_worker
    run_engine(
  File "/home/xxx/vissl/vissl/engines/engine_registry.py", line 86, in run_engine
    engine.run_engine(
  File "/home/xxx/vissl/vissl/engines/train.py", line 39, in run_engine
    train_main(
  File "/home/xxx/vissl/vissl/engines/train.py", line 130, in train_main
    trainer.train()
  File "/home/xxx/vissl/vissl/trainer/trainer_main.py", line 211, in train
    raise e
  File "/home/xxx/vissl/vissl/trainer/trainer_main.py", line 193, in train
    task = train_step_fn(task)
  File "/home/xxx/vissl/vissl/trainer/train_steps/standard_train_step.py", line 143, in standard_train_step
    model_output = task.model(sample["input"])
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/classy_vision/models/classy_model.py", line 97, in __call__
    return self.forward(*args, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/classy_vision/models/classy_model.py", line 111, in forward
    out = self.classy_model(*args, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 180, in forward
    return self.single_input_forward(batch, self._output_feature_names, self.heads)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 138, in single_input_forward
    return self.heads_forward(feats, heads)
  File "/home/xxx/vissl/vissl/models/base_ssl_model.py", line 159, in heads_forward
    output = head(output)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/vissl/vissl/models/heads/mlp.py", line 111, in forward
    out = self.clf(batch)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0
4. please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.

Expected behavior:

If there are no obvious errors in "what you observed" provided above, please tell us the expected behavior.

Environment:

Provide your environment information using the following command:


numpy                1.19.5
Pillow               7.2.0
vissl                0.1.6 @/home/xxx/vissl/vissl
GPU available        True
GPU 0,1,2            GeForce RTX 2080 Ti
CUDA_HOME            /usr/local/cuda-10.0
torchvision          0.8.2 @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torchvision
hydra                1.0.7 @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/hydra
classy_vision        0.7.0.dev @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/classy_vision
tensorboard          2.7.0
apex                 0.1 @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/apex
cv2                  4.5.4-dev
PyTorch              1.7.1 @/home/xxx/.conda/envs/vissl/lib/python3.8/site-packages/torch
PyTorch debug build  False
-------------------  --------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1,
PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

CPU info:
-------------------  ----------------------------------------
Architecture         x86_64
CPU op-mode(s)       32-bit, 64-bit
Byte Order           Little Endian
CPU(s)               72
On-line CPU(s) list  0-71
Thread(s) per core   2
Core(s) per socket   18
Socket(s)            2
NUMA node(s)         2
Vendor ID            GenuineIntel
CPU family           6
Model                85
Model name           Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Stepping             4
CPU MHz              1000.035
CPU max MHz          3700.0000
CPU min MHz          1000.0000
BogoMIPS             4600.00
Virtualization       VT-x
L1d cache            32K
L1i cache            32K
L2 cache             1024K
L3 cache             25344K
NUMA node0 CPU(s)    0-17,36-53
NUMA node1 CPU(s)    18-35,54-71
-------------------  ----------------------------------------
```
iseessel commented 2 years ago

@CharlieCheckpt Thank you for bringing this to our attention!

After doing some digging, it seems there are indeed differences between the "wide" architectures. The wide architecture introduced in https://arxiv.org/abs/1605.07146 only doubles the width of the residual blocks, not the conv1 stem.

But I have seen official checkpoints that also double the width of the conv1 layer. See for example BYOL: https://github.com/chigur/byol-convert/blob/main/resnet.py#L169. This script properly loads and converts the ResNet-200 2x BYOL model.
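
For concreteness, a minimal sketch (assuming torchvision is installed) of the convention torchvision follows: the stem stays at 64 channels and only the bottleneck inner widths are doubled, so the final feature dimension stays at 2048.

```python
# Sketch: inspect torchvision's wide_resnet50_2 convention (not VISSL code).
from torchvision.models import wide_resnet50_2

model = wide_resnet50_2(pretrained=False)
print(model.conv1)            # Conv2d(3, 64, ...)   -> the stem is NOT widened
print(model.layer1[0].conv1)  # Conv2d(64, 128, ...) -> bottleneck inner planes ARE doubled
print(model.fc.in_features)   # 2048                 -> final feature dim unchanged
```

The VISSL trunk with `WIDTH_MULTIPLIER=2` widens the stem (conv1 outputs 128 channels, as reported above), which presumably also changes the trunk output dimension and explains the `mat1 dim 1 must match mat2 dim 0` error raised inside the MoCo MLP head.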

I am not sure whether this is a confusion in the literature or a conscious decision -- if it is the latter, I have not seen it made explicit in anything I've read.

@prigoyal @QuentinDuval Do you guys know anything more about this?

prigoyal commented 2 years ago

Agree with the above. It might be best to extend the resnext code in vissl to support the different variants, i.e. "wide_resnet50_2" as in https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py#L20
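
For reference, a minimal sketch of how torchvision parameterizes its wide variant: the 2x factor is passed through `width_per_group`, which only widens the Bottleneck inner planes and leaves conv1 at 64 channels. An analogous option in vissl's trunk (hypothetical here, nothing like it exists in the config yet) would reproduce the torchvision-compatible architecture.

```python
# Sketch only: reproduce torchvision's wide_resnet50_2 construction.
from torchvision.models.resnet import ResNet, Bottleneck

def wide_resnet50_2_reference() -> ResNet:
    # Same block layout as resnet50; every Bottleneck's inner width is doubled
    # via width_per_group, while the conv1 stem and the output dim are untouched.
    return ResNet(Bottleneck, [3, 4, 6, 3], width_per_group=64 * 2)

model = wide_resnet50_2_reference()
assert model.conv1.out_channels == 64             # stem unchanged
assert model.layer1[0].conv1.out_channels == 128  # inner width doubled
assert model.fc.in_features == 2048               # head input dim unchanged
```

Whether `WIDTH_MULTIPLIER` should keep its current semantics (which appears to match the BYOL checkpoints referenced above) or also offer this torchvision-style variant probably depends on which pretrained checkpoints we want to stay compatible with, so exposing both behaviours seems safest.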