facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License

Feature extraction with transformers (DINO) #483

Closed blazejdolicki closed 2 years ago

blazejdolicki commented 2 years ago

If you do not know the root cause of the problem, and wish someone to help you, please post according to this template:

Instructions To Reproduce the Issue:

Check https://stackoverflow.com/help/minimal-reproducible-example for how to ask good questions. Simplify the steps to reproduce the issue using suggestions from the above link, and provide them below:

  1. full code you wrote or full changes you made (git diff)
    <put code or diff here>

    I didn't write any code.

  2. what exact command you run:

    I'm running this command:

    python3 tools/run_distributed_engines.py \
    hydra.verbose=true \
    config=$CONFIG_PATH \
    config.DATA.TRAIN.DATA_SOURCES=[synthetic] \
    config.DATA.TRAIN.LABEL_SOURCES=[synthetic] \
    config.DATA.TEST.DATA_SOURCES=[synthetic] \
    config.DATA.TEST.LABEL_SOURCES=[synthetic] \
    config.CHECKPOINT.DIR=$EXPERIMENT_DIR_CONTAINER/$SLURM_JOB_NAME/checkpoints/$SLURM_JOB_ID \
    config.MODEL.WEIGHTS_INIT.PARAMS_FILE=$MODEL_WEIGHTS

    $CONFIG_PATH leads to a .yaml config file with the following content:

    # @package _global_
    config:
      TEST_MODEL: True
      DATA:
        NUM_DATALOADER_WORKERS: 5
        TEST:
          BATCHSIZE_PER_REPLICA: 64
          MMAP_MODE: False
          COPY_TO_LOCAL_DISK: False
          ENABLE_QUEUE_DATASET: False
          TRANSFORMS:
          - name: Resize
            size: 256
          - name: CenterCrop
            size: 224
          - name: ToTensor
          - mean:
            - 0.485
            - 0.456
            - 0.406
            name: Normalize
            std:
            - 0.229
            - 0.224
            - 0.225
        TRAIN:
          BATCHSIZE_PER_REPLICA: 64
          MMAP_MODE: False
          COPY_TO_LOCAL_DISK: False
          ENABLE_QUEUE_DATASET: False
          DATASET_NAMES:
          - imagenet1k_folder
          DATA_SOURCES:
          - disk_folder
          TRANSFORMS:
          - name: Resize
            size: 256
          - name: CenterCrop
            size: 224
          - name: ToTensor
          - mean:
            - 0.485
            - 0.456
            - 0.406
            name: Normalize
            std:
            - 0.229
            - 0.224
            - 0.225
      DISTRIBUTED:
        BACKEND: nccl
        INIT_METHOD: tcp
        NCCL_DEBUG: true
        NUM_NODES: 1
        NUM_PROC_PER_NODE: 1
        RUN_ID: auto
      MACHINE:
        DEVICE: gpu
      MODEL:
        FEATURE_EVAL_SETTINGS:
          EVAL_MODE_ON: true
          EXTRACT_TRUNK_FEATURES_ONLY: true
          FREEZE_TRUNK_ONLY: True
          SHOULD_FLATTEN_FEATS: False
        TRUNK:
          NAME: vision_transformer
          VISION_TRANSFORMERS:
            ATTENTION_DROPOUT_RATE: 0
            CLASSIFIER: token
            DROPOUT_RATE: 0
            DROP_PATH_RATE: 0.1
            HIDDEN_DIM: 384
            IMAGE_SIZE: 224
            MLP_DIM: 1532
            NUM_HEADS: 6
            NUM_LAYERS: 12
            PATCH_SIZE: 16
            QKV_BIAS: true
        WEIGHTS_INIT:
          PARAM_FILE: {}
      engine_name: extract_features

    while $MODEL_WEIGHTS refers to a .torch file with a DINO model pretrained with VISSL.
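    As a side note, here is a minimal sketch (not part of the original report) of how one could inspect the checkpoint that $MODEL_WEIGHTS points to before handing it to config.MODEL.WEIGHTS_INIT, e.g. to confirm that the file really stores its weights under the classy_state_dict key expected by STATE_DICT_KEY_NAME. The nested base_model/model/trunk layout is only an assumption about how this VISSL version saves checkpoints, so the snippet prints whatever keys it finds instead of relying on them:

    # sketch: inspect a VISSL .torch checkpoint on CPU
    import torch

    # path taken from the logs below; adjust to your own checkpoint
    ckpt_path = "/hissl-logs/train_nct_dino/checkpoints/8521997/model_phase40.torch"
    checkpoint = torch.load(ckpt_path, map_location="cpu")

    # top-level keys; 'classy_state_dict' should be among them for a VISSL checkpoint
    print(list(checkpoint.keys()))

    state = checkpoint.get("classy_state_dict", {})
    if isinstance(state, dict):
        print(list(state.keys()))
        # assumed layout: classy_state_dict -> base_model -> model -> trunk / heads
        model_state = state.get("base_model", {}).get("model", {})
        if isinstance(model_state, dict):
            trunk = model_state.get("trunk", {})
            print(f"{len(trunk)} trunk tensors found")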

  3. full logs you observed:
    
    INFO 2021-12-20 16:15:52,761 extract_features.py:  80: Env set for rank: 0, dist_rank: 0
    INFO 2021-12-20 16:15:52,761 env.py:  50: BASH_ENV: /opt/lmod/lmod/init/bash
    INFO 2021-12-20 16:15:52,761 env.py:  50: BASH_FUNC_ml%%:   () {  eval $($LMOD_DIR/ml_cmd "$@")
    }
    INFO 2021-12-20 16:15:52,761 env.py:  50: BASH_FUNC_module%%:   () {  eval $($LMOD_CMD bash "$@") && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
    }
    INFO 2021-12-20 16:15:52,761 env.py:  50: CONFIG_PATH:  /users/hissl/.jupyter/jupyter_notebook_config.py
    INFO 2021-12-20 16:15:52,761 env.py:  50: CUDA_PATH:    /usr/local/cuda
    INFO 2021-12-20 16:15:52,761 env.py:  50: CUDA_ROOT:    /usr/local/cuda/bin
    INFO 2021-12-20 16:15:52,761 env.py:  50: CUDA_VERSION: 11.1.1
    INFO 2021-12-20 16:15:52,761 env.py:  50: CUDA_VISIBLE_DEVICES: 0,1,2,3
    INFO 2021-12-20 16:15:52,761 env.py:  50: CUDNN_VERSION:    8.0.5.39
    INFO 2021-12-20 16:15:52,761 env.py:  50: DBUS_SESSION_BUS_ADDRESS: unix:path=/run/user/55916/bus
    INFO 2021-12-20 16:15:52,761 env.py:  50: ENVIRONMENT:  BATCH
    INFO 2021-12-20 16:15:52,761 env.py:  50: FPATH:    /opt/lmod/lmod/init/ksh_funcs
    INFO 2021-12-20 16:15:52,761 env.py:  50: GPU_DEVICE_ORDINAL:   0,1,2,3
    INFO 2021-12-20 16:15:52,761 env.py:  50: HOME: /home/bdolicki
    INFO 2021-12-20 16:15:52,762 env.py:  50: HOSTNAME: r28n5
    INFO 2021-12-20 16:15:52,762 env.py:  50: LANG: en_US
    INFO 2021-12-20 16:15:52,762 env.py:  50: LC_CTYPE: C.UTF-8
    INFO 2021-12-20 16:15:52,762 env.py:  50: LD_LIBRARY_PATH:  /usr/local/nvidia/lib64:/.singularity.d/libs
    INFO 2021-12-20 16:15:52,762 env.py:  50: LIBRARY_PATH: /usr/local/cuda/lib64/stubs
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_CASE_INDEPENDENT_SORTING:    yes
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_CMD: /opt/lmod/lmod/libexec/lmod
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_DIR: /opt/lmod/lmod/libexec
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_EXACT_MATCH: yes
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_PKG: /opt/lmod/lmod
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_ROOT:    /opt/lmod
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_SETTARG_FULL_SUPPORT:    no
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_SHORT_TIME:  10000
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_VERSION: 8.5.22
    INFO 2021-12-20 16:15:52,762 env.py:  50: LMOD_sys: Linux
    INFO 2021-12-20 16:15:52,762 env.py:  50: LOCAL_RANK:   0
    INFO 2021-12-20 16:15:52,762 env.py:  50: LOGNAME:  bdolicki
    INFO 2021-12-20 16:15:52,762 env.py:  50: MAIL: /var/mail/bdolicki
    INFO 2021-12-20 16:15:52,762 env.py:  50: MANPATH:  /opt/lmod/lmod/share/man::/opt/slurm/sw/current/share/man
    INFO 2021-12-20 16:15:52,762 env.py:  50: MODULEPATH:   /sw/noarch/modulefiles/environment
    INFO 2021-12-20 16:15:52,762 env.py:  50: MODULEPATH_ROOT:  /opt/modulefiles
    INFO 2021-12-20 16:15:52,762 env.py:  50: MODULESHOME:  /opt/lmod/lmod
    INFO 2021-12-20 16:15:52,762 env.py:  50: NCCL_ASYNC_ERROR_HANDLING:    1
    INFO 2021-12-20 16:15:52,762 env.py:  50: NCCL_DEBUG:   INFO
    INFO 2021-12-20 16:15:52,762 env.py:  50: NCCL_VERSION: 2.7.8
    INFO 2021-12-20 16:15:52,762 env.py:  50: NVIDIA_DRIVER_CAPABILITIES:   compute,utility
    INFO 2021-12-20 16:15:52,762 env.py:  50: NVIDIA_REQUIRE_CUDA:  cuda>=11.1 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 brand=tesla,driver>=450,driver<451
    INFO 2021-12-20 16:15:52,762 env.py:  50: NVIDIA_VISIBLE_DEVICES:   all
    INFO 2021-12-20 16:15:52,762 env.py:  50: OLDPWD:   /home/bdolicki/thesis/ssl-histo
    INFO 2021-12-20 16:15:52,762 env.py:  50: PATH: /users/hissl/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/cuda/bin
    INFO 2021-12-20 16:15:52,762 env.py:  50: PROMPT_COMMAND:   PS1="Singularity> "; unset PROMPT_COMMAND
    INFO 2021-12-20 16:15:52,762 env.py:  50: PS1:  Singularity> 
    INFO 2021-12-20 16:15:52,762 env.py:  50: PWD:  /hissl
    INFO 2021-12-20 16:15:52,763 env.py:  50: PYTHONPATH:   /hissl
    INFO 2021-12-20 16:15:52,763 env.py:  50: RANK: 0
    INFO 2021-12-20 16:15:52,763 env.py:  50: ROCR_VISIBLE_DEVICES: 0,1,2,3
    INFO 2021-12-20 16:15:52,763 env.py:  50: SHELL:    /bin/bash
    INFO 2021-12-20 16:15:52,763 env.py:  50: SHLVL:    1
    INFO 2021-12-20 16:15:52,763 env.py:  50: SINGULARITY_APPNAME:  
    INFO 2021-12-20 16:15:52,763 env.py:  50: SINGULARITY_BIND: /home/bdolicki/thesis/hissl:/hissl,/home/bdolicki/thesis/ssl-histo/config/blazej:/hissl/configs/config/blazej,/home/bdolicki/thesis/hissl-logs:/hissl-logs,/home/bdolicki/thesis/ssl-histo/data/nct
    INFO 2021-12-20 16:15:52,763 env.py:  50: SINGULARITY_COMMAND:  exec
    INFO 2021-12-20 16:15:52,763 env.py:  50: SINGULARITY_CONTAINER:    /home/bdolicki/thesis/hissl_20210922_np121_h5py.sif
    INFO 2021-12-20 16:15:52,763 env.py:  50: SINGULARITY_ENVIRONMENT:  /.singularity.d/env/91-environment.sh
    INFO 2021-12-20 16:15:52,763 env.py:  50: SINGULARITY_NAME: hissl_20210922_np121_h5py.sif
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURMD_NODENAME:  r28n5
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_CLUSTER_NAME:   lisa
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_CONF:   /opt/slurm/etc/slurm.conf
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_CPUS_ON_NODE:   24
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_GPUS_PER_NODE:  titanrtx:1
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_GTIDS:  0
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOBID:  8530930
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_ACCOUNT:    bdolicki
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_CPUS_PER_NODE:  24
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_GID:    55479
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_GPUS:   0,1,2,3
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_ID: 8530930
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_NAME:   extract_sample_nct_dino
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_NODELIST:   r28n5
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_NUM_NODES:  1
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_PARTITION:  gpu_titanrtx
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_QOS:    default
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_UID:    55916
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_JOB_USER:   bdolicki
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_LOCALID:    0
    INFO 2021-12-20 16:15:52,763 env.py:  50: SLURM_NNODES: 1
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_NODEID: 0
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_NODELIST:   r28n5
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_NODE_ALIASES:   (null)
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_PRIO_PROCESS:   0
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_PROCID: 0
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_SPANK_SURF_EXCLUSIVE:   0
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_SUBMIT_DIR: /home/bdolicki/thesis/ssl-histo
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_SUBMIT_HOST:    login3.lisa.surfsara.nl
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_TASKS_PER_NODE: 24
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_TASK_PID:   13469
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_TOPOLOGY_ADDR:  gigabit..gpu.I09_I10_I15_I16_I17_I19.r28n5
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_TOPOLOGY_ADDR_PATTERN:  switch.switch.switch.switch.node
    INFO 2021-12-20 16:15:52,764 env.py:  50: SLURM_WORKING_CLUSTER:    lisa:batch4.lisa.surfsara.nl:6817:9216:109
    INFO 2021-12-20 16:15:52,764 env.py:  50: SSH_CLIENT:   194.42.110.163 53532 22
    INFO 2021-12-20 16:15:52,764 env.py:  50: SSH_CONNECTION:   194.42.110.163 53532 145.101.32.96 22
    INFO 2021-12-20 16:15:52,764 env.py:  50: SSH_TTY:  /dev/pts/16
    INFO 2021-12-20 16:15:52,764 env.py:  50: SURF_EXCLUSIVE:   0
    INFO 2021-12-20 16:15:52,764 env.py:  50: TAR:  /bin/tar
    INFO 2021-12-20 16:15:52,764 env.py:  50: TERM: xterm-256color
    INFO 2021-12-20 16:15:52,764 env.py:  50: TMPDIR:   /scratch
    INFO 2021-12-20 16:15:52,764 env.py:  50: USER: bdolicki
    INFO 2021-12-20 16:15:52,764 env.py:  50: USER_PATH:    /usr/bin:/bin:/usr/bin/X11:/usr/games:/usr/sara/bin:/opt/slurm/bin:/opt/slurm/sbin:/opt/slurm/sw/current/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
    INFO 2021-12-20 16:15:52,764 env.py:  50: VISSL_DATASET_CATALOG_PATH:   /hissl/custom_catalog.json
    INFO 2021-12-20 16:15:52,764 env.py:  50: WORLD_SIZE:   1
    INFO 2021-12-20 16:15:52,764 env.py:  50: XALT_EXECUTABLE_TRACKING: yes
    INFO 2021-12-20 16:15:52,764 env.py:  50: XALT_GPU_TRACKING:    yes
    INFO 2021-12-20 16:15:52,764 env.py:  50: XALT_SAMPLING:    yes
    INFO 2021-12-20 16:15:52,764 env.py:  50: XDG_RUNTIME_DIR:  /run/user/55916
    INFO 2021-12-20 16:15:52,764 env.py:  50: XDG_SESSION_CLASS:    user
    INFO 2021-12-20 16:15:52,764 env.py:  50: XDG_SESSION_ID:   c11004
    INFO 2021-12-20 16:15:52,764 env.py:  50: XDG_SESSION_TYPE: tty
    INFO 2021-12-20 16:15:52,764 env.py:  50: _:    /usr/bin/singularity
    INFO 2021-12-20 16:15:52,764 env.py:  50: __LMOD_SET_FPATH: 1
    INFO 2021-12-20 16:15:52,765 misc.py: 141: Set start method of multiprocessing to forkserver
    INFO 2021-12-20 16:15:52,765 extract_features.py:  91: Setting seed....
    INFO 2021-12-20 16:15:52,765 misc.py: 153: MACHINE SEED: 0
    INFO 2021-12-20 16:15:52,785 hydra_config.py:  88: Training with config:
    INFO 2021-12-20 16:15:52,793 hydra_config.py:  97: {'CHECKPOINT': {'APPEND_DISTR_RUN_ID': False,
                'AUTO_RESUME': True,
                'BACKEND': 'disk',
                'CHECKPOINT_FREQUENCY': 1,
                'CHECKPOINT_ITER_FREQUENCY': -1,
                'DIR': '/hissl-logs/extract_sample_nct_dino/checkpoints/8530930',
                'LATEST_CHECKPOINT_RESUME_FILE_NUM': 1,
                'OVERWRITE_EXISTING': False,
                'USE_SYMLINK_CHECKPOINT_FOR_RESUME': False},
    'CLUSTERFIT': {'CLUSTER_BACKEND': 'faiss',
                'FEATURES': {'DATASET_NAME': '',
                             'DATA_PARTITION': 'TRAIN',
                             'LAYER_NAME': ''},
                'NUM_CLUSTERS': 16000,
                'N_ITER': 50},
    'DATA': {'DDP_BUCKET_CAP_MB': 25,
          'ENABLE_ASYNC_GPU_COPY': True,
          'NUM_DATALOADER_WORKERS': 5,
          'PIN_MEMORY': True,
          'TEST': {'BASE_DATASET': 'generic_ssl',
                   'BATCHSIZE_PER_REPLICA': 64,
                   'COLLATE_FUNCTION': 'default_collate',
                   'COLLATE_FUNCTION_PARAMS': {},
                   'COPY_DESTINATION_DIR': '',
                   'COPY_TO_LOCAL_DISK': False,
                   'DATASET_NAMES': ['imagenet1k_folder'],
                   'DATA_LIMIT': -1,
                   'DATA_LIMIT_SAMPLING': {'IS_BALANCED': False,
                                           'SEED': 0,
                                           'SKIP_NUM_SAMPLES': 0},
                   'DATA_PATHS': [],
                   'DATA_SOURCES': ['synthetic'],
                   'DEFAULT_GRAY_IMG_SIZE': 224,
                   'DROP_LAST': False,
                   'ENABLE_QUEUE_DATASET': False,
                   'INPUT_KEY_NAMES': ['data'],
                   'LABEL_PATHS': [],
                   'LABEL_SOURCES': ['synthetic'],
                   'LABEL_TYPE': 'standard',
                   'MMAP_MODE': False,
                   'NEW_IMG_PATH_PREFIX': '',
                   'REMOVE_IMG_PATH_PREFIX': '',
                   'TARGET_KEY_NAMES': ['label'],
                   'TRANSFORMS': [{'name': 'Resize', 'size': 256},
                                  {'name': 'CenterCrop', 'size': 224},
                                  {'name': 'ToTensor'},
                                  {'mean': [0.485, 0.456, 0.406],
                                   'name': 'Normalize',
                                   'std': [0.229, 0.224, 0.225]}],
                   'USE_DEBUGGING_SAMPLER': False,
                   'USE_STATEFUL_DISTRIBUTED_SAMPLER': False},
          'TRAIN': {'BASE_DATASET': 'generic_ssl',
                    'BATCHSIZE_PER_REPLICA': 64,
                    'COLLATE_FUNCTION': 'default_collate',
                    'COLLATE_FUNCTION_PARAMS': {},
                    'COPY_DESTINATION_DIR': '',
                    'COPY_TO_LOCAL_DISK': False,
                    'DATASET_NAMES': ['imagenet1k_folder'],
                    'DATA_LIMIT': -1,
                    'DATA_LIMIT_SAMPLING': {'IS_BALANCED': False,
                                            'SEED': 0,
                                            'SKIP_NUM_SAMPLES': 0},
                    'DATA_PATHS': [],
                    'DATA_SOURCES': ['synthetic'],
                    'DEFAULT_GRAY_IMG_SIZE': 224,
                    'DROP_LAST': False,
                    'ENABLE_QUEUE_DATASET': False,
                    'INPUT_KEY_NAMES': ['data'],
                    'LABEL_PATHS': [],
                    'LABEL_SOURCES': ['synthetic'],
                    'LABEL_TYPE': 'standard',
                    'MMAP_MODE': False,
                    'NEW_IMG_PATH_PREFIX': '',
                    'REMOVE_IMG_PATH_PREFIX': '',
                    'TARGET_KEY_NAMES': ['label'],
                    'TRANSFORMS': [{'name': 'Resize', 'size': 256},
                                   {'name': 'CenterCrop', 'size': 224},
                                   {'name': 'ToTensor'},
                                   {'mean': [0.485, 0.456, 0.406],
                                    'name': 'Normalize',
                                    'std': [0.229, 0.224, 0.225]}],
                    'USE_DEBUGGING_SAMPLER': False,
                    'USE_STATEFUL_DISTRIBUTED_SAMPLER': False}},
    'DISTRIBUTED': {'BACKEND': 'nccl',
                 'BROADCAST_BUFFERS': True,
                 'INIT_METHOD': 'tcp',
                 'MANUAL_GRADIENT_REDUCTION': False,
                 'NCCL_DEBUG': True,
                 'NCCL_SOCKET_NTHREADS': '',
                 'NUM_NODES': 1,
                 'NUM_PROC_PER_NODE': 1,
                 'RUN_ID': 'auto'},
    'EXTRACT_FEATURES': {'CHUNK_THRESHOLD': 0, 'OUTPUT_DIR': ''},
    'HOOKS': {'LOG_GPU_STATS': True,
           'MEMORY_SUMMARY': {'DUMP_MEMORY_ON_EXCEPTION': False,
                              'LOG_ITERATION_NUM': 0,
                              'PRINT_MEMORY_SUMMARY': True},
           'MODEL_COMPLEXITY': {'COMPUTE_COMPLEXITY': False,
                                'INPUT_SHAPE': [3, 224, 224]},
           'PERF_STATS': {'MONITOR_PERF_STATS': False,
                          'PERF_STAT_FREQUENCY': -1,
                          'ROLLING_BTIME_FREQ': -1},
           'TENSORBOARD_SETUP': {'EXPERIMENT_LOG_DIR': 'tensorboard',
                                 'FLUSH_EVERY_N_MIN': 5,
                                 'LOG_DIR': '.',
                                 'LOG_PARAMS': True,
                                 'LOG_PARAMS_EVERY_N_ITERS': 310,
                                 'LOG_PARAMS_GRADIENTS': True,
                                 'USE_TENSORBOARD': False}},
    'IMG_RETRIEVAL': {'CROP_QUERY_ROI': False,
                   'DATASET_PATH': '',
                   'DEBUG_MODE': False,
                   'EVAL_BINARY_PATH': '',
                   'EVAL_DATASET_NAME': 'Paris',
                   'FEATS_PROCESSING_TYPE': '',
                   'GEM_POOL_POWER': 4.0,
                   'NORMALIZE_FEATURES': True,
                   'NUM_DATABASE_SAMPLES': -1,
                   'NUM_QUERY_SAMPLES': -1,
                   'NUM_TRAINING_SAMPLES': -1,
                   'N_PCA': 512,
                   'RESIZE_IMG': 1024,
                   'SAVE_FEATURES': False,
                   'SAVE_RETRIEVAL_RANKINGS_SCORES': True,
                   'SPATIAL_LEVELS': 3,
                   'TRAIN_DATASET_NAME': 'Oxford',
                   'TRAIN_PCA_WHITENING': True,
                   'WHITEN_IMG_LIST': ''},
    'LOG_FREQUENCY': 10,
    'LOSS': {'CrossEntropyLoss': {'ignore_index': -1},
          'barlow_twins_loss': {'embedding_dim': 8192,
                                'lambda_': 0.0051,
                                'scale_loss': 0.024},
          'bce_logits_multiple_output_single_target': {'normalize_output': False,
                                                       'reduction': 'none',
                                                       'world_size': 1},
          'cross_entropy_multiple_output_single_target': {'ignore_index': -1,
                                                          'normalize_output': False,
                                                          'reduction': 'mean',
                                                          'temperature': 1.0,
                                                          'weight': None},
          'deepclusterv2_loss': {'BATCHSIZE_PER_REPLICA': 256,
                                 'DROP_LAST': True,
                                 'kmeans_iters': 10,
                                 'memory_params': {'crops_for_mb': [0],
                                                   'embedding_dim': 128},
                                 'num_clusters': [3000, 3000, 3000],
                                 'num_crops': 2,
                                 'num_train_samples': -1,
                                 'temperature': 0.1},
          'dino_loss': {'crops_for_teacher': [0, 1],
                        'ema_center': 0.9,
                        'momentum': 0.996,
                        'normalize_last_layer': True,
                        'output_dim': 65536,
                        'student_temp': 0.1,
                        'teacher_temp_max': 0.07,
                        'teacher_temp_min': 0.04,
                        'teacher_temp_warmup_iters': 37500},
          'moco_loss': {'embedding_dim': 128,
                        'momentum': 0.999,
                        'queue_size': 65536,
                        'temperature': 0.2},
          'multicrop_simclr_info_nce_loss': {'buffer_params': {'effective_batch_size': 4096,
                                                               'embedding_dim': 128,
                                                               'world_size': 64},
                                             'num_crops': 2,
                                             'temperature': 0.1},
          'name': 'CrossEntropyLoss',
          'nce_loss_with_memory': {'loss_type': 'nce',
                                   'loss_weights': [1.0],
                                   'memory_params': {'embedding_dim': 128,
                                                     'memory_size': -1,
                                                     'momentum': 0.5,
                                                     'norm_init': True,
                                                     'update_mem_on_forward': True},
                                   'negative_sampling_params': {'num_negatives': 16000,
                                                                'type': 'random'},
                                   'norm_constant': -1,
                                   'norm_embedding': True,
                                   'num_train_samples': -1,
                                   'temperature': 0.07,
                                   'update_mem_with_emb_index': -100},
          'simclr_info_nce_loss': {'buffer_params': {'effective_batch_size': 4096,
                                                     'embedding_dim': 128,
                                                     'world_size': 64},
                                   'temperature': 0.1},
          'swav_loss': {'crops_for_assign': [0, 1],
                        'embedding_dim': 128,
                        'epsilon': 0.05,
                        'normalize_last_layer': True,
                        'num_crops': 2,
                        'num_iters': 3,
                        'num_prototypes': [3000],
                        'output_dir': '.',
                        'queue': {'local_queue_length': 0,
                                  'queue_length': 0,
                                  'start_iter': 0},
                        'temp_hard_assignment_iters': 0,
                        'temperature': 0.1,
                        'use_double_precision': False},
          'swav_momentum_loss': {'crops_for_assign': [0, 1],
                                 'embedding_dim': 128,
                                 'epsilon': 0.05,
                                 'momentum': 0.99,
                                 'momentum_eval_mode_iter_start': 0,
                                 'normalize_last_layer': True,
                                 'num_crops': 2,
                                 'num_iters': 3,
                                 'num_prototypes': [3000],
                                 'queue': {'local_queue_length': 0,
                                           'queue_length': 0,
                                           'start_iter': 0},
                                 'temperature': 0.1,
                                 'use_double_precision': False}},
    'MACHINE': {'DEVICE': 'gpu'},
    'METERS': {'accuracy_list_meter': {'meter_names': [],
                                    'num_meters': 1,
                                    'topk_values': [1]},
            'enable_training_meter': True,
            'mean_ap_list_meter': {'max_cpu_capacity': -1,
                                   'meter_names': [],
                                   'num_classes': 9605,
                                   'num_meters': 1},
            'name': ''},
    'MODEL': {'ACTIVATION_CHECKPOINTING': {'NUM_ACTIVATION_CHECKPOINTING_SPLITS': 2,
                                        'USE_ACTIVATION_CHECKPOINTING': False},
           'AMP_PARAMS': {'AMP_ARGS': {'opt_level': 'O1'},
                          'AMP_TYPE': 'apex',
                          'USE_AMP': False},
           'CUDA_CACHE': {'CLEAR_CUDA_CACHE': False, 'CLEAR_FREQ': 100},
           'FEATURE_EVAL_SETTINGS': {'EVAL_MODE_ON': True,
                                     'EVAL_TRUNK_AND_HEAD': False,
                                     'EXTRACT_TRUNK_FEATURES_ONLY': True,
                                     'FREEZE_TRUNK_AND_HEAD': False,
                                     'FREEZE_TRUNK_ONLY': True,
                                     'LINEAR_EVAL_FEAT_POOL_OPS_MAP': [],
                                     'SHOULD_FLATTEN_FEATS': False},
           'FSDP_CONFIG': {'AUTO_WRAP_THRESHOLD': 0,
                           'bucket_cap_mb': 0,
                           'clear_autocast_cache': True,
                           'compute_dtype': torch.float32,
                           'flatten_parameters': True,
                           'fp32_reduce_scatter': False,
                           'mixed_precision': True,
                           'verbose': True},
           'GRAD_CLIP': {'MAX_NORM': 1, 'NORM_TYPE': 2, 'USE_GRAD_CLIP': False},
           'HEAD': {'BATCHNORM_EPS': 1e-05,
                    'BATCHNORM_MOMENTUM': 0.1,
                    'PARAMS': [],
                    'PARAMS_MULTIPLIER': 1.0},
           'INPUT_TYPE': 'rgb',
           'MULTI_INPUT_HEAD_MAPPING': [],
           'NON_TRAINABLE_PARAMS': [],
           'SHARDED_DDP_SETUP': {'USE_SDP': False, 'reduce_buffer_size': -1},
           'SINGLE_PASS_EVERY_CROP': False,
           'SYNC_BN_CONFIG': {'CONVERT_BN_TO_SYNC_BN': False,
                              'GROUP_SIZE': -1,
                              'SYNC_BN_TYPE': 'pytorch'},
           'TEMP_FROZEN_PARAMS_ITER_MAP': [],
           'TRUNK': {'CONVIT': {'CLASS_TOKEN_IN_LOCAL_LAYERS': False,
                                'LOCALITY_DIM': 10,
                                'LOCALITY_STRENGTH': 1.0,
                                'N_GPSA_LAYERS': 10,
                                'USE_LOCAL_INIT': True},
                     'EFFICIENT_NETS': {},
                     'NAME': 'vision_transformer',
                     'REGNET': {},
                     'RESNETS': {'DEPTH': 50,
                                 'GROUPNORM_GROUPS': 32,
                                 'GROUPS': 1,
                                 'LAYER4_STRIDE': 2,
                                 'NORM': 'BatchNorm',
                                 'STANDARDIZE_CONVOLUTIONS': False,
                                 'WIDTH_MULTIPLIER': 1,
                                 'WIDTH_PER_GROUP': 64,
                                 'ZERO_INIT_RESIDUAL': False},
                     'VISION_TRANSFORMERS': {'ATTENTION_DROPOUT_RATE': 0,
                                             'CLASSIFIER': 'token',
                                             'DROPOUT_RATE': 0,
                                             'DROP_PATH_RATE': 0.1,
                                             'HIDDEN_DIM': 384,
                                             'IMAGE_SIZE': 224,
                                             'MLP_DIM': 1532,
                                             'NUM_HEADS': 6,
                                             'NUM_LAYERS': 12,
                                             'PATCH_SIZE': 16,
                                             'QKV_BIAS': True,
                                             'QK_SCALE': False,
                                             'name': None}},
           'WEIGHTS_INIT': {'APPEND_PREFIX': '',
                            'PARAMS_FILE': '/hissl-logs/train_nct_dino/checkpoints/8521997/model_phase40.torch',
                            'PARAM_FILE': {},
                            'REMOVE_PREFIX': '',
                            'SKIP_LAYERS': ['num_batches_tracked'],
                            'STATE_DICT_KEY_NAME': 'classy_state_dict'},
           '_MODEL_INIT_SEED': 0},
    'MONITORING': {'MONITOR_ACTIVATION_STATISTICS': 0},
    'MULTI_PROCESSING_METHOD': 'forkserver',
    'NEAREST_NEIGHBOR': {'L2_NORM_FEATS': False, 'SIGMA': 0.1, 'TOPK': 200},
    'OPTIMIZER': {'betas': [0.9, 0.999],
               'construct_single_param_group_only': False,
               'head_optimizer_params': {'use_different_lr': False,
                                         'use_different_wd': False,
                                         'weight_decay': 0.0001},
               'larc_config': {'clip': False,
                               'eps': 1e-08,
                               'trust_coefficient': 0.001},
               'momentum': 0.9,
               'name': 'sgd',
               'nesterov': False,
               'non_regularized_parameters': [],
               'num_epochs': 90,
               'param_schedulers': {'lr': {'auto_lr_scaling': {'auto_scale': False,
                                                               'base_lr_batch_size': 256,
                                                               'base_value': 0.1,
                                                               'scaling_type': 'linear'},
                                           'end_value': 0.0,
                                           'interval_scaling': [],
                                           'lengths': [],
                                           'milestones': [30, 60],
                                           'name': 'multistep',
                                           'schedulers': [],
                                           'start_value': 0.1,
                                           'update_interval': 'epoch',
                                           'value': 0.1,
                                           'values': [0.1, 0.01, 0.001]},
                                    'lr_head': {'auto_lr_scaling': {'auto_scale': False,
                                                                    'base_lr_batch_size': 256,
                                                                    'base_value': 0.1,
                                                                    'scaling_type': 'linear'},
                                                'end_value': 0.0,
                                                'interval_scaling': [],
                                                'lengths': [],
                                                'milestones': [30, 60],
                                                'name': 'multistep',
                                                'schedulers': [],
                                                'start_value': 0.1,
                                                'update_interval': 'epoch',
                                                'value': 0.1,
                                                'values': [0.1, 0.01, 0.001]}},
               'regularize_bias': True,
               'regularize_bn': False,
               'use_larc': False,
               'use_zero': False,
               'weight_decay': 0.0001},
    'PROFILING': {'MEMORY_PROFILING': {'TRACK_BY_LAYER_MEMORY': False},
               'NUM_ITERATIONS': 10,
               'OUTPUT_FOLDER': '.',
               'PROFILED_RANKS': [0, 1],
               'RUNTIME_PROFILING': {'LEGACY_PROFILER': False,
                                     'PROFILE_CPU': True,
                                     'PROFILE_GPU': True,
                                     'USE_PROFILER': False},
               'START_ITERATION': 0,
               'STOP_TRAINING_AFTER_PROFILING': False,
               'WARMUP_ITERATIONS': 0},
    'REPRODUCIBILITY': {'CUDDN_DETERMINISTIC': False},
    'SEED_VALUE': 0,
    'SLURM': {'ADDITIONAL_PARAMETERS': {},
           'COMMENT': 'vissl job',
           'CONSTRAINT': '',
           'LOG_FOLDER': '.',
           'MEM_GB': 250,
           'NAME': 'vissl',
           'NUM_CPU_PER_PROC': 8,
           'PARTITION': '',
           'PORT_ID': 40050,
           'TIME_HOURS': 72,
           'TIME_MINUTES': 0,
           'USE_SLURM': False},
    'SVM': {'cls_list': [],
         'costs': {'base': -1.0,
                   'costs_list': [0.1, 0.01],
                   'power_range': [4, 20]},
         'cross_val_folds': 3,
         'dual': True,
         'force_retrain': False,
         'loss': 'squared_hinge',
         'low_shot': {'dataset_name': 'voc',
                      'k_values': [1, 2, 4, 8, 16, 32, 64, 96],
                      'sample_inds': [1, 2, 3, 4, 5]},
         'max_iter': 2000,
         'normalize': True,
         'penalty': 'l2'},
    'TEST_EVERY_NUM_EPOCH': 1,
    'TEST_MODEL': True,
    'TEST_ONLY': False,
    'TRAINER': {'TASK_NAME': 'self_supervision_task',
             'TRAIN_STEP_NAME': 'standard_train_step'},
    'VERBOSE': False}
    INFO 2021-12-20 16:15:53,826 extract_features.py: 103: System config:
    -------------------  ----------------------------------------------------------------------------
    sys.platform         linux
    Python               3.8.11 (default, Aug  3 2021, 15:09:35) [GCC 7.5.0]
    numpy                1.21.0
    Pillow               8.3.1
    vissl                0.1.5 @/hissl/third_party/vissl/vissl
    GPU available        True
    GPU 0,1,2,3          TITAN RTX
    CUDA_HOME            /usr/local/cuda
    torchvision          0.9.2 @/users/hissl/miniconda3/lib/python3.8/site-packages/torchvision
    hydra                1.0.6 @/users/hissl/miniconda3/lib/python3.8/site-packages/hydra
    classy_vision        0.7.0.dev @/users/hissl/miniconda3/lib/python3.8/site-packages/classy_vision
    tensorboard          2.6.0
    apex                 unknown
    cv2                  4.5.3
    PyTorch              1.8.2 @/users/hissl/miniconda3/lib/python3.8/site-packages/torch
    PyTorch debug build  False
    -------------------  ----------------------------------------------------------------------------
    PyTorch built with:
    - GCC 7.3
    - C++ Version: 201402
    - Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
    - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
    - OpenMP 201511 (a.k.a. OpenMP 4.5)
    - NNPACK is enabled
    - CPU capability usage: AVX2
    - CUDA Runtime 11.1
    - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
    - CuDNN 8.0.5
    - Magma 2.5.2
    - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

CPU info:


Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Stepping:              4
CPU MHz:               999.885
BogoMIPS:              4600.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              16896K
NUMA node0 CPU(s):     0,4,8,12,16,20
NUMA node1 CPU(s):     1,5,9,13,17,21
NUMA node2 CPU(s):     2,6,10,14,18,22
NUMA node3 CPU(s):     3,7,11,15,19,23


INFO 2021-12-20 16:15:53,827 trainer_main.py: 112: Using Distributed init method: tcp://localhost:38853, world_size: 1, rank: 0
INFO 2021-12-20 16:15:53,835 distributed_c10d.py: 187: Added key: store_based_barrier_key:1 to store for rank: 0
INFO 2021-12-20 16:15:53,835 trainer_main.py: 130: | initialized host r28n5.lisa.surfsara.nl as rank 0 (0)
INFO 2021-12-20 16:16:07,923 train_task.py: 182: Not using Automatic Mixed Precision
INFO 2021-12-20 16:16:07,926 ssl_dataset.py: 156: Rank: 0 split: TEST Data files: ['']
INFO 2021-12-20 16:16:07,926 ssl_dataset.py: 159: Rank: 0 split: TEST Label files: []
INFO 2021-12-20 16:16:07,926 ssl_dataset.py: 156: Rank: 0 split: TRAIN Data files: ['']
INFO 2021-12-20 16:16:07,926 ssl_dataset.py: 159: Rank: 0 split: TRAIN Label files: []
INFO 2021-12-20 16:16:07,926 misc.py: 141: Set start method of multiprocessing to forkserver
INFO 2021-12-20 16:16:07,926 init.py: 130: Created the Distributed Sampler....
INFO 2021-12-20 16:16:07,926 init.py: 105: Distributed Sampler config: {'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 50000, 'total_size': 50000, 'shuffle': True, 'seed': 0}
INFO 2021-12-20 16:16:07,926 init.py: 198: Prefetch factor is set to the default: 2
INFO 2021-12-20 16:16:07,927 init.py: 227: Wrapping the dataloader to async device copies
INFO 2021-12-20 16:16:07,927 misc.py: 141: Set start method of multiprocessing to forkserver
INFO 2021-12-20 16:16:07,927 init.py: 130: Created the Distributed Sampler....
INFO 2021-12-20 16:16:07,927 init.py: 105: Distributed Sampler config: {'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 50000, 'total_size': 50000, 'shuffle': True, 'seed': 0}
INFO 2021-12-20 16:16:07,927 init.py: 198: Prefetch factor is set to the default: 2
INFO 2021-12-20 16:16:07,927 init.py: 227: Wrapping the dataloader to async device copies
INFO 2021-12-20 16:16:07,927 train_task.py: 450: Building model....
INFO 2021-12-20 16:16:07,928 vision_transformer.py: 173: Building model: Vision Transformer from yaml config
INFO 2021-12-20 16:16:08,671 train_task.py: 467: config.MODEL.FEATURE_EVAL_SETTINGS.FREEZE_TRUNK_ONLY=True, will freeze trunk...
INFO 2021-12-20 16:16:08,671 base_ssl_model.py: 194: Freezing model trunk...
INFO 2021-12-20 16:16:08,672 train_task.py: 424: Initializing model from: /hissl-logs/train_nct_dino/checkpoints/8521997/model_phase40.torch
INFO 2021-12-20 16:16:08,672 util.py: 276: Attempting to load checkpoint from /hissl-logs/train_nct_dino/checkpoints/8521997/model_phase40.torch
INFO 2021-12-20 16:16:09,304 util.py: 281: Loaded checkpoint from /hissl-logs/train_nct_dino/checkpoints/8521997/model_phase40.torch
INFO 2021-12-20 16:16:09,304 util.py: 240: Broadcasting checkpoint loaded from /hissl-logs/train_nct_dino/checkpoints/8521997/model_phase40.torch
INFO 2021-12-20 16:16:28,406 train_task.py: 430: Checkpoint loaded: /hissl-logs/train_nct_dino/checkpoints/8521997/model_phase40.torch...
INFO 2021-12-20 16:16:28,408 checkpoint.py: 885: Loaded: trunk.class_token of shape: torch.Size([1, 1, 384]) from checkpoint INFO 2021-12-20 16:16:28,408 checkpoint.py: 885: Loaded: trunk.pos_embedding of shape: torch.Size([1, 197, 384]) from checkpoint INFO 2021-12-20 16:16:28,412 checkpoint.py: 885: Loaded: trunk.patch_embed.proj.weight of shape: torch.Size([384, 3, 16, 16]) from checkpoint INFO 2021-12-20 16:16:28,412 checkpoint.py: 885: Loaded: trunk.patch_embed.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,412 checkpoint.py: 885: Loaded: trunk.blocks.0.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,412 checkpoint.py: 885: Loaded: trunk.blocks.0.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,417 checkpoint.py: 885: Loaded: trunk.blocks.0.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,417 checkpoint.py: 885: Loaded: trunk.blocks.0.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,418 checkpoint.py: 885: Loaded: trunk.blocks.0.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,418 checkpoint.py: 885: Loaded: trunk.blocks.0.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,418 checkpoint.py: 885: Loaded: trunk.blocks.0.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,418 checkpoint.py: 885: Loaded: trunk.blocks.0.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,422 checkpoint.py: 885: Loaded: trunk.blocks.0.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,422 checkpoint.py: 885: Loaded: trunk.blocks.0.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,427 checkpoint.py: 885: Loaded: trunk.blocks.0.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,427 checkpoint.py: 885: Loaded: trunk.blocks.0.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,427 checkpoint.py: 885: Loaded: trunk.blocks.1.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,427 checkpoint.py: 885: Loaded: trunk.blocks.1.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,431 checkpoint.py: 885: Loaded: trunk.blocks.1.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,431 checkpoint.py: 885: Loaded: trunk.blocks.1.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,432 checkpoint.py: 885: Loaded: trunk.blocks.1.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,432 checkpoint.py: 885: Loaded: trunk.blocks.1.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,432 checkpoint.py: 885: Loaded: trunk.blocks.1.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,432 checkpoint.py: 885: Loaded: trunk.blocks.1.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,436 checkpoint.py: 885: Loaded: trunk.blocks.1.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,436 checkpoint.py: 885: Loaded: trunk.blocks.1.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,441 checkpoint.py: 885: Loaded: trunk.blocks.1.mlp.fc2.weight of shape: torch.Size([384, 1536]) from 
checkpoint INFO 2021-12-20 16:16:28,441 checkpoint.py: 885: Loaded: trunk.blocks.1.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,441 checkpoint.py: 885: Loaded: trunk.blocks.2.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,441 checkpoint.py: 885: Loaded: trunk.blocks.2.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,444 checkpoint.py: 885: Loaded: trunk.blocks.2.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,444 checkpoint.py: 885: Loaded: trunk.blocks.2.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,446 checkpoint.py: 885: Loaded: trunk.blocks.2.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,446 checkpoint.py: 885: Loaded: trunk.blocks.2.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,446 checkpoint.py: 885: Loaded: trunk.blocks.2.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,446 checkpoint.py: 885: Loaded: trunk.blocks.2.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,449 checkpoint.py: 885: Loaded: trunk.blocks.2.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,449 checkpoint.py: 885: Loaded: trunk.blocks.2.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,453 checkpoint.py: 885: Loaded: trunk.blocks.2.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,453 checkpoint.py: 885: Loaded: trunk.blocks.2.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,453 checkpoint.py: 885: Loaded: trunk.blocks.3.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,453 checkpoint.py: 885: Loaded: trunk.blocks.3.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,456 checkpoint.py: 885: Loaded: trunk.blocks.3.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,456 checkpoint.py: 885: Loaded: trunk.blocks.3.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,458 checkpoint.py: 885: Loaded: trunk.blocks.3.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,458 checkpoint.py: 885: Loaded: trunk.blocks.3.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,458 checkpoint.py: 885: Loaded: trunk.blocks.3.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,458 checkpoint.py: 885: Loaded: trunk.blocks.3.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,462 checkpoint.py: 885: Loaded: trunk.blocks.3.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,462 checkpoint.py: 885: Loaded: trunk.blocks.3.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,463 checkpoint.py: 885: Loaded: trunk.blocks.3.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,463 checkpoint.py: 885: Loaded: trunk.blocks.3.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,463 checkpoint.py: 885: Loaded: trunk.blocks.4.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,463 checkpoint.py: 885: Loaded: trunk.blocks.4.norm1.bias of shape: torch.Size([384]) from 
checkpoint INFO 2021-12-20 16:16:28,467 checkpoint.py: 885: Loaded: trunk.blocks.4.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,467 checkpoint.py: 885: Loaded: trunk.blocks.4.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,468 checkpoint.py: 885: Loaded: trunk.blocks.4.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,468 checkpoint.py: 885: Loaded: trunk.blocks.4.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,468 checkpoint.py: 885: Loaded: trunk.blocks.4.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,468 checkpoint.py: 885: Loaded: trunk.blocks.4.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,473 checkpoint.py: 885: Loaded: trunk.blocks.4.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,473 checkpoint.py: 885: Loaded: trunk.blocks.4.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,476 checkpoint.py: 885: Loaded: trunk.blocks.4.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,476 checkpoint.py: 885: Loaded: trunk.blocks.4.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,476 checkpoint.py: 885: Loaded: trunk.blocks.5.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,476 checkpoint.py: 885: Loaded: trunk.blocks.5.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,479 checkpoint.py: 885: Loaded: trunk.blocks.5.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,479 checkpoint.py: 885: Loaded: trunk.blocks.5.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,481 checkpoint.py: 885: Loaded: trunk.blocks.5.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,481 checkpoint.py: 885: Loaded: trunk.blocks.5.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,481 checkpoint.py: 885: Loaded: trunk.blocks.5.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,481 checkpoint.py: 885: Loaded: trunk.blocks.5.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,485 checkpoint.py: 885: Loaded: trunk.blocks.5.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,485 checkpoint.py: 885: Loaded: trunk.blocks.5.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,489 checkpoint.py: 885: Loaded: trunk.blocks.5.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,489 checkpoint.py: 885: Loaded: trunk.blocks.5.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,489 checkpoint.py: 885: Loaded: trunk.blocks.6.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,489 checkpoint.py: 885: Loaded: trunk.blocks.6.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,493 checkpoint.py: 885: Loaded: trunk.blocks.6.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,493 checkpoint.py: 885: Loaded: trunk.blocks.6.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,494 checkpoint.py: 885: Loaded: trunk.blocks.6.attn.proj.weight of shape: 
torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,495 checkpoint.py: 885: Loaded: trunk.blocks.6.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,495 checkpoint.py: 885: Loaded: trunk.blocks.6.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,495 checkpoint.py: 885: Loaded: trunk.blocks.6.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,495 checkpoint.py: 885: Loaded: trunk.blocks.6.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,495 checkpoint.py: 885: Loaded: trunk.blocks.6.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,499 checkpoint.py: 885: Loaded: trunk.blocks.6.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,499 checkpoint.py: 885: Loaded: trunk.blocks.6.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,499 checkpoint.py: 885: Loaded: trunk.blocks.7.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,500 checkpoint.py: 885: Loaded: trunk.blocks.7.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,503 checkpoint.py: 885: Loaded: trunk.blocks.7.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,503 checkpoint.py: 885: Loaded: trunk.blocks.7.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,505 checkpoint.py: 885: Loaded: trunk.blocks.7.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,505 checkpoint.py: 885: Loaded: trunk.blocks.7.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,505 checkpoint.py: 885: Loaded: trunk.blocks.7.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,505 checkpoint.py: 885: Loaded: trunk.blocks.7.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,508 checkpoint.py: 885: Loaded: trunk.blocks.7.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,508 checkpoint.py: 885: Loaded: trunk.blocks.7.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,511 checkpoint.py: 885: Loaded: trunk.blocks.7.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,512 checkpoint.py: 885: Loaded: trunk.blocks.7.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,512 checkpoint.py: 885: Loaded: trunk.blocks.8.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,512 checkpoint.py: 885: Loaded: trunk.blocks.8.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,514 checkpoint.py: 885: Loaded: trunk.blocks.8.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,514 checkpoint.py: 885: Loaded: trunk.blocks.8.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,515 checkpoint.py: 885: Loaded: trunk.blocks.8.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,515 checkpoint.py: 885: Loaded: trunk.blocks.8.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,515 checkpoint.py: 885: Loaded: trunk.blocks.8.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,515 checkpoint.py: 885: Loaded: trunk.blocks.8.norm2.bias of 
shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,520 checkpoint.py: 885: Loaded: trunk.blocks.8.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,520 checkpoint.py: 885: Loaded: trunk.blocks.8.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,525 checkpoint.py: 885: Loaded: trunk.blocks.8.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,525 checkpoint.py: 885: Loaded: trunk.blocks.8.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,525 checkpoint.py: 885: Loaded: trunk.blocks.9.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,525 checkpoint.py: 885: Loaded: trunk.blocks.9.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,529 checkpoint.py: 885: Loaded: trunk.blocks.9.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,529 checkpoint.py: 885: Loaded: trunk.blocks.9.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,530 checkpoint.py: 885: Loaded: trunk.blocks.9.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,530 checkpoint.py: 885: Loaded: trunk.blocks.9.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,530 checkpoint.py: 885: Loaded: trunk.blocks.9.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,530 checkpoint.py: 885: Loaded: trunk.blocks.9.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,534 checkpoint.py: 885: Loaded: trunk.blocks.9.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,535 checkpoint.py: 885: Loaded: trunk.blocks.9.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,538 checkpoint.py: 885: Loaded: trunk.blocks.9.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,538 checkpoint.py: 885: Loaded: trunk.blocks.9.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,538 checkpoint.py: 885: Loaded: trunk.blocks.10.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,538 checkpoint.py: 885: Loaded: trunk.blocks.10.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,542 checkpoint.py: 885: Loaded: trunk.blocks.10.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,542 checkpoint.py: 885: Loaded: trunk.blocks.10.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,543 checkpoint.py: 885: Loaded: trunk.blocks.10.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,543 checkpoint.py: 885: Loaded: trunk.blocks.10.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,543 checkpoint.py: 885: Loaded: trunk.blocks.10.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,543 checkpoint.py: 885: Loaded: trunk.blocks.10.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,547 checkpoint.py: 885: Loaded: trunk.blocks.10.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,547 checkpoint.py: 885: Loaded: trunk.blocks.10.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,550 checkpoint.py: 885: Loaded: 
trunk.blocks.10.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,551 checkpoint.py: 885: Loaded: trunk.blocks.10.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,551 checkpoint.py: 885: Loaded: trunk.blocks.11.norm1.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,551 checkpoint.py: 885: Loaded: trunk.blocks.11.norm1.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,554 checkpoint.py: 885: Loaded: trunk.blocks.11.attn.qkv.weight of shape: torch.Size([1152, 384]) from checkpoint INFO 2021-12-20 16:16:28,554 checkpoint.py: 885: Loaded: trunk.blocks.11.attn.qkv.bias of shape: torch.Size([1152]) from checkpoint INFO 2021-12-20 16:16:28,555 checkpoint.py: 885: Loaded: trunk.blocks.11.attn.proj.weight of shape: torch.Size([384, 384]) from checkpoint INFO 2021-12-20 16:16:28,555 checkpoint.py: 885: Loaded: trunk.blocks.11.attn.proj.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,555 checkpoint.py: 885: Loaded: trunk.blocks.11.norm2.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,555 checkpoint.py: 885: Loaded: trunk.blocks.11.norm2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,560 checkpoint.py: 885: Loaded: trunk.blocks.11.mlp.fc1.weight of shape: torch.Size([1536, 384]) from checkpoint INFO 2021-12-20 16:16:28,560 checkpoint.py: 885: Loaded: trunk.blocks.11.mlp.fc1.bias of shape: torch.Size([1536]) from checkpoint INFO 2021-12-20 16:16:28,561 checkpoint.py: 885: Loaded: trunk.blocks.11.mlp.fc2.weight of shape: torch.Size([384, 1536]) from checkpoint INFO 2021-12-20 16:16:28,561 checkpoint.py: 885: Loaded: trunk.blocks.11.mlp.fc2.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,561 checkpoint.py: 885: Loaded: trunk.norm.weight of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,562 checkpoint.py: 885: Loaded: trunk.norm.bias of shape: torch.Size([384]) from checkpoint INFO 2021-12-20 16:16:28,562 checkpoint.py: 901: Extra layers not loaded from checkpoint: ['heads.0.projection_head.0.weight', 'heads.0.projection_head.0.bias', 'heads.0.projection_head.2.weight', 'heads.0.projection_head.2.bias', 'heads.0.projection_head.4.weight', 'heads.0.projection_head.4.bias', 'heads.0.prototypes0.weight_g', 'heads.0.prototypes0.weight_v'] INFO 2021-12-20 16:16:28,618 trainer_main.py: 342: Model is: Classy <class 'vissl.models.base_ssl_model.BaseSSLMultiInputOutputModel'>: BaseSSLMultiInputOutputModel( (_heads): ModuleDict() (trunk): VisionTransformer( (patch_embed): PatchEmbed( (proj): Conv2d(3, 384, kernel_size=(16, 16), stride=(16, 16)) ) (pos_drop): Dropout(p=0, inplace=False) (blocks): ModuleList( (0): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): Identity() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (1): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) 
(proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (2): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (3): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (4): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (5): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (6): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (7): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) 
(drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (8): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (9): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (10): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) (11): Block( (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=384, out_features=1152, bias=True) (attn_drop): Dropout(p=0, inplace=False) (proj): Linear(in_features=384, out_features=384, bias=True) (proj_drop): Dropout(p=0, inplace=False) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=384, out_features=1536, bias=True) (act): GELU() (fc2): Linear(in_features=1536, out_features=384, bias=True) (drop): Dropout(p=0, inplace=False) ) ) ) (norm): LayerNorm((384,), eps=1e-06, elementwise_affine=True) ) (heads): ModuleList() (dummy_layer): Linear(in_features=4, out_features=4, bias=True) ) INFO 2021-12-20 16:16:28,618 trainer_main.py: 352: ============== Split: TEST ======================= INFO 2021-12-20 16:16:28,618 trainer_main.py: 353: Extracting features for partition: test INFO 2021-12-20 16:16:31,114 trainer_main.py: 414: Model set to eval mode during feature extraction... 
Traceback (most recent call last):
  File "tools/run_distributed_engines.py", line 58, in <module>
    hydra_main(overrides=overrides)
  File "tools/run_distributed_engines.py", line 41, in hydra_main
    launch_distributed(
  File "/hissl/third_party/vissl/vissl/utils/distributed_launcher.py", line 150, in launch_distributed
    _distributed_worker(
  File "/hissl/third_party/vissl/vissl/utils/distributed_launcher.py", line 192, in _distributed_worker
    run_engine(
  File "/hissl/third_party/vissl/vissl/engines/engine_registry.py", line 86, in run_engine
    engine.run_engine(
  File "/hissl/third_party/vissl/vissl/engines/extract_features.py", line 39, in run_engine
    extract_main(
  File "/hissl/third_party/vissl/vissl/engines/extract_features.py", line 106, in extract_main
    trainer.extract(output_folder=cfg.EXTRACT_FEATURES.OUTPUT_DIR or checkpoint_folder)
  File "/hissl/third_party/vissl/vissl/trainer/trainer_main.py", line 355, in extract
    self._extract_split_features(feat_names, self.task, split, output_folder)
  File "/hissl/third_party/vissl/vissl/trainer/trainer_main.py", line 435, in _extract_split_features
    flat_features_list = self._flatten_features_list(features)
  File "/hissl/third_party/vissl/vissl/trainer/trainer_main.py", line 363, in _flatten_features_list
    assert isinstance(features, list), "features must be of type list"
AssertionError: features must be of type list


## Expected behavior:

I would expect the script to extract features from the trunk output of the pretrained model, something like this: https://vissl.readthedocs.io/en/v0.1.6/evaluations/feature_extraction.html#extract-features-of-the-trunk-output, but instead I'm getting an AssertionError.

If you expect the model to converge / work better, note that we do not give suggestions
on how to train a new model.
We will only help with it in one of these two conditions:
(1) You're unable to reproduce the results in vissl model zoo.
(2) It indicates a vissl bug.

## Environment:

Provide your environment information using the following command:

wget -nc -q https://github.com/facebookresearch/vissl/raw/main/vissl/utils/collect_env.py && python collect_env.py


If your issue looks like an installation issue / environment issue,
please first try to solve it with the instructions in
https://github.com/facebookresearch/vissl/tree/main/docs

## Additional information aka "what I found out so far"
From the assertion error it's clear that the script expects the `features` variable to be a list. As far as I understand, if we perform [feature extraction on multiple layers of the trunk](https://vissl.readthedocs.io/en/v0.1.6/evaluations/feature_extraction.html#extract-features-from-several-layers-of-the-trunk), the output is a list of tensors where every tensor corresponds to one specified layer. Since I want features from a single layer instead of multiple layers, `features` should be a list containing one tensor. Instead, during debugging, I found out that `features` in my case is a tensor of shape (1, 64, 384), where 64 is the batch size and 384 is the hidden size of the feature vectors. After some digging, the first thing that seems off is the following. During initialization of `BaseSSLMultiInputOutputModel`, the method [_get_trunk()](https://github.com/facebookresearch/vissl/blob/484cdecd1a71cb457d8ea74942603b907a23d39d/vissl/models/base_ssl_model.py#L240) is called. This method contains the following if statement:

if is_feature_extractor_model(self.model_config):
    self.eval_mode = True
    return FeatureExtractorModel(self.model_config)
else:
    self.eval_mode = False
    trunk_name = self.model_config.TRUNK.NAME
    return get_model_trunk(trunk_name)(self.model_config, trunk_name)

You can see the method [is_feature_extractor_model](https://github.com/facebookresearch/vissl/blob/aa3f7cc33b3b7806e15593083aedc383d85e4a53/vissl/models/model_helpers.py#L52), which looks like this:

return (
    model_config.FEATURE_EVAL_SETTINGS.EVAL_MODE_ON
    and model_config.FEATURE_EVAL_SETTINGS.FREEZE_TRUNK_ONLY
    and len(model_config.FEATURE_EVAL_SETTINGS.LINEAR_EVAL_FEAT_POOL_OPS_MAP) > 0
)


The important part is the last line. Following the [docs](https://vissl.readthedocs.io/en/v0.1.6/evaluations/feature_extraction.html#extract-features-of-the-trunk-output), I do not specify FEATURE_EVAL_SETTINGS.LINEAR_EVAL_FEAT_POOL_OPS_MAP in my config file, so it defaults to an empty list. Therefore `len(model_config.FEATURE_EVAL_SETTINGS.LINEAR_EVAL_FEAT_POOL_OPS_MAP) > 0` is False, and so is `is_feature_extractor_model()`. As a result, `_get_trunk()` returns the plain trunk from `get_model_trunk()` instead of a `FeatureExtractorModel()`. The former returns a tensor on the forward pass (while `FeatureExtractorModel` returns a list of tensors), which leads to the AssertionError. All of this indicates that, in contrast to the documentation, we always need to provide the `model_config.FEATURE_EVAL_SETTINGS.LINEAR_EVAL_FEAT_POOL_OPS_MAP` argument, so as far as I know this part of the docs needs to be updated. I can see an example of how to do this for the trunk only with a ResNet architecture [here](https://github.com/facebookresearch/vissl/blob/main/configs/config/feature_extraction/trunk_only/rn50_res5.yaml) (by using `Identity`), but I'm not sure which layer to specify for a vision transformer. Could you help me out with that?
iseessel commented 2 years ago

Hey @blazejdolicki, have you looked at this tutorial and cross-referenced your config with it? You should set LINEAR_EVAL_FEAT_POOL_OPS_MAP to a list of the features that you want to extract. That said, I'm not sure these options are supported for ViT.

One workaround you could try is extracting the head features (https://vissl.ai/tutorials/Feature_Extraction_V0_1_6#Extract-the-Output-of-the-Model-Head) and either: 1) creating an identity head, or 2) not setting the head in the model. You should also be able to hack around https://github.com/facebookresearch/vissl/blob/main/vissl/models/trunks/feature_extractor.py to save what you need -- I'd recommend taking a debugger and walking through this code.
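If those options end up not working for ViT, another way to pull the trunk features out entirely outside of VISSL is a plain-PyTorch forward hook on the final norm layer. The snippet below is only a rough sketch, not VISSL code: it assumes the ViT-S/16 trunk is rebuilt with timm's `vit_small_patch16_224` (same dimensions as the config above), and converting/loading the DINO checkpoint weights into it is a separate step that is not shown.

    import timm
    import torch

    # Assumption: rebuild a ViT-S/16 trunk matching the config above
    # (384-dim, 12 layers, 6 heads, patch size 16). Loading the converted
    # VISSL DINO weights into it is a separate step, not shown here.
    model = timm.create_model("vit_small_patch16_224", pretrained=False, num_classes=0)
    model.eval()

    captured = {}

    def save_norm_output(module, inputs, output):
        # Output of the final LayerNorm: shape (batch, 1 + num_patches, 384).
        captured["norm"] = output.detach()

    hook = model.norm.register_forward_hook(save_norm_output)

    with torch.no_grad():
        _ = model(torch.randn(4, 3, 224, 224))  # dummy batch of 4 images

    cls_features = captured["norm"][:, 0]  # CLS-token features, shape (4, 384)
    hook.remove()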

blazejdolicki commented 2 years ago

Thanks for your reply. Here's what I did. In this config file, the `Identity` module is used to return the input without changing it.

LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
        ["conv1", ["AvgPool2d", [[10, 10], 10, 4]]],
        ["res2", ["AvgPool2d", [[16, 16], 8, 0]]],
        ["res3", ["AvgPool2d", [[13, 13], 5, 0]]],
        ["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
        ["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
        ["res5avg", ["Identity", []]],
      ]

I suppose "res5avg" is the last layer in the CNN trunk. So I tried to replicate it for transformers, where the last layer in the trunk is norm by adding the following to my config:

LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
        ["norm", ["Identity", []]],
      ]

Do you think this solution will return correct features? Running with this config does not lead to any errors, the number of returned features corresponds to the number of images in the supplied dataset, and the shape of the features is correct. But I'm still wondering how I can verify that the values of the features are correct. So far the only way I've come up with to confirm this is to load the model in plain PyTorch and check whether its returned features match those returned by VISSL. Is there a better way to do it?
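For what it's worth, the comparison itself could look roughly like the sketch below. It assumes the VISSL extraction run saved the features and the corresponding dataset indices as .npy files (the paths passed in are placeholders; the real file names depend on the run), and that `pytorch_feats` comes from a plain-PyTorch forward pass over the same images in their original dataset order.

    import numpy as np

    def compare_features(vissl_feats_path, vissl_inds_path, pytorch_feats, atol=1e-4):
        """Compare features saved by a VISSL extraction run with features computed
        in plain PyTorch over the same dataset.

        pytorch_feats: array of shape (num_images, 384), in the dataset's original order.
        """
        vissl_feats = np.load(vissl_feats_path)  # (num_images, 384)
        vissl_inds = np.load(vissl_inds_path)    # dataset indices saved alongside the features
        aligned = pytorch_feats[vissl_inds]      # reorder to match VISSL's ordering
        print("max absolute difference:", np.abs(vissl_feats - aligned).max())
        return np.allclose(vissl_feats, aligned, atol=atol)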

iseessel commented 2 years ago

That sounds like a good plan to me!

You could also step through the vissl code with a debugger to make sure it's returning the right thing. I will also try to validate this quickly later.

If you wanted to contribute a config in a PR with these options so we have this use-case documented that would be amazing!

blazejdolicki commented 2 years ago

Hi @iseessel, it took me some time, but I added a pull request with my config. I just signed the CLA, so that should be updated soon. Thanks for all the help!