Closed mcwindy closed 1 year ago
Hi @mcwindy,
First of all, thanks a lot for considering VISSL :)
I had a quick look at the configuration you linked (the replacement for supervised_1gpu_resnet_example.yaml) and found a few things that might explain the issue. For instance:
moco_loss
moco_collator: the configuration is missing ImgReplicatePil to have at least 2 views per image sampled from the dataset
But before you proceed with those changes, please consider using the following configuration as a better starting point for MOCO experiments: configs/config/pretrain/moco/moco_1node_resnet.yaml
This configuration has everything set up for MOCO (augmentations, projection, loss, etc.), so you can start from it and avoid having to deal with configuration issues.
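As a rough illustration of switching to that configuration, the sketch below rewrites the list of overrides shown in the log further down so that it points at the MoCo config. It assumes that the `config=` override takes the YAML path relative to `configs/config` without the `.yaml` extension; that mapping is an assumption here, not something confirmed in this thread, and both helper functions are hypothetical.

```python
from pathlib import PurePosixPath


def config_override(yaml_path, config_root="configs/config"):
    """Build a `config=...` override from a YAML path under `config_root`.

    Assumption: the override value is the path relative to `config_root`
    with the `.yaml` suffix removed.
    """
    rel = PurePosixPath(yaml_path).relative_to(config_root)
    return "config=" + str(rel.with_suffix(""))


def swap_config(overrides, new_config):
    """Replace the `config=` entry and keep every other override untouched."""
    return [new_config if o.startswith("config=") else o for o in overrides]


# Overrides copied from the log in this issue (shortened).
old_overrides = [
    "hydra.verbose=true",
    "config=supervised_1gpu_resnet_example",
    "config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder]",
    "config.DATA.TEST.DATASET_NAMES=[dummy_data_folder]",
]

moco = config_override("configs/config/pretrain/moco/moco_1node_resnet.yaml")
new_overrides = swap_config(old_overrides, moco)
print(new_overrides)
```

Only the entry that starts with `config=` is replaced; the dataset overrides (which start with `config.`) are carried over unchanged.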
Please tell me if that works for you, Quentin
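For anyone who keeps the modified supervised config instead, the missing-views problem mentioned above can be caught before training with a small check on the config dictionary. This is only a sketch: `check_moco_data_config` is a hypothetical helper, not part of VISSL, and the dictionary below copies just the relevant keys from the config printed in the log of this issue.

```python
def check_moco_data_config(cfg):
    """Return a list of problems found in a MoCo-style config dict.

    Hypothetical helper: it only looks at the keys discussed in this
    thread (LOSS.name, DATA.TRAIN.TRANSFORMS, DATA.TRAIN.COLLATE_FUNCTION).
    """
    problems = []
    if cfg["LOSS"]["name"] != "moco_loss":
        return problems  # nothing MoCo-specific to check

    train = cfg["DATA"]["TRAIN"]
    transform_names = [t["name"] for t in train["TRANSFORMS"]]
    if "ImgReplicatePil" not in transform_names:
        problems.append(
            "DATA.TRAIN.TRANSFORMS has no ImgReplicatePil: "
            "each image yields one view, but MoCo needs at least 2"
        )
    if train["COLLATE_FUNCTION"] != "moco_collator":
        problems.append("DATA.TRAIN.COLLATE_FUNCTION is not moco_collator")
    return problems


# Relevant subset of the config dumped in the log of this issue.
cfg = {
    "LOSS": {"name": "moco_loss"},
    "DATA": {
        "TRAIN": {
            "COLLATE_FUNCTION": "moco_collator",
            "TRANSFORMS": [
                {"name": "RandomResizedCrop", "size": 224},
                {"name": "RandomHorizontalFlip"},
                {"name": "ColorJitter", "brightness": 0.4, "contrast": 0.4,
                 "saturation": 0.4, "hue": 0.4},
                {"name": "ToTensor"},
                {"name": "Normalize", "mean": [0.485, 0.456, 0.406],
                 "std": [0.229, 0.224, 0.225]},
            ],
        }
    },
}

problems = check_moco_data_config(cfg)
for problem in problems:
    print("config problem:", problem)
```

With the config from the log, the check reports the missing ImgReplicatePil transform and nothing else, since the collator is already set to moco_collator.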
Instructions To Reproduce the Bug:
what changes you made (git diff) or what code you wrote:
Modified the main function in run_distributed_engines.py to
Modified supervised_1gpu_resnet_example as follows:
what exact command you run:
what you observed (including full logs):
/home/mcwindy/.local/lib/python3.9/site-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in 0.14. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /home/mcwindy/.local/lib/python3.9/site-packages/torchvision/transforms/_transforms_video.py:25: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in 0.14. Please use the 'torchvision.transforms' module instead. warnings.warn( ####### overrides: ['hydra.verbose=true', 'config=supervised_1gpu_resnet_example', 'config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=True', 'config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder]', 'config.DATA.TEST.DATASET_NAMES=[dummy_data_folder]', 'hydra.verbose=true'] INFO 2022-03-29 23:39:29,052 init.py: 37: Provided Config has latest version: 1 INFO 2022-03-29 23:39:29,053 io.py: 63: Saving data to file: checkpoints1/train_config.yaml INFO 2022-03-29 23:39:29,072 io.py: 89: Saved data to file: checkpoints1/train_config.yaml INFO 2022-03-29 23:39:29,072 run_distributed_engines.py: 162: Spawning process for node_id: 0, local_rank: 0, dist_rank: 0, dist_run_id: localhost:50653 INFO 2022-03-29 23:39:29,072 train.py: 94: Env set for rank: 0, dist_rank: 0 INFO 2022-03-29 23:39:29,073 env.py: 50: ALL_PROXY: INFO 2022-03-29 23:39:29,073 env.py: 50: COLORTERM: truecolor INFO 2022-03-29 23:39:29,073 env.py: 50: CPLUS_INCLUDE_PATH: /usr/local/include/python3.8/ INFO 2022-03-29 23:39:29,073 env.py: 50: CUDA_PATH: /usr/local/cuda-11.5/targets/x86_64-linux/include/ INFO 2022-03-29 23:39:29,073 env.py: 50: C_INCLUDE_PATH: /usr/local/include/python3.8/ INFO 2022-03-29 23:39:29,073 env.py: 50: DISPLAY: :0 INFO 2022-03-29 23:39:29,073 env.py: 50: GIT_ASKPASS: /home/mcwindy/.vscode-server/bin/c722ca6c7eed3d7987c0d5c3df5c45f6b15e77d1/extensions/git/dist/askpass.sh INFO 2022-03-29 23:39:29,073 env.py: 50: 
HOME: /home/mcwindy INFO 2022-03-29 23:39:29,073 env.py: 50: HOSTTYPE: x86_64 INFO 2022-03-29 23:39:29,073 env.py: 50: HTTPS_proxy: http://172.28.0.1:7890 INFO 2022-03-29 23:39:29,073 env.py: 50: HTTP_PROXY: http://172.28.0.1:7890 INFO 2022-03-29 23:39:29,073 env.py: 50: LANG: C.UTF-8 INFO 2022-03-29 23:39:29,073 env.py: 50: LESS: -R INFO 2022-03-29 23:39:29,073 env.py: 50: LOCAL_RANK: 0 INFO 2022-03-29 23:39:29,073 env.py: 50: LOGNAME: mcwindy INFO 2022-03-29 23:39:29,073 env.py: 50: LSCOLORS: Gxfxcxdxbxegedabagacad INFO 2022-03-29 23:39:29,074 env.py: 50: LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36: INFO 2022-03-29 
23:39:29,074 env.py: 50: NAME: mcwindy_pc INFO 2022-03-29 23:39:29,074 env.py: 50: OLDPWD: /home/mcwindy INFO 2022-03-29 23:39:29,074 env.py: 50: PAGER: less INFO 2022-03-29 23:39:29,074 env.py: 50: PATH: /home/mcwindy/.vscode-server/bin/c722ca6c7eed3d7987c0d5c3df5c45f6b15e77d1/bin/remote-cli:/home/mcwindy/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Program Files (x86)/VMware/VMware Workstation/bin/:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem/:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0/:/mnt/c/WINDOWS/System32/OpenSSH/:/mnt/c/ProgramData/chocolatey/bin/:/mnt/c/tools/adb/:/mnt/c/tools/minio/:/mnt/c/Users/mcwindy/.cargo/bin/:/mnt/c/Users/mcwindy/.jdks/openjdk-17.0.1/bin/:/mnt/c/Users/mcwindy/AppData/Local/Programs/Python/Python38/:/mnt/c/Users/mcwindy/AppData/Local/Programs/Python/Python310/:/mnt/c/Users/mcwindy/AppData/Local/Programs/Microsoft VS Code/bin/:/mnt/c/Users/mcwindy/AppData/Local/Programs/Python/Python38/Lib/site-packages/torch/lib/:/mnt/c/Program Files/dotnet/:/mnt/c/Program Files/Git/cmd/:/mnt/c/Program Files/WireGuard/:/mnt/c/ProgramData/DockerDesktop/version-bin/:/mnt/c/Program Files/Docker/Docker/resources/bin/:/mnt/c/Program Files/Oculus/Support/oculus-runtime/:/mnt/c/Program Files/Common Files/Oracle/Java/javapath/:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR/:/mnt/c/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.31.31103/bin/Hostx64/x64/:/mnt/c/Program Files (x86)/Common Files/Oracle/Java/javapath/:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common/:/mnt/c/Users/mcwindy/Desktop/videos/:/mnt/c/tools/ffmpeg 5.0/bin/:/mnt/c/tools/TDM-GCC/bin:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/nodejs/:/mnt/c/Program 
Files/Docker/Docker/resources/bin:/mnt/c/ProgramData/DockerDesktop/version-bin:/mnt/c/Users/mcwindy/AppData/Local/Programs/Python/Python38/Scripts/:/mnt/c/Users/mcwindy/AppData/Local/Programs/Python/Python38/:/mnt/c/Users/mcwindy/AppData/Local/Programs/Python/Python310/Scripts/:/mnt/c/Users/mcwindy/AppData/Local/Programs/Python/Python310/:/mnt/c/Users/mcwindy/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/mcwindy/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/mcwindy/.dotnet/tools:/mnt/c/Program Files/JetBrains/IntelliJ IDEA/bin:/mnt/c/Program Files/JetBrains/PyCharm/bin:/mnt/c/Users/mcwindy/AppData/Local/Programs/Fiddler:/mnt/c/Users/mcwindy/.dotnet/tools:/mnt/c/Users/mcwindy/AppData/Local/Programs/oh-my-posh/bin:/mnt/c/Users/mcwindy/AppData/Roaming/npm INFO 2022-03-29 23:39:29,074 env.py: 50: PULSE_SERVER: /mnt/wslg/PulseServer INFO 2022-03-29 23:39:29,074 env.py: 50: PWD: /home/mcwindy/vissltest INFO 2022-03-29 23:39:29,074 env.py: 50: RANK: 0 INFO 2022-03-29 23:39:29,074 env.py: 50: SHELL: /bin/zsh INFO 2022-03-29 23:39:29,074 env.py: 50: SHLVL: 1 INFO 2022-03-29 23:39:29,074 env.py: 50: TERM: xterm-256color INFO 2022-03-29 23:39:29,074 env.py: 50: TERM_PROGRAM: vscode INFO 2022-03-29 23:39:29,074 env.py: 50: TERM_PROGRAM_VERSION: 1.65.2 INFO 2022-03-29 23:39:29,074 env.py: 50: USER: mcwindy INFO 2022-03-29 23:39:29,074 env.py: 50: VSCODE_GIT_ASKPASS_EXTRA_ARGS: INFO 2022-03-29 23:39:29,074 env.py: 50: VSCODE_GIT_ASKPASS_MAIN: /home/mcwindy/.vscode-server/bin/c722ca6c7eed3d7987c0d5c3df5c45f6b15e77d1/extensions/git/dist/askpass-main.js INFO 2022-03-29 23:39:29,075 env.py: 50: VSCODE_GIT_ASKPASS_NODE: /home/mcwindy/.vscode-server/bin/c722ca6c7eed3d7987c0d5c3df5c45f6b15e77d1/node INFO 2022-03-29 23:39:29,075 env.py: 50: VSCODE_GIT_IPC_HANDLE: /mnt/wslg/runtime-dir/vscode-git-853611fa42.sock INFO 2022-03-29 23:39:29,075 env.py: 50: VSCODE_IPC_HOOK_CLI: /mnt/wslg/runtime-dir/vscode-ipc-244fb306-5360-490b-b817-3e6d21c01b48.sock INFO 2022-03-29 
23:39:29,075 env.py: 50: WAYLAND_DISPLAY: wayland-0 INFO 2022-03-29 23:39:29,075 env.py: 50: WORLD_SIZE: 1 INFO 2022-03-29 23:39:29,075 env.py: 50: WSLENV: VSCODE_WSL_EXT_LOCATION/up INFO 2022-03-29 23:39:29,075 env.py: 50: WSL_DISTRO_NAME: Ubuntu INFO 2022-03-29 23:39:29,075 env.py: 50: WSL_INTEROP: /run/WSL/11_interop INFO 2022-03-29 23:39:29,075 env.py: 50: XDG_RUNTIMEDIR: /mnt/wslg/runtime-dir INFO 2022-03-29 23:39:29,075 env.py: 50: ZSH: /home/mcwindy/.oh-my-zsh INFO 2022-03-29 23:39:29,075 env.py: 50: : /usr/bin/python3 INFO 2022-03-29 23:39:29,075 env.py: 50: all_proxy: INFO 2022-03-29 23:39:29,075 env.py: 50: http_proxy: http://172.28.0.1:7890 INFO 2022-03-29 23:39:29,075 env.py: 50: https_proxy: http://172.28.0.1:7890 INFO 2022-03-29 23:39:29,075 misc.py: 161: Set start method of multiprocessing to fork INFO 2022-03-29 23:39:29,075 train.py: 105: Setting seed.... INFO 2022-03-29 23:39:29,076 misc.py: 173: MACHINE SEED: 0 INFO 2022-03-29 23:39:29,274 hydra_config.py: 132: Training with config: INFO 2022-03-29 23:39:29,278 hydra_config.py: 141: {'CHECKPOINT': {'APPEND_DISTR_RUN_ID': False, 'AUTO_RESUME': True, 'BACKEND': 'disk', 'CHECKPOINT_FREQUENCY': 1, 'CHECKPOINT_ITER_FREQUENCY': -1, 'DIR': 'checkpoints1', 'LATEST_CHECKPOINT_RESUME_FILE_NUM': 1, 'OVERWRITE_EXISTING': False, 'USE_SYMLINK_CHECKPOINT_FOR_RESUME': False}, 'CLUSTERFIT': {'CLUSTER_BACKEND': 'faiss', 'DATA_LIMIT': -1, 'DATA_LIMIT_SAMPLING': {'SEED': 0}, 'FEATURES': {'DATASET_NAME': '', 'DATA_PARTITION': 'TRAIN', 'DIMENSIONALITY_REDUCTION': 0, 'EXTRACT': False, 'LAYER_NAME': '', 'PATH': '.', 'TEST_PARTITION': 'TEST'}, 'NUM_CLUSTERS': 16000, 'NUM_ITER': 50, 'OUTPUT_DIR': '.'}, 'DATA': {'DDP_BUCKET_CAP_MB': 25, 'ENABLE_ASYNC_GPU_COPY': True, 'NUM_DATALOADER_WORKERS': 5, 'PIN_MEMORY': True, 'TEST': {'BASE_DATASET': 'generic_ssl', 'BATCHSIZE_PER_REPLICA': 32, 'COLLATE_FUNCTION': 'default_collate', 'COLLATE_FUNCTION_PARAMS': {}, 'COPY_DESTINATION_DIR': '', 'COPY_TO_LOCAL_DISK': False, 
'DATASET_NAMES': ['dummy_data_folder'], 'DATA_LIMIT': -1, 'DATA_LIMIT_SAMPLING': {'IS_BALANCED': False, 'SEED': 0, 'SKIP_NUM_SAMPLES': 0}, 'DATA_PATHS': [], 'DATA_SOURCES': ['disk_folder'], 'DEFAULT_GRAY_IMG_SIZE': 224, 'DROP_LAST': False, 'ENABLE_QUEUE_DATASET': False, 'INPUT_KEY_NAMES': ['data'], 'LABEL_PATHS': [], 'LABEL_SOURCES': ['disk_folder'], 'LABEL_TYPE': 'standard', 'MMAP_MODE': True, 'NEW_IMG_PATH_PREFIX': '', 'RANDOM_SYNTHETIC_IMAGES': False, 'REMOVE_IMG_PATH_PREFIX': '', 'TARGET_KEY_NAMES': ['label'], 'TRANSFORMS': [{'name': 'Resize', 'size': 256}, {'name': 'CenterCrop', 'size': 224}, {'name': 'ToTensor'}, {'mean': [0.485, 0.456, 0.406], 'name': 'Normalize', 'std': [0.229, 0.224, 0.225]}], 'USE_DEBUGGING_SAMPLER': False, 'USE_STATEFUL_DISTRIBUTED_SAMPLER': False}, 'TRAIN': {'BASE_DATASET': 'generic_ssl', 'BATCHSIZE_PER_REPLICA': 32, 'COLLATE_FUNCTION': 'moco_collator', 'COLLATE_FUNCTION_PARAMS': {}, 'COPY_DESTINATION_DIR': '', 'COPY_TO_LOCAL_DISK': False, 'DATASET_NAMES': ['dummy_data_folder'], 'DATA_LIMIT': -1, 'DATA_LIMIT_SAMPLING': {'IS_BALANCED': False, 'SEED': 0, 'SKIP_NUM_SAMPLES': 0}, 'DATA_PATHS': [], 'DATA_SOURCES': ['disk_folder'], 'DEFAULT_GRAY_IMG_SIZE': 224, 'DROP_LAST': False, 'ENABLE_QUEUE_DATASET': False, 'INPUT_KEY_NAMES': ['data'], 'LABEL_PATHS': [], 'LABEL_SOURCES': ['disk_folder'], 'LABEL_TYPE': 'standard', 'MMAP_MODE': True, 'NEW_IMG_PATH_PREFIX': '', 'RANDOM_SYNTHETIC_IMAGES': False, 'REMOVE_IMG_PATH_PREFIX': '', 'TARGET_KEY_NAMES': ['label'], 'TRANSFORMS': [{'name': 'RandomResizedCrop', 'size': 224}, {'name': 'RandomHorizontalFlip'}, {'brightness': 0.4, 'contrast': 0.4, 'hue': 0.4, 'name': 'ColorJitter', 'saturation': 0.4}, {'name': 'ToTensor'}, {'mean': [0.485, 0.456, 0.406], 'name': 'Normalize', 'std': [0.229, 0.224, 0.225]}], 'USE_DEBUGGING_SAMPLER': False, 'USE_STATEFUL_DISTRIBUTED_SAMPLER': False}}, 'DISTRIBUTED': {'BACKEND': 'nccl', 'BROADCAST_BUFFERS': True, 'INIT_METHOD': 'tcp', 'MANUAL_GRADIENT_REDUCTION': False, 
'NCCL_DEBUG': False, 'NCCL_SOCKET_NTHREADS': '', 'NUM_NODES': 1, 'NUM_PROC_PER_NODE': 1, 'RUN_ID': 'auto'}, 'EXTRACT_FEATURES': {'CHUNK_THRESHOLD': 0, 'OUTPUT_DIR': ''}, 'HOOKS': {'CHECK_NAN': True, 'LOG_GPU_STATS': True, 'MEMORY_SUMMARY': {'DUMP_MEMORY_ON_EXCEPTION': False, 'LOG_ITERATION_NUM': 0, 'PRINT_MEMORY_SUMMARY': True}, 'MODEL_COMPLEXITY': {'COMPUTE_COMPLEXITY': False, 'INPUT_SHAPE': [3, 224, 224]}, 'PERF_STATS': {'MONITOR_PERF_STATS': False, 'PERF_STAT_FREQUENCY': -1, 'ROLLING_BTIME_FREQ': -1}, 'TENSORBOARD_SETUP': {'EXPERIMENT_LOG_DIR': 'tensorboard', 'FLUSH_EVERY_N_MIN': 5, 'LOG_DIR': '.', 'LOG_PARAMS': True, 'LOG_PARAMS_EVERY_N_ITERS': 310, 'LOG_PARAMS_GRADIENTS': True, 'USE_TENSORBOARD': True}}, 'IMG_RETRIEVAL': {'CROP_QUERY_ROI': False, 'DATASET_PATH': '', 'DEBUG_MODE': False, 'EVAL_BINARY_PATH': '', 'EVAL_DATASET_NAME': 'Paris', 'FEATS_PROCESSING_TYPE': '', 'GEM_POOL_POWER': 4.0, 'IMG_SCALINGS': [1], 'NORMALIZE_FEATURES': True, 'NUM_DATABASE_SAMPLES': -1, 'NUM_QUERY_SAMPLES': -1, 'NUM_TRAINING_SAMPLES': -1, 'N_PCA': 512, 'RESIZE_IMG': 1024, 'SAVE_FEATURES': False, 'SAVE_RETRIEVAL_RANKINGS_SCORES': True, 'SIMILARITY_MEASURE': 'cosine_similarity', 'SPATIAL_LEVELS': 3, 'TRAIN_DATASET_NAME': 'Oxford', 'TRAIN_PCA_WHITENING': True, 'USE_DISTRACTORS': False, 'WHITEN_IMG_LIST': ''}, 'LOG_FREQUENCY': 100, 'LOSS': {'CrossEntropyLoss': {'ignore_index': -1}, 'barlow_twins_loss': {'embeddingdim': 8192, 'lambda': 0.0051, 'scale_loss': 0.024}, 'bce_logits_multiple_output_single_target': {'normalize_output': False, 'reduction': 'none', 'world_size': 1}, 'cross_entropy_multiple_output_single_target': {'ignore_index': -1, 'normalize_output': False, 'reduction': 'mean', 'temperature': 1.0, 'weight': None}, 'deepclusterv2_loss': {'BATCHSIZE_PER_REPLICA': 256, 'DROP_LAST': True, 'kmeans_iters': 10, 'memory_params': {'crops_for_mb': [0], 'embedding_dim': 128}, 'num_clusters': [3000, 3000, 3000], 'num_crops': 2, 'num_train_samples': -1, 'temperature': 0.1}, 'dino_loss': 
{'crops_for_teacher': [0, 1], 'ema_center': 0.9, 'momentum': 0.996, 'normalize_last_layer': True, 'output_dim': 65536, 'student_temp': 0.1, 'teacher_temp_max': 0.07, 'teacher_temp_min': 0.04, 'teacher_temp_warmup_iters': 37500}, 'moco_loss': {'embedding_dim': 128, 'momentum': 0.999, 'queue_size': 65536, 'temperature': 0.2}, 'multicrop_simclr_info_nce_loss': {'buffer_params': {'effective_batch_size': 4096, 'embedding_dim': 128, 'world_size': 64}, 'num_crops': 2, 'temperature': 0.1}, 'name': 'moco_loss', 'nce_loss_with_memory': {'loss_type': 'nce', 'loss_weights': [1.0], 'memory_params': {'embedding_dim': 128, 'memory_size': -1, 'momentum': 0.5, 'norm_init': True, 'update_mem_on_forward': True}, 'negative_sampling_params': {'num_negatives': 16000, 'type': 'random'}, 'norm_constant': -1, 'norm_embedding': True, 'num_train_samples': -1, 'temperature': 0.07, 'update_mem_with_emb_index': -100}, 'simclr_info_nce_loss': {'buffer_params': {'effective_batch_size': 4096, 'embedding_dim': 128, 'world_size': 64}, 'temperature': 0.1}, 'swav_loss': {'crops_for_assign': [0, 1], 'embedding_dim': 128, 'epsilon': 0.05, 'normalize_last_layer': True, 'num_crops': 2, 'num_iters': 3, 'num_prototypes': [3000], 'output_dir': '.', 'queue': {'local_queue_length': 0, 'queue_length': 0, 'start_iter': 0}, 'temp_hard_assignment_iters': 0, 'temperature': 0.1, 'use_double_precision': False}, 'swav_momentum_loss': {'crops_for_assign': [0, 1], 'embedding_dim': 128, 'epsilon': 0.05, 'momentum': 0.99, 'momentum_eval_mode_iter_start': 0, 'normalize_last_layer': True, 'num_crops': 2, 'num_iters': 3, 'num_prototypes': [3000], 'queue': {'local_queue_length': 0, 'queue_length': 0, 'start_iter': 0}, 'temperature': 0.1, 'use_double_precision': False}}, 'MACHINE': {'DEVICE': 'gpu'}, 'METERS': {'accuracy_list_meter': {'meter_names': [], 'num_meters': 1, 'topk_values': [1, 5]}, 'enable_training_meter': True, 'mean_ap_list_meter': {'max_cpu_capacity': -1, 'meter_names': [], 'num_classes': 9605, 'num_meters': 1}, 
'model_output_mask': False, 'name': 'accuracy_list_meter', 'names': ['accuracy_list_meter'], 'precision_at_k_list_meter': {'meter_names': [], 'num_meters': 1, 'topk_values': [1]}, 'recall_at_k_list_meter': {'meter_names': [], 'num_meters': 1, 'topk_values': [1]}}, 'MODEL': {'ACTIVATION_CHECKPOINTING': {'NUM_ACTIVATION_CHECKPOINTING_SPLITS': 2, 'USE_ACTIVATION_CHECKPOINTING': False}, 'AMP_PARAMS': {'AMP_ARGS': {'opt_level': 'O1'}, 'AMP_TYPE': 'apex', 'USE_AMP': False}, 'BASE_MODEL_NAME': 'multi_input_output_model', 'CUDA_CACHE': {'CLEAR_CUDA_CACHE': False, 'CLEAR_FREQ': 100}, 'FEATURE_EVAL_SETTINGS': {'EVAL_MODE_ON': False, 'EVAL_TRUNK_AND_HEAD': False, 'EXTRACT_TRUNK_FEATURES_ONLY': False, 'FREEZE_TRUNK_AND_HEAD': False, 'FREEZE_TRUNK_ONLY': False, 'LINEAR_EVAL_FEAT_POOL_OPS_MAP': [], 'SHOULD_FLATTEN_FEATS': True}, 'FSDP_CONFIG': {'AUTO_WRAP_THRESHOLD': 0, 'bucket_cap_mb': 0, 'clear_autocast_cache': True, 'compute_dtype': torch.float32, 'flatten_parameters': True, 'fp32_reduce_scatter': False, 'mixed_precision': True, 'verbose': True}, 'GRAD_CLIP': {'MAX_NORM': 1, 'NORM_TYPE': 2, 'USE_GRAD_CLIP': False}, 'HEAD': {'BATCHNORM_EPS': 1e-05, 'BATCHNORM_MOMENTUM': 0.1, 'PARAMS': [['mlp', {'dims': [2048, 1000]}]], 'PARAMS_MULTIPLIER': 1.0}, 'INPUT_TYPE': 'rgb', 'MULTI_INPUT_HEAD_MAPPING': [], 'NON_TRAINABLE_PARAMS': [], 'SHARDED_DDP_SETUP': {'USE_SDP': False, 'reduce_buffer_size': -1}, 'SINGLE_PASS_EVERY_CROP': False, 'SYNC_BN_CONFIG': {'CONVERT_BN_TO_SYNC_BN': False, 'GROUP_SIZE': -1, 'SYNC_BN_TYPE': 'pytorch'}, 'TEMP_FROZEN_PARAMS_ITER_MAP': [], 'TRUNK': {'CONVIT': {'CLASS_TOKEN_IN_LOCAL_LAYERS': False, 'LOCALITY_DIM': 10, 'LOCALITY_STRENGTH': 1.0, 'N_GPSA_LAYERS': 10, 'USE_LOCAL_INIT': True}, 'EFFICIENT_NETS': {}, 'NAME': 'resnet', 'REGNET': {}, 'RESNETS': {'DEPTH': 50, 'GROUPNORM_GROUPS': 32, 'GROUPS': 1, 'LAYER4_STRIDE': 2, 'NORM': 'BatchNorm', 'STANDARDIZE_CONVOLUTIONS': False, 'WIDTH_MULTIPLIER': 1, 'WIDTH_PER_GROUP': 64, 'ZERO_INIT_RESIDUAL': False}, 
'TRUNK_PARAMS': {'RESNETS': {'DEPTH': 50}}, 'VISION_TRANSFORMERS': {'ATTENTION_DROPOUT_RATE': 0, 'CLASSIFIER': 'token', 'DROPOUT_RATE': 0, 'DROP_PATH_RATE': 0, 'HIDDEN_DIM': 768, 'IMAGE_SIZE': 224, 'MLP_DIM': 3072, 'NUM_HEADS': 12, 'NUM_LAYERS': 12, 'PATCH_SIZE': 16, 'QKV_BIAS': False, 'QK_SCALE': False, 'name': None}, 'XCIT': {'ATTENTION_DROPOUT_RATE': 0, 'DROPOUT_RATE': 0, 'DROP_PATH_RATE': 0.05, 'ETA': 1, 'HIDDEN_DIM': 384, 'IMAGE_SIZE': 224, 'NUM_HEADS': 8, 'NUM_LAYERS': 12, 'PATCH_SIZE': 16, 'QKV_BIAS': True, 'QK_SCALE': False, 'TOKENS_NORM': True, 'name': None}}, 'WEIGHTS_INIT': {'APPEND_PREFIX': '', 'PARAMS_FILE': '', 'REMOVE_PREFIX': '', 'SKIP_LAYERS': ['num_batches_tracked'], 'STATE_DICT_KEY_NAME': 'classy_state_dict'}, '_MODEL_INIT_SEED': 0}, 'MONITORING': {'MONITOR_ACTIVATION_STATISTICS': 0}, 'MULTI_PROCESSING_METHOD': 'fork', 'NEAREST_NEIGHBOR': {'L2_NORM_FEATS': False, 'SIGMA': 0.1, 'TOPK': 200}, 'OPTIMIZER': {'betas': [0.9, 0.999], 'construct_single_param_group_only': False, 'head_optimizer_params': {'use_different_lr': False, 'use_different_wd': False, 'weight_decay': 0.0001}, 'larc_config': {'clip': False, 'eps': 1e-08, 'trust_coefficient': 0.001}, 'momentum': 0.9, 'name': 'sgd', 'nesterov': True, 'non_regularized_parameters': [], 'num_epochs': 105, 'param_schedulers': {'lr': {'auto_lr_scaling': {'auto_scale': True, 'base_lr_batch_size': 256, 'base_value': 0.1, 'scaling_type': 'linear'}, 'end_value': 0.0, 'interval_scaling': [], 'lengths': [], 'milestones': [30, 60, 90, 100], 'name': 'multistep', 'schedulers': [], 'start_value': 0.1, 'update_interval': 'epoch', 'value': 0.1, 'values': [0.0125, 0.00125, 0.000125, 1.25e-05, 1.25e-06]}, 'lr_head': {'auto_lr_scaling': {'auto_scale': True, 'base_lr_batch_size': 256, 'base_value': 0.1, 'scaling_type': 'linear'}, 'end_value': 0.0, 'interval_scaling': [], 'lengths': [], 'milestones': [30, 60, 90, 100], 'name': 'multistep', 'schedulers': [], 'start_value': 0.1, 'update_interval': 'epoch', 'value': 0.1, 
'values': [0.0125, 0.00125, 0.000125, 1.25e-05, 1.25e-06]}}, 'regularize_bias': True, 'regularize_bn': False, 'use_larc': False, 'use_zero': False, 'weight_decay': 0.0001}, 'PROFILING': {'MEMORY_PROFILING': {'TRACK_BY_LAYER_MEMORY': False}, 'NUM_ITERATIONS': 10, 'OUTPUT_FOLDER': '.', 'PROFILED_RANKS': [0, 1], 'RUNTIME_PROFILING': {'LEGACY_PROFILER': False, 'PROFILE_CPU': True, 'PROFILE_GPU': True, 'USE_PROFILER': False}, 'START_ITERATION': 0, 'STOP_TRAINING_AFTER_PROFILING': False, 'WARMUP_ITERATIONS': 0}, 'REPRODUCIBILITY': {'CUDDN_DETERMINISTIC': False}, 'SEED_VALUE': 0, 'SLURM': {'ADDITIONAL_PARAMETERS': {}, 'COMMENT': 'vissl job', 'CONSTRAINT': '', 'LOG_FOLDER': '.', 'MEM_GB': 250, 'NAME': 'vissl', 'NUM_CPU_PER_PROC': 8, 'PARTITION': '', 'PORT_ID': 40050, 'TIME_HOURS': 72, 'TIME_MINUTES': 0, 'USE_SLURM': False}, 'SVM': {'cls_list': [], 'costs': {'base': -1.0, 'costs_list': [0.1, 0.01], 'power_range': [4, 20]}, 'cross_val_folds': 3, 'dual': True, 'force_retrain': False, 'loss': 'squared_hinge', 'low_shot': {'dataset_name': 'voc', 'k_values': [1, 2, 4, 8, 16, 32, 64, 96], 'sample_inds': [1, 2, 3, 4, 5]}, 'max_iter': 2000, 'normalize': True, 'penalty': 'l2'}, 'TEST_EVERY_NUM_EPOCH': 1, 'TEST_MODEL': True, 'TEST_ONLY': False, 'TRAINER': {'TASK_NAME': 'self_supervision_task', 'TRAIN_STEP_NAME': 'standard_train_step'}, 'VERBOSE': True} INFO 2022-03-29 23:39:29,647 train.py: 117: System config:
sys.platform         linux
Python               3.9.7 (default, Sep 10 2021, 14:59:43) [GCC 11.2.0]
numpy                1.19.5
Pillow               9.0.1
vissl                0.1.6 @/home/mcwindy/.local/lib/python3.9/site-packages/vissl
GPU available        True
GPU 0                NVIDIA GeForce RTX 2080
CUDA_HOME            /usr/local/cuda-11.5/targets/x86_64-linux/include/
torchvision          0.12.0+cu102 @/home/mcwindy/.local/lib/python3.9/site-packages/torchvision
hydra                1.0.7 @/home/mcwindy/.local/lib/python3.9/site-packages/hydra
classy_vision        0.7.0.dev @/home/mcwindy/.local/lib/python3.9/site-packages/classy_vision
tensorboard          2.8.0
apex                 0.1 @/home/mcwindy/.local/lib/python3.9/site-packages/apex
cv2                  4.5.5
PyTorch              1.11.0+cu102 @/home/mcwindy/.local/lib/python3.9/site-packages/torch
PyTorch debug build  False
PyTorch built with:
CPU info:
Architecture                     x86_64
CPU op-mode(s)                   32-bit, 64-bit
Byte Order                       Little Endian
Address sizes                    48 bits physical, 48 bits virtual
CPU(s)                           24
On-line CPU(s) list              0-23
Thread(s) per core               2
Core(s) per socket               12
Socket(s)                        1
Vendor ID                        AuthenticAMD
CPU family                       25
Model                            33
Model name                       AMD Ryzen 9 5900X 12-Core Processor
Stepping                         0
CPU MHz                          3900.006
BogoMIPS                         7800.01
Virtualization                   AMD-V
Hypervisor vendor                Microsoft
Virtualization type              full
L1d cache                        384 KiB
L1i cache                        384 KiB
L2 cache                         6 MiB
L3 cache                         32 MiB
Vulnerability Itlb multihit      Not affected
Vulnerability L1tf               Not affected
Vulnerability Mds                Not affected
Vulnerability Meltdown           Not affected
Vulnerability Spec store bypass  Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1         Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2         Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds              Not affected
Vulnerability Tsx async abort    Not affected
WARNING 2022-03-29 23:39:29,647 moco_hooks.py: 45: Batch shuffling: True
INFO 2022-03-29 23:39:29,647 tensorboard.py: 49: Tensorboard dir: checkpoints1/tb_logs
INFO 2022-03-29 23:39:29,648 tensorboard_hook.py: 90: Setting up SSL Tensorboard Hook...
INFO 2022-03-29 23:39:29,649 tensorboard_hook.py: 102: Tensorboard config: log_params: True, log_params_freq: 310, log_params_gradients: True, log_activation_statistics: 0
INFO 2022-03-29 23:39:29,649 trainer_main.py: 112: Using Distributed init method: tcp://localhost:50653, world_size: 1, rank: 0
INFO 2022-03-29 23:39:29,650 trainer_main.py: 130: | initialized host mcwindy_pc as rank 0 (0)
INFO 2022-03-29 23:39:31,911 train_task.py: 181: Not using Automatic Mixed Precision
INFO 2022-03-29 23:39:31,912 train_task.py: 455: Building model....
INFO 2022-03-29 23:39:31,912 resnext.py: 64: ResNeXT trunk, supports activation checkpointing. Deactivated
INFO 2022-03-29 23:39:31,912 resnext.py: 87: Building model: ResNeXt50-1x64d-w1-BatchNorm2d
INFO 2022-03-29 23:39:32,265 train_task.py: 656: Broadcast model BN buffers from primary on every forward pass
INFO 2022-03-29 23:39:32,265 classification_task.py: 387: Synchronized Batch Normalization is disabled
INFO 2022-03-29 23:39:32,305 optimizer_helper.py: 293: Trainable params: 161, Non-Trainable params: 0, Trunk Regularized Parameters: 53, Trunk Unregularized Parameters 106, Head Regularized Parameters: 2, Head Unregularized Parameters: 0 Remaining Regularized Parameters: 0 Remaining Unregularized Parameters: 0
INFO 2022-03-29 23:39:32,306 ssl_dataset.py: 156: Rank: 0 split: TEST Data files: ['/home/mcwindy/vissltest/data1/tiny-imagenet-200/val']
INFO 2022-03-29 23:39:32,306 ssl_dataset.py: 159: Rank: 0 split: TEST Label files: ['/home/mcwindy/vissltest/data1/tiny-imagenet-200/val']
INFO 2022-03-29 23:39:32,323 disk_dataset.py: 86: Loaded 10000 samples from folder /home/mcwindy/vissltest/data1/tiny-imagenet-200/val
INFO 2022-03-29 23:39:32,323 ssl_dataset.py: 156: Rank: 0 split: TRAIN Data files: ['/home/mcwindy/vissltest/data1/tiny-imagenet-200/train']
INFO 2022-03-29 23:39:32,324 ssl_dataset.py: 159: Rank: 0 split: TRAIN Label files: ['/home/mcwindy/vissltest/data1/tiny-imagenet-200/train']
INFO 2022-03-29 23:39:32,543 disk_dataset.py: 86: Loaded 100000 samples from folder /home/mcwindy/vissltest/data1/tiny-imagenet-200/train
INFO 2022-03-29 23:39:32,543 misc.py: 161: Set start method of multiprocessing to fork
INFO 2022-03-29 23:39:32,543 init.py: 126: Created the Distributed Sampler....
INFO 2022-03-29 23:39:32,543 init.py: 101: Distributed Sampler config: {'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 10000, 'total_size': 10000, 'shuffle': True, 'seed': 0}
INFO 2022-03-29 23:39:32,544 init.py: 215: Wrapping the dataloader to async device copies
INFO 2022-03-29 23:39:32,544 misc.py: 161: Set start method of multiprocessing to fork
INFO 2022-03-29 23:39:32,544 init.py: 126: Created the Distributed Sampler....
INFO 2022-03-29 23:39:32,544 init.py: 101: Distributed Sampler config: {'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 100000, 'total_size': 100000, 'shuffle': True, 'seed': 0}
INFO 2022-03-29 23:39:32,544 init.py: 215: Wrapping the dataloader to async device copies
INFO 2022-03-29 23:39:32,544 train_task.py: 384: Building loss...
INFO 2022-03-29 23:39:32,607 trainer_main.py: 268: Training 105 epochs
INFO 2022-03-29 23:39:32,607 trainer_main.py: 269: One epoch = 3125 iterations.
INFO 2022-03-29 23:39:32,607 trainer_main.py: 270: Total 100000 samples in one epoch
INFO 2022-03-29 23:39:32,607 trainer_main.py: 276: Total 328125 iterations for training
INFO 2022-03-29 23:39:32,674 logger.py: 84: Tue Mar 29 23:39:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 512.15       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0  On |                  N/A |
| 26%   33C    P2    45W / 245W |   2460MiB /  8192MiB |     11%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3244      C   /python3.9                      N/A      |
+-----------------------------------------------------------------------------+
INFO 2022-03-29 23:39:32,675 trainer_main.py: 173: Model is: Classy <class 'vissl.models.base_ssl_model.BaseSSLMultiInputOutputModel'>: BaseSSLMultiInputOutputModel( (_heads): ModuleDict() (trunk): ResNeXt( (_feature_blocks): ModuleDict( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv1_relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): 
Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, 
track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, 
affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): 
BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(<SUPPORTED_L4_STRIDE.two: 2>, <SUPPORTED_L4_STRIDE.two: 2>), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(<SUPPORTED_L4_STRIDE.two: 2>, <SUPPORTED_L4_STRIDE.two: 2>), bias=False) (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, 
momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (avgpool): AdaptiveAvgPool2d(output_size=(1, 1)) (flatten): Flatten() ) ) (heads): ModuleList( (0): MLP( (clf): Sequential( (0): Linear(in_features=2048, out_features=1000, bias=True) ) ) ) )
INFO 2022-03-29 23:39:32,675 trainer_main.py: 174: Loss is: {'name': 'MoCoLoss'}
INFO 2022-03-29 23:39:32,675 trainer_main.py: 175: Starting training....
INFO 2022-03-29 23:39:32,676 __init__.py: 101: Distributed Sampler config: {'num_replicas': 1, 'rank': 0, 'epoch': 0, 'num_samples': 100000, 'total_size': 100000, 'shuffle': True, 'seed': 0}
INFO 2022-03-29 23:39:32,837 ssl_dataset.py: 238: Using disk_folder labels from /home/mcwindy/vissltest/data1/tiny-imagenet-200/train
INFO 2022-03-29 23:39:32,838 ssl_dataset.py: 238: Using disk_folder labels from /home/mcwindy/vissltest/data1/tiny-imagenet-200/train
INFO 2022-03-29 23:39:32,838 ssl_dataset.py: 238: Using disk_folder labels from /home/mcwindy/vissltest/data1/tiny-imagenet-200/train
INFO 2022-03-29 23:39:32,838 ssl_dataset.py: 238: Using disk_folder labels from /home/mcwindy/vissltest/data1/tiny-imagenet-200/train
INFO 2022-03-29 23:39:32,839 ssl_dataset.py: 238: Using disk_folder labels from /home/mcwindy/vissltest/data1/tiny-imagenet-200/train
Traceback (most recent call last):
File "/home/mcwindy/vissltest/./tools/run_distributed_engines.py", line 200, in <module>
hydra_main(overrides=overrides)
File "/home/mcwindy/vissltest/./tools/run_distributed_engines.py", line 175, in hydra_main
launch_distributed(
File "/home/mcwindy/vissltest/./tools/run_distributed_engines.py", line 115, in launch_distributed
_distributed_worker(
File "/home/mcwindy/vissltest/./tools/run_distributed_engines.py", line 166, in _distributed_worker
process_main(cfg, dist_run_id, local_rank=local_rank, node_id=node_id)
File "/home/mcwindy/vissltest/./tools/run_distributed_engines.py", line 152, in process_main
train_main(
File "/home/mcwindy/.local/lib/python3.9/site-packages/vissl/engines/train.py", line 130, in train_main
trainer.train()
File "/home/mcwindy/.local/lib/python3.9/site-packages/vissl/trainer/trainer_main.py", line 178, in train
self._advance_phase(task) # advances task.phase_idx
File "/home/mcwindy/.local/lib/python3.9/site-packages/vissl/trainer/trainer_main.py", line 319, in _advance_phase
task.recreate_data_iterator(
File "/home/mcwindy/.local/lib/python3.9/site-packages/vissl/trainer/train_task.py", line 564, in recreate_data_iterator
self.data_iterator = iter(self.dataloaders[phase_type])
File "/home/mcwindy/.local/lib/python3.9/site-packages/classy_vision/dataset/dataloader_async_gpu_wrapper.py", line 40, in __iter__
self.preload()
File "/home/mcwindy/.local/lib/python3.9/site-packages/classy_vision/dataset/dataloader_async_gpu_wrapper.py", line 46, in preload
self.cache_next = next(self._iter)
File "/home/mcwindy/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/mcwindy/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/home/mcwindy/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/home/mcwindy/.local/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/mcwindy/.local/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/mcwindy/.local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/home/mcwindy/.local/lib/python3.9/site-packages/vissl/data/collators/moco_collator.py", line 45, in moco_collator
"data": [torch.stack(data).squeeze()[:, 0, :, :, :].squeeze()], # encoder
IndexError: too many indices for tensor of dimension 4
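
For anyone reading the traceback later: the last frame can be reproduced with shape arithmetic alone. The sketch below is a hypothetical, torch-free model of the indexing on line 45 of `moco_collator.py`. It assumes each sample reaches the collator as a `(views, C, H, W)` tensor when the transforms produce two views per image, and as `(C, H, W)` when they produce only one. The helper names, the batch size of 32, and the 224x224 image size are illustrative and not part of VISSL.

```python
def stack(batch_size, sample_shape):
    """Shape produced by torch.stack over `batch_size` tensors of `sample_shape`."""
    return (batch_size, *sample_shape)


def squeeze(shape):
    """Shape after Tensor.squeeze(): every dimension of size 1 is dropped."""
    return tuple(d for d in shape if d != 1)


def select_view(shape):
    """Shape after t[:, 0, :, :, :]: five indices, and the integer index removes dim 1."""
    if len(shape) < 5:
        raise IndexError(f"too many indices for tensor of dimension {len(shape)}")
    return shape[:1] + shape[2:]


def encoder_batch_shape(batch_size, sample_shape):
    """Mimic torch.stack(data).squeeze()[:, 0, :, :, :].squeeze() on shapes only."""
    return squeeze(select_view(squeeze(stack(batch_size, sample_shape))))


# Two views per image (assumed layout): the indexing succeeds.
print(encoder_batch_shape(32, (2, 3, 224, 224)))  # (32, 3, 224, 224)

# One view per image: the stacked batch is only 4-D, so five indices are too many.
try:
    encoder_batch_shape(32, (3, 224, 224))
except IndexError as err:
    print(err)  # too many indices for tensor of dimension 4
```

Under these assumptions the second case yields the same message as the log, which is consistent with the point in the reply that the configuration is missing `ImgReplicatePil` and therefore gives the collator a single view per image.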