facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License

Very long estimated running time #360

Closed · thongnt99 closed 3 years ago

thongnt99 commented 3 years ago

I tried to train a Jigsaw model on ImageNet-1k data following the guide here. However, the estimated time to finish training is very long (many days). [image attachment] I have tried running on different types of GPUs, on both one and multiple GPUs, but the problem persists. This estimated running time is much longer than the time reported in the Jigsaw paper. Do you have any idea what I have done wrong? Thanks in advance.

SJShin-AI commented 3 years ago

For pre-training SimCLR on ImageNet-1K, I also encountered the same problem.

To help in understanding the problem, we attach the yaml file.

config:
  VERBOSE: False
  LOG_FREQUENCY: 10000
  TEST_ONLY: False
  TEST_MODEL: False
  SEED_VALUE: 0
  MULTI_PROCESSING_METHOD: forkserver
  MONITOR_PERF_STATS: True
  PERF_STAT_FREQUENCY: 10
  ROLLING_BTIME_FREQ: 5
  DATA:
    NUM_DATALOADER_WORKERS: 5
    TRAIN:
      DATA_SOURCES: [disk_folder]
      DATASET_NAMES: [imagenet1k_folder]
      BATCHSIZE_PER_REPLICA: 128
      LABEL_TYPE: sample_index # just an implementation detail. Label isn't used
      TRANSFORMS:

prigoyal commented 3 years ago

Thank you for reaching out. Could you share some details about the machine (GPUs) and the yaml config you are using?

thongnt99 commented 3 years ago

Hi @prigoyal, thanks for your response. I have tested on different GPUs, including A100, GTX600, and Tesla M40. This is the configuration file:

# @package _global_
config:
  VERBOSE: True
  LOG_FREQUENCY: 100
  TEST_ONLY: False
  TEST_MODEL: False
  SEED_VALUE: 0
  MULTI_PROCESSING_METHOD: forkserver
  MONITOR_PERF_STATS: True
  PERF_STAT_FREQUENCY: 10
  ROLLING_BTIME_FREQ: 5
  DATA:
    NUM_DATALOADER_WORKERS: 5
    TRAIN:
      DATA_SOURCES: [disk_folder]
      DATASET_NAMES: [imagenet1k_folder]
      BATCHSIZE_PER_REPLICA: 32
      LABEL_TYPE: sample_index # isn't used
      TRANSFORMS:
        - name: Resize
          size: 256
        - name: RandomHorizontalFlip
        - name: RandomCrop
          size: 255
        - name: RandomGrayscale
          p: 0.66
        - name: ToTensor
        - name: Normalize
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
        - name: ImgPatchesFromTensor
          num_patches: 9
          patch_jitter: 21
        - name: ShuffleImgPatches
          perm_file: https://dl.fbaipublicfiles.com/fair_self_supervision_benchmark/jigsaw_permutations/hamming_perms_2000_patches_9_max_avg.npy   # perm 2K
      COLLATE_FUNCTION: siamese_collator
      MMAP_MODE: True
      COPY_TO_LOCAL_DISK: False
  METERS:
    name: accuracy_list_meter
    accuracy_list_meter:
      num_meters: 1
      topk_values: [1]
  TRAINER:
    TRAIN_STEP_NAME: standard_train_step
  MODEL:
    TRUNK:
      NAME: resnet
      RESNETS:
        DEPTH: 50
    HEAD:
      PARAMS: [
        ["mlp", {"dims": [2048, 1000], "use_bn": True, "use_relu": True, "skip_last_layer_relu_bn": False}],
        ["siamese_concat_view", {"num_towers": 9}],
        ["mlp", {"dims": [9000, 2000]}],    # perm 2K
      ]
    SYNC_BN_CONFIG:
      CONVERT_BN_TO_SYNC_BN: True
      SYNC_BN_TYPE: pytorch
    AMP_PARAMS:
      USE_AMP: False
      AMP_ARGS: {"opt_level": "O3", "keep_batchnorm_fp32": True, "master_weights": True, "loss_scale": "dynamic"}
  LOSS:
    name: cross_entropy_multiple_output_single_target
    cross_entropy_multiple_output_single_target:
      ignore_index: -1
  OPTIMIZER:
      name: sgd
      use_larc: True
      larc_config:
              clip: False
              trust_coefficient: 0.001
              eps: 0.000001
      weight_decay: 0.0001
      momentum: 0.9
      num_epochs: 105
      nesterov: False
      regularize_bn: False
      regularize_bias: True
      param_schedulers:
        lr:
          auto_lr_scaling:
            auto_scale: true
            base_value: 0.1
            base_lr_batch_size: 256
          name: composite
          schedulers:
            - name: linear
              start_value: 0.025
              end_value: 0.1
            - name: multistep
              values: [0.1, 0.01, 0.001, 0.0001, 0.00001]
              milestones: [30, 60, 90, 100]
          update_interval: epoch
          interval_scaling: [rescaled, fixed]
          lengths: [0.047619, 0.952381]
  DISTRIBUTED:
    BACKEND: nccl
    NUM_NODES: 1
    NUM_PROC_PER_NODE: 1
    INIT_METHOD: tcp
    RUN_ID: auto
  MACHINE:
    DEVICE: gpu
  CHECKPOINT:
    DIR: "."
    AUTO_RESUME: True
    CHECKPOINT_FREQUENCY: 1
    OVERWRITE_EXISTING: True

iseessel commented 3 years ago

Hi there Thac-Thong, can you let us know how long you are expecting the training to take?

Based on the config you provided, your batch size is 32 -- the batch size in the paper is 256. Can you cross-reference all your hyperparameters with the original Jigsaw paper and make sure they match, and, if possible, increase the batch size to 256?
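
For context, the posted config also enables auto_lr_scaling, so the effective learning rate shrinks together with the global batch size. The sketch below assumes the scaled LR is base_value * global_batch_size / base_lr_batch_size (an assumption about the scaling rule, not something stated in this thread); the field names and values are copied from the config above:

OPTIMIZER:
  param_schedulers:
    lr:
      auto_lr_scaling:
        auto_scale: true
        base_value: 0.1
        base_lr_batch_size: 256
# Assumed scaling rule: lr = base_value * global_batch / base_lr_batch_size
# 1 GPU  x 32 images/replica -> global batch 32  -> lr ~ 0.1 * 32/256  = 0.0125
# 8 GPUs x 32 images/replica -> global batch 256 -> lr ~ 0.1 * 256/256 = 0.1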

(Also, please note that the ETA at the very beginning of a training run is usually longer than it will end up being; it stabilizes after 800+ iterations.)

Just for reference:

BATCHSIZE_PER_REPLICA controls the per-replica (per-GPU) batch size, NUM_NODES: 1 controls the number of nodes, and NUM_PROC_PER_NODE: 1 controls the number of GPUs used per node.
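
As a minimal, hypothetical sketch of those fields (values chosen for illustration only, assuming a single node with 8 GPUs and 32 images per replica, which gives a global batch of 8 * 32 = 256):

config:
  DATA:
    TRAIN:
      BATCHSIZE_PER_REPLICA: 32   # per-GPU batch; global batch = 1 node * 8 GPUs * 32 = 256
  DISTRIBUTED:
    BACKEND: nccl
    NUM_NODES: 1                  # number of machines
    NUM_PROC_PER_NODE: 8          # number of GPUs per machine

Alternatively, keeping a single GPU and raising BATCHSIZE_PER_REPLICA to 256 reaches the same global batch size, memory permitting.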