For pre-training SimCLR with ImageNet-1K, I also encountered the same problem. To help with understanding the problem, I attach the yaml config below.
config:
  VERBOSE: False
  LOG_FREQUENCY: 10000
  TEST_ONLY: False
  TEST_MODEL: False
  SEED_VALUE: 0
  MULTI_PROCESSING_METHOD: forkserver
  MONITOR_PERF_STATS: True
  PERF_STAT_FREQUENCY: 10
  ROLLING_BTIME_FREQ: 5
  DATA:
    NUM_DATALOADER_WORKERS: 5
    TRAIN:
      DATA_SOURCES: [disk_folder]
      DATASET_NAMES: [imagenet1k_folder]
      BATCHSIZE_PER_REPLICA: 128
      LABEL_TYPE: sample_index    # just an implementation detail. Label isn't used
      TRANSFORMS:
        # ... (config truncated in the original comment; only the fragment below remained) ...
        # restart_interval_length: 0.5
        wave_type: full
        is_adaptive: True
        restart_interval_length: 0.334
        interval_scaling: [rescaled, rescaled]
        update_interval: step
        lengths: [0.1, 0.9]    # 100ep
  DISTRIBUTED:
    BACKEND: nccl
    NUM_NODES: 1
    NUM_PROC_PER_NODE: 1
    INIT_METHOD: tcp
    RUN_ID: auto
  MACHINE:
    DEVICE: gpu
  CHECKPOINT:
    DIR: "."
    AUTO_RESUME: True
    CHECKPOINT_FREQUENCY: 1
    OVERWRITE_EXISTING: true
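For context on the truncated fragment above: keys such as wave_type, is_adaptive, and restart_interval_length match the cosine_warm_restart learning-rate scheduler used in VISSL's SimCLR configs, while interval_scaling, update_interval, and lengths belong to the enclosing composite scheduler under OPTIMIZER.param_schedulers.lr. The sketch below is only a rough reconstruction of where such a fragment usually sits; the linear-warmup values (0.6, 4.8, 0.0) are illustrative placeholders, not taken from this thread.

    OPTIMIZER:
      param_schedulers:
        lr:
          name: composite
          schedulers:
            - name: linear                 # warmup phase
              start_value: 0.6             # placeholder warmup start LR
              end_value: 4.8               # placeholder peak LR
            - name: cosine_warm_restart    # the fragment above belongs to a scheduler like this
              start_value: 4.8             # placeholder, matches the warmup end
              end_value: 0.0               # placeholder final LR
              wave_type: full
              is_adaptive: True
              restart_interval_length: 0.334
          interval_scaling: [rescaled, rescaled]
          update_interval: step
          lengths: [0.1, 0.9]              # 10% warmup, 90% cosine (100ep schedule)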
Thank you for reaching out. Could you share some details about the machine (GPUs) and the yaml config you are using?
Hi @prigoyal, thanks for your response. I have tested on different GPUs, including A100, GTX600, and Tesla M40. This is the configuration file:
# @package _global_
config:
  VERBOSE: True
  LOG_FREQUENCY: 100
  TEST_ONLY: False
  TEST_MODEL: False
  SEED_VALUE: 0
  MULTI_PROCESSING_METHOD: forkserver
  MONITOR_PERF_STATS: True
  PERF_STAT_FREQUENCY: 10
  ROLLING_BTIME_FREQ: 5
  DATA:
    NUM_DATALOADER_WORKERS: 5
    TRAIN:
      DATA_SOURCES: [disk_folder]
      DATASET_NAMES: [imagenet1k_folder]
      BATCHSIZE_PER_REPLICA: 32
      LABEL_TYPE: sample_index    # isn't used
      TRANSFORMS:
        - name: Resize
          size: 256
        - name: RandomHorizontalFlip
        - name: RandomCrop
          size: 255
        - name: RandomGrayscale
          p: 0.66
        - name: ToTensor
        - name: Normalize
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
        - name: ImgPatchesFromTensor
          num_patches: 9
          patch_jitter: 21
        - name: ShuffleImgPatches
          perm_file: https://dl.fbaipublicfiles.com/fair_self_supervision_benchmark/jigsaw_permutations/hamming_perms_2000_patches_9_max_avg.npy    # perm 2K
      COLLATE_FUNCTION: siamese_collator
      MMAP_MODE: True
      COPY_TO_LOCAL_DISK: False
  METERS:
    name: accuracy_list_meter
    accuracy_list_meter:
      num_meters: 1
      topk_values: [1]
  TRAINER:
    TRAIN_STEP_NAME: standard_train_step
  MODEL:
    TRUNK:
      NAME: resnet
      RESNETS:
        DEPTH: 50
    HEAD:
      PARAMS: [
        ["mlp", {"dims": [2048, 1000], "use_bn": True, "use_relu": True, "skip_last_layer_relu_bn": False}],
        ["siamese_concat_view", {"num_towers": 9}],
        ["mlp", {"dims": [9000, 2000]}],    # perm 2K
      ]
    SYNC_BN_CONFIG:
      CONVERT_BN_TO_SYNC_BN: True
      SYNC_BN_TYPE: pytorch
    AMP_PARAMS:
      USE_AMP: False
      AMP_ARGS: {"opt_level": "O3", "keep_batchnorm_fp32": True, "master_weights": True, "loss_scale": "dynamic"}
  LOSS:
    name: cross_entropy_multiple_output_single_target
    cross_entropy_multiple_output_single_target:
      ignore_index: -1
  OPTIMIZER:
    name: sgd
    use_larc: True
    larc_config:
      clip: False
      trust_coefficient: 0.001
      eps: 0.000001
    weight_decay: 0.0001
    momentum: 0.9
    num_epochs: 105
    nesterov: False
    regularize_bn: False
    regularize_bias: True
    param_schedulers:
      lr:
        auto_lr_scaling:
          auto_scale: true
          base_value: 0.1
          base_lr_batch_size: 256
        name: composite
        schedulers:
          - name: linear
            start_value: 0.025
            end_value: 0.1
          - name: multistep
            values: [0.1, 0.01, 0.001, 0.0001, 0.00001]
            milestones: [30, 60, 90, 100]
        update_interval: epoch
        interval_scaling: [rescaled, fixed]
        lengths: [0.047619, 0.952381]
  DISTRIBUTED:
    BACKEND: nccl
    NUM_NODES: 1
    NUM_PROC_PER_NODE: 1
    INIT_METHOD: tcp
    RUN_ID: auto
  MACHINE:
    DEVICE: gpu
  CHECKPOINT:
    DIR: "."
    AUTO_RESUME: True
    CHECKPOINT_FREQUENCY: 1
    OVERWRITE_EXISTING: true
Hi there Thac-Thong, can you let us know how long you are expecting the training to take?
Based on the config you provided, your batch size is 32 -- the batch size in the paper is 256. Can you cross-reference all your hyper-params with the original Jigsaw paper and make sure they match, and if possible increase the batch size to 256?
(Also, please note that the ETA reported at the very beginning of a training run is usually longer than the time the run will actually take; it stabilizes after 800+ iterations.)
Just for reference:
BATCHSIZE_PER_REPLICA controls the per-replica (per-GPU) batch size, NUM_NODES: 1 controls the number of nodes, and NUM_PROC_PER_NODE: 1 controls the number of GPUs used per node.
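To make the arithmetic explicit: the effective (global) batch size is BATCHSIZE_PER_REPLICA x NUM_NODES x NUM_PROC_PER_NODE, so the config above trains with a global batch of 32 x 1 x 1 = 32, which is 8x smaller than the paper's 256 and therefore needs 8x more iterations per epoch. Below is a minimal sketch of one way to reach 256, assuming a single machine with 8 GPUs is available (the GPU count is an assumption, not something stated in this thread):

    config:
      DATA:
        TRAIN:
          BATCHSIZE_PER_REPLICA: 32    # 32 images per GPU (per replica)
      DISTRIBUTED:
        NUM_NODES: 1                   # one machine
        NUM_PROC_PER_NODE: 8           # 8 GPUs -> global batch = 32 * 1 * 8 = 256

Alternatively, keep NUM_PROC_PER_NODE: 1 and raise BATCHSIZE_PER_REPLICA to 256 if a single GPU has enough memory; the auto_lr_scaling block in the config is intended to rescale the base LR relative to base_lr_batch_size: 256 in either case.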
I tried to train a Jigsaw model on ImageNet-1K data following the guide here. However, the estimated time to finish training is very long (many days). I have tried running on different types of GPU, on both one and multiple GPUs, but the problem persists. This estimated running time is much longer than the time reported in the Jigsaw paper. Do you have any idea what I have done wrong? Thanks in advance.