Closed: frank-xwang closed this issue 3 years ago
Hi @frank-xwang,
We didn't explore smaller batches, but I'm happy to help if you're interested in investigating this. In general, the loss should be relatively robust, but the size of the support set does make a difference (as per the ablation in Section 7). Thus, you may need longer training with small batches.
Hello, thank you for your reply. I think a batch-size ablation study would be very interesting for researchers in the many academic groups that do not have as many computing resources. It would be great if you could provide this kind of ablation study. Also, it seems that we need to install Slurm to run your code on ImageNet, which requires sudo permissions :-(. Could you please release a version of the code, or a main file, that can run directly on a single machine without installing Slurm? Thanks a lot!
I'll get back to you about the batch-size ablation, but unfortunately it's unlikely I'll be able to get to this soon. As for a version that doesn't require Slurm, you can launch your ImageNet jobs with "main.py" instead of "main_distributed.py", and that should work on a single GPU without Slurm! For example:
```
python main.py \
  --sel paws_train \
  --fname configs/paws/imgnt_train_1GPU.yaml
```
Awesome! Thank you! For the main file, sorry for being unclear; I meant one machine with 8 GPUs. Do we have to use main_distributed.py? Is there a main file that can do distributed training on 8 GPUs without Slurm?
Oh yes, I see what you mean. I just pushed a change so that you can now run main.py with several GPUs on a multi-GPU machine; just specify the devices as command-line arguments. For example, to run training on 8 GPUs, specify the devices like so:
```
python main.py \
  --sel paws_train \
  --fname configs/paws/imgnt_train_8GPU.yaml \
  --devices cuda:0 cuda:1 cuda:2 cuda:3 cuda:4 cuda:5 cuda:6 cuda:7
```
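For intuition, here is a minimal, hypothetical sketch of how a `--devices` flag like the one above could be parsed and used to spawn one worker process per GPU. This is not the actual main.py from the PAWS repo (which also handles distributed initialization, config loading, etc.); the function and flag defaults here are illustrative assumptions.

```python
import argparse
import multiprocessing as mp

def build_parser():
    # Mirrors the command-line interface shown above (hypothetical defaults).
    parser = argparse.ArgumentParser()
    parser.add_argument("--sel", type=str, default="paws_train")
    parser.add_argument("--fname", type=str, default=None)
    # nargs="+" collects "cuda:0 cuda:1 ..." into a list;
    # the world size is simply the number of listed devices.
    parser.add_argument("--devices", type=str, nargs="+", default=["cuda:0"])
    return parser

def worker(rank, device):
    # A real worker would pin itself to `device` and join a
    # distributed process group keyed by `rank`; here we just report.
    print(f"rank {rank} -> {device}")

def main():
    args = build_parser().parse_args()
    procs = [mp.Process(target=worker, args=(r, d))
             for r, d in enumerate(args.devices)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    main()
```

The key point is that one process is launched per listed device, so scaling from 1 to 8 GPUs only changes the `--devices` list.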
Great! Thanks!
@frank-xwang Hi, did you run it successfully on 8 GPUs? Could you share your training time?
Hi, after reducing "unsupervised_batch_size" and "supervised_imgs_per_class", I can run it on 4 V100 GPUs. The training time for each epoch is approximately 0.8 hours. But I suspect reducing the batch size may reduce performance, which will need to be verified once the experiment finishes.
> Hi, after reducing "unsupervised_batch_size" and "supervised_imgs_per_class", I can run it on 4 V100 GPUs. The training time for each epoch is approximately 0.8 hours. But I think reducing batch size may reduce performance, which may need to be verified after completing the experiment.

@frank-xwang Hi, have you finished your experiment on 4 V100 GPUs? I also want to run the experiment on 8 V100 GPUs, but I am a little worried about the performance and the speed. Thanks a lot!
Hi @CloudRR, I tried several hyperparameter settings but failed to reproduce the reported results with 4 V100 GPUs. The speed is not bad: training one epoch takes about 1 hour. It seems that PAWS is also sensitive to batch size, as has been observed for many self-supervised learning methods.
Same here. Couldn't reproduce with 4 GPUs, and also about 1 hour/epoch.
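When shrinking the batch, one knob worth checking is the learning rate. A common heuristic from the large-batch training literature (my suggestion, not something prescribed in this thread) is to scale the learning rate linearly with the global batch size. The batch sizes in the example below are purely illustrative, not values taken from the PAWS paper.

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Linear learning-rate scaling heuristic: keep the ratio
    lr / batch_size constant when the global batch size changes."""
    return base_lr * new_batch_size / base_batch_size

# Illustration only: if a reference config used lr 1.2 at a global
# batch of 4096, a 1024-image batch would suggest lr of about 0.3.
print(scaled_lr(1.2, 4096, 1024))
```

Whether linear scaling is the right rule for PAWS specifically is an open question; it is just a reasonable starting point before a sweep.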
Hi,
I've had a lot on my plate, but I did manage to try out a PAWS run on ImageNet with a small batch size, and it essentially reproduces the large-batch numbers.
Using 8 V100 GPUs for 100 epochs with 10% of ImageNet labels, I get
This top-1 accuracy is consistent with the ablation in the bottom row of table 4 in the paper (similar support set, but much larger batch-size).
Here is the config I used to produce this result when running on 8 GPUs. To explain some of the choices: I set `me_max: false` because, with a small batch size, it's not clear to me that using me-max regularization makes sense, so I turned it off. All other hyper-parameters are identical to the large-batch setup.
```yaml
criterion:
  classes_per_batch: 70
  me_max: false
  sharpen: 0.25
  supervised_imgs_per_class: 3
  supervised_views: 1
  temperature: 0.1
  unsupervised_batch_size: 32
data:
  color_jitter_strength: 1.0
  data_seed: null
  dataset: imagenet
  image_folder: imagenet_full_size/061417/
  label_smoothing: 0.1
  multicrop: 6
  normalize: true
  root_path: datasets/
  subset_path: imagenet_subsets
  unique_classes_per_rank: true
  unlabeled_frac: 0.90
logging:
  folder: /path_to_save_models_and_logs/
  write_tag: paws
meta:
  copy_data: true
  device: cuda:0
  load_checkpoint: false
  model_name: resnet50
  output_dim: 2048
  read_checkpoint: null
  use_fp16: true
  use_pred_head: true
optimization:
  epochs: 100
  final_lr: 0.0012
  lr: 1.2
  momentum: 0.9
  nesterov: false
  start_lr: 0.3
  warmup: 10
  weight_decay: 1.0e-06
```
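To make the batch-size arithmetic in this config concrete, here is my own back-of-the-envelope reading of the fields. The exact sampling logic lives in the PAWS data loaders, so treat these numbers as an interpretation, not ground truth.

```python
# Assumed: the config above is run on 8 GPUs, with per-GPU fields
# as listed (unsupervised_batch_size: 32, classes_per_batch: 70,
# supervised_imgs_per_class: 3, unique_classes_per_rank: true).
num_gpus = 8

# Unlabeled images: per-GPU batch times number of GPUs.
unsupervised_batch_size = 32
global_unsupervised = unsupervised_batch_size * num_gpus  # 256

# Support set: classes_per_batch * supervised_imgs_per_class per GPU.
# With unique_classes_per_rank: true, each rank draws different
# classes, so the global support set is num_gpus times larger.
classes_per_batch = 70
supervised_imgs_per_class = 3
support_per_gpu = classes_per_batch * supervised_imgs_per_class  # 210
global_support = support_per_gpu * num_gpus                      # 1680

print(global_unsupervised, support_per_gpu, global_support)
```

If this reading is right, the unlabeled batch (256 images globally) is far smaller than the large-batch setup, while the global support set stays sizeable, which fits the comment that the support set size matters more than the batch size.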
Hi, thanks for sharing the code! I am curious about PAWS' sensitivity to batch size. Have you tried experimenting with smaller batch sizes (such as 256 or 512) that 8 GPUs can afford on ImageNet? Thanks. @MidoAssran