dewenzeng / positional_cl

Code for the paper "Positional Contrastive Learning for Volumetric Medical Image Segmentation"

Questions on contrastive training #5

Closed: xyimaging closed this issue 2 years ago

xyimaging commented 2 years ago

Hi, I'm trying to train the contrastive encoder from scratch on the ACDC dataset. However, unlike your result shown in #3, my loss is stuck at 5.5597 from epoch 0 through epoch 30, at which point I cancelled the run because it was taking so long. I'm wondering if I missed any tricks?

My results:

  1. The first several iterations look fine.

     [screenshot: 2022-03-01, 3:27 PM]
  2. The loss gets stuck at 5.5597 for the following 30 epochs.

     [screenshots: 2022-03-01, 3:29 PM]

I ran the code with:

python3 train_contrast.py --device cuda:0 --batch_size 25 --epochs 300 --data_dir "./dataset/02_08_2022/acdc/unlabeled/" --lr 0.1 --do_contrast --dataset acdc --patch_size 352 352 \
--experiment_name contrast_acdc_pcl_temp01_thresh035 --slice_threshold 0.35 --temp 0.1 --initial_filter_size 48 --classes 512 --contrastive_method pcl

Here I set batch_size to 25 to avoid running out of GPU memory.

dewenzeng commented 2 years ago

@xyimaging On my end it works fine with batch size 24. Maybe it's because of the data you are using: when I use bs=24, the total number of batches is 1056, which is different from yours.

[image: training loss curve with bs=24]

One thing you may also try is learning-rate warmup: https://github.com/HobbitLong/SupContrast/blob/a8a275b3a8b9b9bdc9c527f199d5b9be58148543/main_supcon.py#L216
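For reference, here is a minimal sketch of what such a warmup could look like, assuming an SGD optimizer and a linear ramp over the first few epochs. The names warmup_from, warmup_epochs, and warmup_learning_rate are illustrative and are not part of the positional_cl code; only --lr 0.1 comes from the command above.

import torch

base_lr = 0.1          # target LR after warmup (matches --lr 0.1 above)
warmup_from = 0.01     # small starting LR (assumed value)
warmup_epochs = 10     # number of warmup epochs (assumed value)

model = torch.nn.Linear(8, 8)  # stand-in for the contrastive encoder
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

def warmup_learning_rate(optimizer, epoch, batch_idx, num_batches):
    # Linearly ramp the LR from warmup_from to base_lr during the first
    # warmup_epochs epochs; after that, leave the LR untouched.
    if epoch >= warmup_epochs:
        return
    # progress in [0, 1] over the whole warmup phase
    p = (epoch * num_batches + batch_idx) / (warmup_epochs * num_batches)
    lr = warmup_from + p * (base_lr - warmup_from)
    for group in optimizer.param_groups:
        group['lr'] = lr

# Usage inside the training loop, once per batch:
# for epoch in range(epochs):
#     for batch_idx, batch in enumerate(loader):
#         warmup_learning_rate(optimizer, epoch, batch_idx, len(loader))
#         ... forward / contrastive loss / backward / optimizer.step() ...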

xyimaging commented 2 years ago

The bug is solved, thanks!

zjjJacinthe commented 2 years ago

I ran into the same problem. Could you please tell me how you solved it? @xyimaging

xyimaging commented 2 years ago

Sorry for the late response. It has been a while, so I can't remember exactly what the mistake was. I guess if you copy the original code and run it, it should be fine.