arthurdouillard / CVPR2021_PLOP

Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation
https://arxiv.org/abs/2011.11390
MIT License

Reproduce 15-1 setup on Pascal VOC #20

Closed: mostafaelaraby closed this issue 2 years ago

mostafaelaraby commented 2 years ago

**Describe the bug**

I tried to run the provided Pascal VOC script with Apex optimization level O1. Everything was the same as in the script, except that I used a single GPU and therefore changed the batch size to 24. I got the following results:

|                | 1-15  | 16-20 | all   |
|----------------|-------|-------|-------|
| Paper          | 65.12 | 21.11 | 54.64 |
| Code results   | 58.73 | 21.6  | 49.7  |

**To Reproduce**

```bash
start=`date +%s`

START_DATE=$(date '+%Y-%m-%d')

PORT=$((9000 + RANDOM % 1000))
GPU=0
NB_GPU=1

DATA_ROOT=./data
DATASET=voc
TASK=15-5s
NAME=PLOP
METHOD=PLOP
BATCH_SIZE=24
INITIAL_EPOCHS=30
EPOCHS=30
OPTIONS="--checkpoint checkpoints/step/"

RESULTSFILE=results/${START_DATE}_${DATASET}_${TASK}_${NAME}.csv
rm -f ${RESULTSFILE}

CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} --task ${TASK} --step 0 --lr 0.01 --epochs ${INITIAL_EPOCHS} --method ${METHOD} --opt_level O1 ${OPTIONS}

for step in 1 2 3 4 5
do
    CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} --task ${TASK} --step ${step} --lr 0.001 --epochs ${EPOCHS} --method ${METHOD} --opt_level O1 ${OPTIONS}
done

python3 average_csv.py ${RESULTSFILE}
```

arthurdouillard commented 2 years ago

Hey, I don't have a GPU large enough to try a batch size of 24 on a single GPU, so I cannot test your command.

However, multiple people have been able to reproduce 15-1 when using 2 GPUs (https://github.com/arthurdouillard/CVPR2021_PLOP/issues/3). Can you try that?
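
For reference, a 2-GPU variant of your command would look roughly like the sketch below. The per-GPU batch size of 12 is an assumption on my part to keep the effective batch at 24; double-check it against the scripts shipped in the repo.

```bash
# Hypothetical 2-GPU variant of the command above (not verbatim from the repo).
# BATCH_SIZE=12 per process is an assumption that keeps the effective batch at 24.
GPU=0,1
NB_GPU=2
BATCH_SIZE=12

CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} \
    --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} \
    --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} \
    --task ${TASK} --step 0 --lr 0.01 --epochs ${INITIAL_EPOCHS} \
    --method ${METHOD} --opt_level O1 ${OPTIONS}
```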

mostafaelaraby commented 2 years ago
After using 2 GPUs, I managed to reproduce the paper's results:

|                       | 1-15  | 16-20 | all   |
|-----------------------|-------|-------|-------|
| Paper                 | 65.12 | 21.11 | 54.64 |
| Code results, 1 GPU   | 58.73 | 21.6  | 49.7  |
| Code results, 2 GPUs  | 65.2  | 20.9  | 54.7  |

What I have noticed is that the 1-GPU and 2-GPU runs give almost the same results, with only small differences, up to the last task. On the last task, however, the mIoU on old classes drops from 65% at the previous task to 58% with 1 GPU, whereas with 2 GPUs it only drops from 68% to 65%.

@arthurdouillard I was wondering, though: why can the results be reproduced only with 2 GPUs and mixed precision?
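
One way I could probe this (a hypothetical diagnostic, not one of the repo's scripts) is to rerun step 0 on a single GPU with a batch size of 12, so that each BatchNorm layer sees the same per-process batch as in the 2-GPU run, assuming `--batch_size` is per process as is usual with `torch.distributed.launch`:

```bash
# Hypothetical diagnostic (not one of the repo's scripts): 1 GPU, batch size 12,
# so each BatchNorm layer sees the same per-process batch as in the 2-GPU run
# (assuming --batch_size is per process, as usual with torch.distributed.launch).
# If this matches the 2-GPU numbers, per-process batch statistics, rather than
# the GPU count itself, would explain the gap.
GPU=0
NB_GPU=1
BATCH_SIZE=12

CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} \
    --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} \
    --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} \
    --task ${TASK} --step 0 --lr 0.01 --epochs ${INITIAL_EPOCHS} \
    --method ${METHOD} --opt_level O1 ${OPTIONS}
```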

arthurdouillard commented 2 years ago

I think the problem comes from either:

mostafaelaraby commented 2 years ago

Thanks @arthurdouillard

arthurdouillard commented 2 years ago

Don't hesitate to reopen this issue if you have new findings. Best,