arthurdouillard / dytox

Dynamic Token Expansion with Continual Transformers, accepted at CVPR 2022
https://arxiv.org/abs/2111.11326
Apache License 2.0

CIFAR100 - 50 Step results with default config on 2 GPUs #10

Closed vgthengane closed 2 years ago

vgthengane commented 2 years ago

Hello, thank you so much for your work.

I ran the default configuration you provided on 2 GPUs with batch_size=128, incremental_batch_size=128, and incremental_lr=0.0005, but the accuracy is different from the paper: I am getting avg_acc = 64.25 and last_acc = 44.21, which is closer to the distMem results.

$ train.sh 0,1

.sh file
========

set -e

# GPU ids are passed as a comma-separated list, e.g. "0,1".
GPUS=$1
# Count the commas to derive the number of GPUs / processes to launch.
NB_COMMA=`echo ${GPUS} | tr -cd , | wc -c`
NB_GPUS=$((${NB_COMMA} + 1))
# Pick a pseudo-random master port in [9000, 9999] to avoid collisions.
PORT=$((9000 + RANDOM % 1000))

# Drop the GPU argument from the positional parameters.
shift

echo "Launching exp on $GPUS..."
CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=${GPUS} \
    python -m torch.distributed.launch \
        --master_port ${PORT} \
        --nproc_per_node=${NB_GPUS} \
        main.py \
            --options options/data/cifar100_2-2.yaml options/data/cifar100_order1.yaml options/model/cifar_dytox.yaml \
            --name dytox \
            --data-path ../datasets/CIFAR100 \
            --output-basedir ./experiments/ \
            --memory-size 1000

What does the default configuration do? Does it use distributed memory, global memory, or something different? Could you please elaborate on which method you used in the paper and how I can reproduce its results?

Thanks in advance.

arthurdouillard commented 2 years ago

Hello,

You are right: by default, DyTox uses the distributed memory strategy (distMem), as you can see here: https://github.com/arthurdouillard/dytox/blob/main/main.py#L246

You can disable that with --global-memory.
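For reference, that is just the same launch command as in the script above with the flag appended (a sketch with the port and GPU ids hard-coded for readability; everything else is taken verbatim from the script in this thread):

CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0,1 \
    python -m torch.distributed.launch \
        --master_port 9000 \
        --nproc_per_node=2 \
        main.py \
            --options options/data/cifar100_2-2.yaml options/data/cifar100_order1.yaml options/model/cifar_dytox.yaml \
            --name dytox \
            --data-path ../datasets/CIFAR100 \
            --output-basedir ./experiments/ \
            --memory-size 1000 \
            --global-memory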

Note that with global memory the effective rehearsal memory is equal to the requested rehearsal memory, while with distributed memory the effective rehearsal memory can be up to the requested rehearsal memory times the number of GPUs.
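To make that concrete with the numbers used in this thread (a back-of-the-envelope sketch assuming 2 GPUs and --memory-size 1000; the variable names below are only illustrative, not repo code):

# effective rehearsal memory under each strategy
MEMORY_SIZE=1000   # value passed via --memory-size
NB_GPUS=2          # train.sh was launched with GPUs 0,1
echo "global memory:      ${MEMORY_SIZE} rehearsal samples in total"
echo "distributed memory: up to $((MEMORY_SIZE * NB_GPUS)) rehearsal samples in total"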

vgthengane commented 2 years ago

Hi @arthurdouillard, thank you for your quick reply.

Passing --global-memory will give avg acc = 64.39 and last acc = 43.47 for 50-step CIFAR100, right? What I am trying to understand is: how can I reproduce the same results as in your paper, and which method did you use (global, distributed, or something I don't know about)?

arthurdouillard commented 2 years ago

Passing --global-memory will give avg acc = 64.39 and last acc = 43.47 for 50-step CIFAR100, right?

Yes.

How can I reproduce the same results as in your paper, and which method did you use (global, distributed, or something I don't know about)?

In the paper I used distributed memory (without realizing it at the time of publication): with 2 GPUs on CIFAR, and with more GPUs on ImageNet. I'd advise you to refer to the results at https://github.com/arthurdouillard/dytox/blob/main/erratum_distributed.md . I'll try to update the paper accordingly later this summer (I need one more result).

vgthengane commented 2 years ago

Sure, thank you again.