the device used in training

JierunChen / FasterNet

[CVPR 2023] Code for PConv and FasterNet

the device used in training #55

Open tianlianghai opened 10 months ago

tianlianghai commented 10 months ago

What device did you use for training? With a batch size of 512 per V100 (16 GB) I get an OOM error, but if I use a smaller batch size, the loss goes to NaN.

train_fasternet_m(){
    python train_test.py -g 0,1 --num_nodes 1 -n 4 -b 1024 -e 500 \
        --pin_memory --wandb_project_name fasternet \
        --model_ckpt_dir ./model_ckpt/$(date +'%Y%m%d_%H%M%S') --cfg cfg/fasternet_m.yaml
}

99-WSJ commented 3 months ago

Hello, did you solve it? The same problem occurs in my experiments.