NoamRosenberg / autodeeplab

AutoDeeplab / auto-deeplab / AutoML for semantic segmentation, implemented in Pytorch
MIT License

Slow Training, any Advice? #25

Open albert-ba opened 5 years ago

albert-ba commented 5 years ago

Hi, I'm using this command line: CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 24

I'm trying to understand why it is so slow: I'm getting one epoch per day, while the paper talks about 3 days for 40 epochs. Is this a known issue, or am I missing something? I'd be happy for any advice.

Thanks!

NoamRosenberg commented 5 years ago

Hi @albert-ba, for now I suggest training on 1 GPU with a batch size of 2. It will take you a couple of hours per epoch. That's still not fast enough, but we're working on it.
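For reference, a single-GPU run along those lines might look like this (the flags mirror the command in the first post; only the device list and batch size change):

```bash
# One GPU, batch size 2, as suggested above.
CUDA_VISIBLE_DEVICES=0 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 2
```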

Once we’ve sufficiently sped up the code on a single GPU, we’ll start looking into speeding up multi-GPU.

All the best!

albert-ba commented 5 years ago

Hi, it doesn't seem to help. (screenshot of the training progress omitted)

98 hours to finish one epoch…

I know this repo is still being optimized, and I have no complaints at all; you're doing a wonderful job. But if you can think of anything suspicious that I might be doing wrong, please share. Maybe it's related to these warnings? (screenshot of the warnings omitted)

iariav commented 5 years ago

What GPUs are you running on? These warnings are known and can be ignored for now; I don't think they're related to your issue at all.

I see that you used the COCO dataset. We usually run all our experiments on Cityscapes. Could you try training on Cityscapes with the default args and see what you get?
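Concretely, that suggestion corresponds to something like the following (single GPU, all other args left at their defaults; this matches the command fanrupin reports below):

```bash
CUDA_VISIBLE_DEVICES=0 python train_autodeeplab.py --dataset cityscapes
```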

fanrupin commented 4 years ago

Hi, I've run into the same problem: I'm getting one epoch per day. I start training with this command line: CUDA_VISIBLE_DEVICES=0 python train_autodeeplab.py --dataset cityscapes

I use a single GPU (a Quadro GV100) and train on Cityscapes with a batch size of 2; everything else is left at the defaults. Have you solved this problem, @albert-ba? Do you have any advice, @NoamRosenberg @iariav?

NoamRosenberg commented 4 years ago

@fanrupin, currently multi-GPU training doesn't always run faster. I have to look into this further, but I don't have time right now. If you have time to experiment and find the problem, I would love for you to make a pull request and become a contributor.
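If anyone wants to dig in, a first sanity check is whether all GPUs are actually busy during a multi-GPU run. This uses standard NVIDIA tooling rather than anything in this repo:

```bash
# Run in a second terminal while training is in progress.
# Utilization stuck near 0% on the extra GPUs would point at a
# data-loading or parallelization bottleneck rather than raw compute.
watch -n 1 nvidia-smi
```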

MoExplorer commented 3 years ago

> Hi, I'm using this command line: CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 24
>
> I'm trying to understand why it is so slow: I'm getting one epoch per day, while the paper talks about 3 days for 40 epochs. Is this a known issue, or am I missing something? I'd be happy for any advice.
>
> Thanks!

Hello, I also want to use this code on the COCO dataset. I adopted the same command line as yours: CUDA_VISIBLE_DEVICES=0,1 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 24 --gpu-ids 0,1

But I got an error (error screenshot omitted). Have you run into this error? If so, could you please tell me how to solve it?

Thanks a lot!

ZhiboRao commented 3 years ago

>> Hi, I'm using this command line: CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 24 I'm trying to understand why it is so slow: I'm getting one epoch per day, while the paper talks about 3 days for 40 epochs. Is this a known issue, or am I missing something? I'd be happy for any advice. Thanks!
>
> Hello, I also want to use this code on the COCO dataset. I adopted the same command line as yours: CUDA_VISIBLE_DEVICES=0,1 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 24 --gpu-ids 0,1
>
> But I got an error (error screenshot omitted). Have you run into this error? If so, could you please tell me how to solve it?
>
> Thanks a lot!

You need to change the crop size.
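For example, rerunning the same command with a smaller crop size (the value 160 here is purely illustrative, not a verified fix; the right choice depends on your resize setting and GPU memory):

```bash
# Same command as above, with only --crop_size changed.
# 160 is an illustrative value; adjust for your own setup.
CUDA_VISIBLE_DEVICES=0,1 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 160 --batch-size 24 --gpu-ids 0,1
```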