feiyuhuahuo / Yolact_minimal

Minimal PyTorch implementation of YOLACT.

Long train time #58

Closed Harry-zzh closed 2 years ago

Harry-zzh commented 2 years ago

Hi, I am training the model with a resnet50 backbone on my own custom dataset, but training seems to take a very long time.

image

The ETA even reaches 99 days.

There are only 1000 images in my dataset. I wonder why this happens.

Looking forward to your reply!

feiyuhuahuo commented 2 years ago

't_d' is the time spent on data preprocessing, and your value is extremely abnormal. It is usually in the range of 0.04~0.08. What value of num_workers are you using? Are you running the program on Win10?
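To check whether the data pipeline is the bottleneck, one option is to time how long the training loop waits for each batch. The following is only a rough sketch, not the repository's own 't_d' code: `timed_batches`, `mean_wait`, and `slow_loader` are hypothetical helper names, and `slow_loader` merely stands in for a real data loader.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def timed_batches(loader: Iterable[T]) -> Iterator[Tuple[T, float]]:
    """Yield (batch, seconds spent waiting for that batch)."""
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        yield batch, time.perf_counter() - start


def mean_wait(loader: Iterable[T]) -> float:
    """Average time per batch spent waiting on the loader (0.0 if empty)."""
    waits = [wait for _, wait in timed_batches(loader)]
    return sum(waits) / len(waits) if waits else 0.0


def slow_loader(n_batches: int, delay: float) -> Iterator[int]:
    """Hypothetical stand-in for a data loader: sleeps `delay` s per batch."""
    for i in range(n_batches):
        time.sleep(delay)
        yield i


if __name__ == "__main__":
    print(f"mean wait per batch: {mean_wait(slow_loader(5, 0.05)):.3f} s")
```

If the mean wait stays far above the usual range quoted above, the loader (for example its worker count or the dataset paths) is the first place to look.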

Harry-zzh commented 2 years ago

Oh, I found that I had accidentally set num_workers to 0. I will set it to 8 now. I run the program on Linux:

image

But is it normal for loading the data to take a few minutes? My training batch size is 4. It has been stuck at this point until now:

image

feiyuhuahuo commented 2 years ago

https://github.com/feiyuhuahuo/Yolact_minimal/blob/d920c0563f54746426a006cd3af83ff3bae293cc/utils/coco.py#L54-L55 Please check here whether the script can actually get your data.

Harry-zzh commented 2 years ago

I finally found the reason: I forgot to modify CUSTOM_CLASSES. Now the training process is normal:

image
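A mismatch like this can be caught before training by comparing the class names in the annotation file with the configured classes. The sketch below assumes a COCO-style annotation file, in which `categories` is a list of entries with a `name` field. `compare_classes` is a hypothetical helper, the class names are made up, and the way the repository actually matches classes may differ.

```python
from typing import Dict, Iterable, List, Sequence


def compare_classes(categories: Iterable[dict],
                    custom_classes: Sequence[str]) -> Dict[str, List[str]]:
    """Report class names that appear on only one side."""
    annotated = {c["name"] for c in categories}
    configured = set(custom_classes)
    return {
        "missing_from_config": sorted(annotated - configured),
        "missing_from_annotations": sorted(configured - annotated),
    }


if __name__ == "__main__":
    # Illustrative data only; in practice `categories` would come from
    # json.load(open(annotation_path))["categories"].
    categories = [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}]
    CUSTOM_CLASSES = ("dog", "person")  # stale config left from another dataset
    print(compare_classes(categories, CUSTOM_CLASSES))
```

Two empty lists mean the names agree. Anything else points to a config that still describes a different dataset.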

Thank you very much.