dahyun-kang / ifsl

[CVPR'22] Official PyTorch implementation of Integrative Few-Shot Learning for Classification and Segmentation
https://arxiv.org/abs/2203.15712
MIT License

the training is very slow #8

Closed XIAO1HAI closed 1 year ago

XIAO1HAI commented 1 year ago

Hello, sorry to bother you again.

Recently, I have been using a 2080 Ti to reproduce the results in the paper, but I found that the training is very slow: it takes about seven days. What could be the reason? I hope you can give me some advice.

dahyun-kang commented 1 year ago

Hello

How many 2080 Tis are you using? In my experience, it took three days with two TITAN Xps on Pascal-5i, as stated in README.md. Therefore, seven days on (presumably) a single 2080 Ti seems reasonably comparable with my experimental setting. I have observed a trade-off: the correlation-based methods generalize better (i.e., show strong performance on unseen classes) but converge more slowly, presumably due to the high-dimensional input complexity. I hope this helps. Thank you!

Best, Dahyun

XIAO1HAI commented 1 year ago

I was using two 2080 Ti graphics cards on the Pascal-5i dataset with the batch size set to 12, and training for 500 epochs took about 6-7 days, so I was wondering whether some setting was missing. I also tried some of the acceleration options in pytorch-lightning (accumulate_grad_batches, precision=16, etc.), which helped a little, but not noticeably. I would therefore like to ask you for some suggestions.
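For reference, a minimal sketch of how those pytorch-lightning options might be passed to the Trainer; the exact argument names depend on the pytorch-lightning version (this follows the 1.x API used around the time of this codebase), and the values shown are only placeholders, not the repository's configuration:

```python
import pytorch_lightning as pl

# Hypothetical Trainer settings corresponding to the flags mentioned above.
# Values are placeholders; the actual hyperparameters live in the repo's configs.
trainer = pl.Trainer(
    gpus=2,                      # two 2080 Ti cards
    precision=16,                # mixed-precision training
    accumulate_grad_batches=2,   # accumulate gradients over 2 batches
    max_epochs=500,
)
# trainer.fit(model, datamodule) would then run training as usual.
```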

dahyun-kang commented 1 year ago

Hello again

I uploaded the codebase as-is and didn't apply any compute-acceleration tricks. What I would suggest is to match your pytorch and pytorch-lightning versions to mine using the provided environment.yml. The codebase isn't missing anything, so I can't give more concrete suggestions.
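As a quick sanity check (not part of the original codebase), one can print the locally installed versions and compare them against the pins in environment.yml:

```python
# Print installed versions to compare against the pins in the repo's environment.yml.
import torch
import pytorch_lightning as pl

print("torch:", torch.__version__)
print("pytorch_lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())
```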

Best regards, Dahyun

XIAO1HAI commented 1 year ago

Ok, thanks again for the answer.

Best regards, and I wish you all the best in your scientific research! XIAOHAI

dahyun-kang commented 1 year ago

Thank you, all the best to you too!

P.S. If you find no further room for improvement on the hardware-acceleration side, another workaround is to skip some of the validation phases, which run every training epoch in this implementation.
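As an illustration of that workaround, a minimal sketch using the standard pytorch-lightning Trainer argument `check_val_every_n_epoch`; the interval of 5 below is an arbitrary example, not a value from the original configuration:

```python
import pytorch_lightning as pl

# Run the validation loop only every 5 training epochs instead of every epoch.
# The interval of 5 is arbitrary; larger values skip more validation time.
trainer = pl.Trainer(
    gpus=2,
    max_epochs=500,
    check_val_every_n_epoch=5,
)
```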

XIAO1HAI commented 1 year ago

Ok, I'll think about it.