ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
MIT License

Training time is too long #23

Open xingpeima opened 1 year ago

xingpeima commented 1 year ago

Hi, I found that the forward pass of a training step takes 0.07 seconds, but the backward pass takes 15 seconds, so the total training time will be several days. Is there something wrong with my training code?

ashawkey commented 1 year ago

This is strange. What environment (e.g., GPU, CUDA, OS) are you using? How do you measure the time?
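
How the time is measured matters here: CUDA kernels are launched asynchronously, so a plain wall-clock timer around the forward pass can report a very small number while the actual GPU work gets counted in whichever later call waits for it. A minimal sketch of a timer that synchronizes before starting and stopping the clock is shown below. The helper name `timed` and its arguments are hypothetical and not part of RAD-NeRF; only `torch.cuda.synchronize` in the usage comment is a real PyTorch call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results, sync=None):
    """Record the wall-clock time of the enclosed block in results[label].

    sync is an optional zero-argument callable that is run before starting
    and before stopping the clock (e.g. torch.cuda.synchronize), so queued
    GPU work is counted in the block that launched it.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        results[label] = time.perf_counter() - start


# Hypothetical usage inside a PyTorch training loop (names are placeholders):
#
#   times = {}
#   with timed("forward", times, sync=torch.cuda.synchronize):
#       loss = model(batch)
#   with timed("backward", times, sync=torch.cuda.synchronize):
#       loss.backward()
#   print(times)
```

If the forward and backward numbers change a lot once synchronization is added, the original 0.07 s / 15 s split may partly be a measurement artifact rather than a real imbalance.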

xingpeima commented 1 year ago

This is the training log output: [image]

It will take 27 hours per epoch. The environment is a V100 32GB. Is that normal?

ashawkey commented 1 year ago

No... it should take less than 10 hours to finish all epochs on a V100. Do you see the same slowdown when training other DL models (e.g., ResNet)?

Erickrus commented 1 year ago

Is there a way to speed up the training process with only 1 GPU?

ashawkey commented 1 year ago

I guess the current training speed doesn't have much room for improvement. Maybe you could increase num_rays and train for fewer steps, but this may sacrifice performance. Also, you may try torch 2.0's torch.compile.
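
As a rough illustration of the num_rays trade-off, one possible heuristic (an assumption, not something the repository prescribes) is to keep the total number of rays seen during training roughly constant, so that doubling num_rays halves the step count. The function name and the numbers below are hypothetical and are not RAD-NeRF defaults.

```python
def steps_for_same_ray_budget(base_steps, base_num_rays, new_num_rays):
    """Return the step count that keeps steps * num_rays roughly constant."""
    if new_num_rays <= 0:
        raise ValueError("new_num_rays must be positive")
    total_rays = base_steps * base_num_rays
    # Ceiling division, so the new schedule never sees fewer rays than the old one.
    return -(-total_rays // new_num_rays)


# Hypothetical numbers chosen only for illustration.
print(steps_for_same_ray_budget(200_000, 65_536, 131_072))  # 100000
```

Whether quality holds up at the reduced step count has to be checked empirically, and larger batches of rays also need more GPU memory per step.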

yediny commented 1 year ago

I have CUDA 11.7 and an A100 40GB, and it's taking 4-5 hours per epoch. [Screenshot 2023-02-21 1:15:05 PM] [Screenshot 2023-02-21 1:13:59 PM]

ashawkey commented 1 year ago

@yediny Could you provide the command you used? If you have enough GPU memory, you could try --preload 2 to see whether the speed bottleneck is image loading.

yediny commented 1 year ago

Even with preload applied, the speed is still the same. Is there any way to make the most of the GPU memory for training? And in the normal case, does it take a day to train one model? [Screenshot 2023-02-23 2:00:45 PM]

braintown commented 1 year ago

How do I use torch.compile to accelerate training?

braintown commented 1 year ago

It seems we would have to rewrite the nn.Module as a LightningModule and then compile it?
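
For reference, torch.compile in PyTorch 2.0 accepts a plain nn.Module (or a function) directly, so a LightningModule is not required. Below is a minimal sketch under stated assumptions: `TinyField`, `maybe_compile`, and `train_step` are hypothetical stand-ins written for illustration and are not RAD-NeRF code.

```python
import torch
import torch.nn as nn


class TinyField(nn.Module):
    """Hypothetical stand-in for a radiance-field network (not RAD-NeRF's model)."""

    def __init__(self, in_dim=3, hidden=64, out_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x):
        return self.net(x)


def maybe_compile(model):
    """Wrap the model with torch.compile when available, else return it unchanged."""
    if not hasattr(torch, "compile"):  # PyTorch older than 2.0
        return model
    try:
        return torch.compile(model)
    except RuntimeError:
        # torch.compile is not supported on every platform or Python version.
        return model


def train_step(model, optimizer, x, target):
    """One ordinary optimization step; the same code works for compiled and eager models."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical usage:
#   model = maybe_compile(TinyField())
#   optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
#   loss = train_step(model, optimizer, x, target)
```

Compilation happens lazily on the first call, so the first few steps are slower than the rest. If a model relies on custom CUDA extensions, torch.compile may fall back to eager execution for those parts, so a speedup is not guaranteed.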