chungyiweng / humannerf

HumanNeRF turns a monocular video of moving people into a 360 free-viewpoint video.
MIT License

Multi-GPU training #26

Closed Andyen512 closed 2 years ago

Andyen512 commented 2 years ago

Hi chungyi, from the training code and issue #21, I know the batch size must be 1. But I see you're using nn.DataParallel in the network code, so the batch size is smaller than the number of GPUs, and when I was training, all GPUs except gpu0 were idle. So what's the purpose of using nn.DataParallel in this case? Will future code support a larger batch size? Thanks.

chungyiweng commented 2 years ago

Hello Andy,

We can still leverage multiple GPUs, since we pack "ray samples" into a batch and distribute them evenly across GPUs for MLP queries. In other words, the "batch" we send to the GPUs is ray samples, not images.

Setting batchsize=1 just means we process only one image at a time (with a large number of ray samples per image).
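To illustrate the idea, here is a minimal sketch (not the actual HumanNeRF network, the MLP and tensor shapes are made up): nn.DataParallel splits dimension 0 of its input across the available GPUs, so even a single image's ray samples get spread over all devices.

```python
import torch
import torch.nn as nn

class TinyNeRFMLP(nn.Module):
    """Toy MLP standing in for the NeRF network; queries per-sample RGB + density."""
    def __init__(self, in_dim=3, hidden=256, out_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, pts):          # pts: (N_samples, 3)
        return self.net(pts)

if __name__ == "__main__":
    mlp = TinyNeRFMLP().cuda()
    # DataParallel scatters dim 0 of the input across GPUs, so the
    # "batch" being parallelized is the ray samples, not the image.
    mlp = nn.DataParallel(mlp)

    # e.g. one image -> 2048 rays x 128 samples per ray = 262144 query points
    ray_samples = torch.rand(2048 * 128, 3).cuda()
    out = mlp(ray_samples)           # each GPU processes a chunk of samples
    print(out.shape)                 # torch.Size([262144, 4])
```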

Regarding GPU usage, if you keep monitoring it you should observe activity peaks on the other GPUs (besides gpu:0).

I hope this makes it clear to you. Thanks for asking!