SuLvXiangXin / zipnerf-pytorch

Unofficial implementation of ZipNeRF
Apache License 2.0

Training Time #57

Closed hecodeit closed 1 year ago

hecodeit commented 1 year ago

As the documentation says:

For the Mipnerf360 dataset, the model is trained with a downsample factor of 4 for outdoor scenes and 2 for indoor scenes (same as in the paper). Training speed is about 1.5x slower than the paper (1.5 hours on 8 A6000).

I tested on A6000 cloud GPUs. With ONE GPU, the garden scene takes 1.1-1.2 hours. With multiple GPUs (2-4), training slows down and needs more time, around 1.2-1.5 hours.

ONE GPU, 1.2 hours:

accelerate launch train.py \
--gin_configs=configs/360.gin \
--gin_bindings="Config.data_dir = '${DATA_DIR}'" \
--gin_bindings="Config.exp_name = '${EXP_NAME}'" \
--gin_bindings="Config.factor = 4"

ONE GPU, adding batch_size and render_chunk_size limits, is even faster and needs only 1.1 hours:

accelerate launch train.py \
--gin_configs=configs/360.gin \
--gin_bindings="Config.data_dir = '${DATA_DIR}'" \
--gin_bindings="Config.exp_name = '${EXP_NAME}'" \
--gin_bindings="Config.factor = 4" \
--gin_bindings="Config.batch_size = 4096" \
--gin_bindings="Config.render_chunk_size = 4096" 

So is multi-GPU training not working effectively? Or is there a performance bottleneck?
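For context: assuming the `accelerate` launcher splits the configured global batch of rays across processes (a hedged assumption about this repo's data pipeline, not something confirmed in this thread), adding GPUs shrinks each GPU's per-step workload while introducing a fixed gradient-synchronization cost every step. A toy cost model with illustrative (made-up) numbers shows why multi-GPU runs can end up slower, not faster:

```python
def step_time(num_gpus, global_batch=65536, rays_per_sec=250_000,
              sync_overhead_s=0.01):
    """Toy cost model (illustrative numbers only, not measurements).

    Per-GPU compute time shrinks as the global ray batch is divided
    across GPUs, but a fixed all-reduce/sync cost is paid every step
    whenever more than one process is involved.
    """
    compute_s = (global_batch / num_gpus) / rays_per_sec
    sync_s = sync_overhead_s if num_gpus > 1 else 0.0
    return compute_s + sync_s

# With a fixed sync cost, speedup saturates well below linear scaling:
for n in (1, 2, 4):
    print(f"{n} GPU(s): {step_time(n):.4f} s/step")
```

If the per-GPU batch is small (e.g. after lowering `Config.batch_size`), the sync term can dominate, which is consistent with the timings reported above where 2-4 GPUs were no faster than one.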

hecodeit commented 1 year ago

No, my bad.

I used both A5000 and A6000 cloud GPUs for testing and posted the wrong results in my first post. Please ignore it.

hermosaaurora commented 11 months ago

So what are your test results?