google-research / multinerf

A Code Release for Mip-NeRF 360, Ref-NeRF, and RawNeRF
Apache License 2.0

RESOURCE_EXHAUSTED during training #99

Closed takuyaliu closed 1 year ago

takuyaliu commented 1 year ago

Sorry to bother you with a very elementary question. I tried to train multinerf with:

```
python -m train --gin_configs=configs/360.gin \
  --gin_bindings="Config.data_dir = './datasets/rawnerf/scenes/candlefiat/'" \
  --gin_bindings="Config.checkpoint_dir = './datasets/rawnerf/scenes/candlefiat/checkpoints'" \
  --logtostderr
```

and then the following appeared:

The above exception was the direct cause of the following exception:

```
Traceback (most recent call last):
  File "/home/lwb/anaconda3/envs/multinerf/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lwb/anaconda3/envs/multinerf/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lwb/multinerf-main/train.py", line 288, in <module>
    app.run(main)
  File "/home/lwb/anaconda3/envs/multinerf/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/lwb/anaconda3/envs/multinerf/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/lwb/multinerf-main/train.py", line 119, in main
    state, stats, rngs = train_pstep(
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory allocating 56932566736 bytes.
```

Does that mean I should use a GPU/TPU to train?

97littleleaf11 commented 1 year ago

You might try reducing `batch_size` in your config. See https://github.com/google-research/multinerf#oom-errors
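As a sketch of what that could look like (assuming your config exposes `Config.batch_size` as a gin field, as multinerf's configs do; the value `4096` below is illustrative, not a recommendation), you can override it directly on the command line without editing the `.gin` file:

```
# Illustrative only: halve/quarter the batch size until the allocation fits in memory.
python -m train --gin_configs=configs/360.gin \
  --gin_bindings="Config.data_dir = './datasets/rawnerf/scenes/candlefiat/'" \
  --gin_bindings="Config.checkpoint_dir = './datasets/rawnerf/scenes/candlefiat/checkpoints'" \
  --gin_bindings="Config.batch_size = 4096" \
  --logtostderr
```

A smaller batch means fewer rays per optimization step, so peak memory drops roughly proportionally; you may want to train for more steps to compensate.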