barikata1984 opened this issue 1 year ago
Hi @barikata1984! Sorry for the delayed reply here - I suspect this is due to a configuration change (we set "high quality" as the new default): https://github.com/NVIDIAGameWorks/kaolin-wisp/commit/99639ae60de4d1c6f4f721e3b6d1004e258afa5b#diff-0e84d1aed551f592a75f92bacc6eed1545bdaeb03042d1fb2f6aa17343e5db8bR46
Can you try with a reduced sample-per-ray count?
python app/nerf/main_nerf.py --dataset-path /path/to/lego/ --config app/nerf/configs/nerf_hash.yaml --tracer.num_steps 512
I've also tracked all config updates here: https://kaolin-wisp.readthedocs.io/en/latest/pages/config_system.html#converting-older-configs-up-to-wisp-v1-0-2
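For a sense of what that flag controls: --tracer.num_steps is the per-ray sample budget, so the number of samples the tracer shades each iteration scales roughly with rays-per-batch times steps-per-ray. A minimal sketch of that arithmetic, assuming the 4096 rays-per-batch value that SampleRays reports in the config dump further down this thread:
rays_per_batch = 4096  # SampleRays num_samples from the printed config below
for num_steps in (512, 256, 128, 64, 32, 16):
    samples = rays_per_batch * num_steps
    print(f"num_steps={num_steps:3d} -> ~{samples:,} ray samples per iteration")
Under these assumptions, dropping from 512 to 16 steps cuts the per-iteration sample count from roughly 2.1M to roughly 65k, which is why it is the first knob to try when memory is tight.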
Hi @orperel,
Thanks for your response.
I reduced the sample-per-ray count from 512 down to 16, halving it each time, but the process still got killed.
It looks like something happens when running train_dataset = instantiate(cfg.dataset, transform=dataset_transform) in main_nerf.py.
To see where it happens, I added print statements. In app/nerf/main_nerf.py:
+ print("Instantiating dataset_transform")
dataset_transform = instantiate(cfg.dataset_transform) # SampleRays creates batches of rays from the dataset
+ print("Instantiating train_dataset")
train_dataset = instantiate(cfg.dataset, transform=dataset_transform) # A Multiview dataset
and in wisp/config/utils.py:
+ print("================= Flag 0 =================")
instance = instantiate(config, **overriden_args)
+ print("================= Flag 1 =================")
The output is:
$ python app/nerf/main_nerf.py --dataset-path /path/to/lego/ --config app/nerf/configs/nerf_hash.yaml --tracer.num_steps 16
blas
constructor: OctreeAS.make_dense
level: 7
grid
constructor: HashGrid.from_geometric
feature_dim: 2
num_lods: 16
multiscale_type: cat
feature_std: 0.01
feature_bias: 0.0
codebook_bitwidth: 19
min_grid_res: 16
max_grid_res: 2048
nef
constructor: NeuralRadianceField
pos_embedder: none
view_embedder: positional
pos_multires: 10
view_multires: 4
position_input: False
activation_type: relu
layer_type: linear
hidden_dim: 64
num_layers: 1
prune_density_decay: 0.6
prune_min_density: 2.956033378250884
tracer
constructor: PackedRFTracer
raymarch_type: ray
num_steps: 16
step_size: 1.0
bg_color: black
dataset
constructor: NeRFSyntheticDataset
dataset_path: ../nerf_data/lego/
split: train
bg_color: white
mip: 0
dataset_num_workers: -1
transform: None
dataset_transform
constructor: SampleRays
num_samples: 4096
trainer
optimizer
constructor: RMSprop
lr: 0.001
alpha: 0.99
eps: 1e-08
weight_decay: 0.0
momentum: 0.0
dataloader
batch_size: 1
num_workers: 0
exp_name: nerf-hash
mode: train
max_epochs: 100
save_every: -1
save_as_new: False
model_format: full
render_every: -1
valid_every: -1
enable_amp: True
profile_nvtx: True
grid_lr_weight: 100.0
prune_every: 100
random_lod: False
rgb_lambda: 1.0
tracker
tensorboard
constructor: _Tensorboard
log_dir: _results/logs/runs
wandb
constructor: _WandB
project: wisp-nerf
entity: None
run_name: None
job_type: train
sync_tensorboard: True
visualizer
constructor: OfflineRenderer
render_res: (1024, 1024)
render_batch: 10000
shading_mode: rb
matcap_path: ./data/matcap/Pearl.png
shadow: False
ao: False
perf: False
vis_camera
camera_origin: (-3.0, 0.65, -3.0)
camera_lookat: (0.0, 0.0, 0.0)
camera_fov: 30.0
camera_clamp: (0.0, 10.0)
viz360_num_angles: 20
viz360_radius: 3.0
viz360_render_all_lods: False
enable_tensorboard: True
enable_wandb: False
log_dir: _results/logs/runs
log_level: 20
pretrained: None
device: cuda
interactive: True
Instantiating dataset_transform
================= Flag 1 =================
================= Flag 2 =================
Instantiating train_dataset
================= Flag 1 =================
loading data: 100%|████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 30.43it/s]
/home/atsushi/miniconda3/envs/wisp/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Killed
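For reference, a minimal sketch of how resident memory could be logged around those two calls in main_nerf.py; psutil is assumed to be installed, and cfg and instantiate are the objects already in scope in the script:
import psutil  # third-party; assumed installed via `pip install psutil`

def log_rss(tag):
    # Print this process's resident set size (RSS) in GB
    rss_gb = psutil.Process().memory_info().rss / 1024 ** 3
    print(f"[{tag}] RSS: {rss_gb:.2f} GB")

log_rss("before dataset_transform")
dataset_transform = instantiate(cfg.dataset_transform)
log_rss("before train_dataset")
train_dataset = instantiate(cfg.dataset, transform=dataset_transform)
log_rss("after train_dataset")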
Do you have any other ideas on how to resolve this issue?
Hi @barikata1984 thanks for this bug report.
I ran some memory profiling and indeed the main branch uses upwards of 14GB of resident memory at peak, which really shouldn't be the case.
I dug into the issue a bit and fixed some benign issues in https://github.com/NVIDIAGameWorks/kaolin-wisp/pull/164
Now the resident memory, at least according to my profiling, is 8GB (so a 6GB reduction). If you want further savings, I would pass in --valid-every -1 to disable validation, since the validation dataset takes around 3GB of memory.
Let me know if this works for you!
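As a sanity check on that ~3GB figure, a very rough back-of-the-envelope estimate lands in the same ballpark, assuming the 100 validation images of the synthetic lego scene at 800x800 are held as float32 RGBA plus per-pixel ray origins and directions (the actual storage layout in wisp may well differ):
# Rough estimate only; the exact layout in wisp may differ.
num_images = 100               # validation split of the synthetic lego scene
h = w = 800                    # image resolution at mip 0
bytes_per_float = 4            # float32

image_bytes = num_images * h * w * 4 * bytes_per_float        # RGBA values
ray_bytes = num_images * h * w * (3 + 3) * bytes_per_float    # ray origins + directions

print(f"~{(image_bytes + ray_bytes) / 1024 ** 3:.1f} GB")     # ~2.4 GB under these assumptions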
Hi @tovacinni, thanks a lot for the solution! As you suggested, --valid-every -1 worked, while the run with validation enabled still got killed due to RAM shortage. I will try again on a different PC with sufficient RAM.
Description
Hi,
I tried to run main_nerf.py from the main branch, but it suddenly stopped with a single-word line: Killed. This is presumably due to a RAM shortage (according to some googling); I checked the memory usage and it hit its limit immediately before the app stopped. Do you have any idea how to deal with this issue?
I followed all the installation procedures, including requirements_app.txt. main_nerf.py in the stable branch works without any problems, so if the config system is the only major change between the main and stable branches, the issue should be caused by the new config system. I suppose you can reproduce the higher RAM usage in your environment.
I installed pyopengl_accelerate separately because a message saying the module was missing appeared the first time I ran the stable main_nerf.py, but otherwise the conda env should be clean for running wisp apps.
I know the easiest solution is to add more RAM, but the stable config system works fine even with limited RAM, so it would be great if I could also use the new one on the same machine since it looks much cleaner.
Thanks in advance!
Machine spec
Reproduction steps
pip install pyopengl_accelerate
python app/nerf/main_nerf.py --dataset-path /path/to/lego/ --config app/nerf/configs/nerf_hash.yaml
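If reproducing this, one way to confirm that the kill really coincides with RAM exhaustion (rather than a crash) is to watch available memory from a second terminal while the command above runs. A minimal watcher sketch, assuming psutil is installed:
import time
import psutil  # assumed installed: pip install psutil

# Poll system memory once a second until interrupted (Ctrl+C).
while True:
    vm = psutil.virtual_memory()
    print(f"available: {vm.available / 1024 ** 3:.2f} GB ({vm.percent:.0f}% used)", flush=True)
    time.sleep(1)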