SuLvXiangXin / zipnerf-pytorch

Unofficial implementation of ZipNeRF
Apache License 2.0
783 stars 85 forks

add support for nerfstudio #98

Closed Jing1Ling closed 5 months ago

Jing1Ling commented 5 months ago
  1. According to the template provided by nerfstudio, several related files have been added. • 'zipnerf_config.py': parameter configuration. • 'zipnerf_model.py': a model wrapper that reuses the Model class in 'internal/models.py'.

  2. Replace 'cam_dirs' with 'directions', as discussed in this issue. With this patch you can use the tools provided by nerfstudio (e.g. the viewer). Apart from the modification of cast_ray(), the original content is not affected. This change is needed because camera directions are not provided in nerfstudio's ray input data structure, RayBundle.

Jing1Ling commented 5 months ago

I did a simple comparison of 'cam_dirs' and 'directions' on the garden scene. For each variant I trained a model based on nerfstudio, and the PSNR difference on the validation set was within 0.1.

SuLvXiangXin commented 5 months ago
  1. I'm not familiar with nerfstudio, but it seems that an additional installation step is needed. Can you give detailed instructions in the README?
  2. I find that with the latest version of nerfstudio, running ns-train zipnerf --data /SSD_DISK/datasets/360_v2/bicycle/ fails with AssertionError: Colmap path /SSD_DISK/datasets/360_v2/bicycle/colmap/sparse/0 does not exist. Maybe an additional argument needs to be added for that?
Jing1Ling commented 5 months ago
  1. Done! Sorry for missing that.
  2. This is because the default colmap path of nerfstudio is 'colmap/sparse/0'. I changed it to 'sparse/0' through the config file. It can also be done with 'ns-train zipnerf --data xxx colmap --colmap-path sparse/0'.
SuLvXiangXin commented 5 months ago

Hi, when I start training I get an index-out-of-bounds error, which comes from here. I find that ray_indices[:,1].max() == image_height, which is wrong; it should be image_height - 1, and the same for the width. I'm not sure how to fix this.
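
As a toy illustration of the symptom (made-up sizes, not the real data manager tensors): indexing an image of height H with a pixel row index equal to H is already out of range, and on the GPU the same mistake surfaces as a device-side assert.

import torch

image = torch.zeros(4, 6, 3)             # (H, W, 3) with H = 4
ray_indices_y = torch.tensor([0, 2, 4])  # 4 == H: one past the last valid row, H - 1

try:
    image[ray_indices_y, 0]
except IndexError as err:
    print(err)  # index 4 is out of bounds for dimension 0 with size 4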

Jing1Ling commented 5 months ago

I mentioned this situation in README.md:

*Nerfstudio's ColmapDataParser rounds down the image size when downscaling, which is different from the 360_v2 dataset. You can use nerfstudio to reprocess the data, or modify the downscale logic in the library as discussed in https://github.com/nerfstudio-project/nerfstudio/issues/1438.

Fastest solution: change the two lines here to:

self.height = torch.floor(0.5 + (self.height * scaling_factor)).to(torch.int64)
self.width = torch.floor(0.5 + (self.width * scaling_factor)).to(torch.int64)
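
To see why that one-pixel difference matters, here is a minimal sketch comparing the two rounding modes on a hypothetical full-resolution height (the numbers are made up for illustration); a downscaled image that is one row shorter than the intrinsics expect is exactly what produces pixel indices equal to image_height.

import torch

height = torch.tensor(4871.0)  # hypothetical full-resolution height
scaling_factor = 1.0 / 4       # 4871 / 4 = 1217.75

floor_h = torch.floor(height * scaling_factor).to(torch.int64)        # 1217 (round-down behaviour)
round_h = torch.floor(0.5 + height * scaling_factor).to(torch.int64)  # 1218 (round-to-nearest)

print(floor_h.item(), round_h.item())  # 1217 1218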
Pioneer6gun9 commented 3 months ago

Excuse me, I followed the above method, but I ran into a problem I can't solve when executing ns-train zipnerf --data bicycle colmap --colmap-path sparse/0: RuntimeError: CUDA error: device-side assert triggered. Is there any good way to solve it?

Jing1Ling commented 3 months ago

Excuse me, I followed the above method, but I ran into a problem I can't solve when executing ns-train zipnerf --data bicycle colmap --colmap-path sparse/0: RuntimeError: CUDA error: device-side assert triggered. Is there any good way to solve it?

Hi @Pioneer6gun9! Someone reminded me that the rounding strategy of mipnerf360 is not ceil but round, and I've updated the code above. By the way, I've submitted a pull request to nerfstudio for this issue. I'm not sure if this is the cause, so feel free to contact me if you still have any questions.

Pioneer6gun9 commented 3 months ago

Thank you for your reply. I will try it again; it may be a problem on my side.


unanan commented 3 months ago

I mentioned this situation in README.md:

*Nerfstudio's ColmapDataParser rounds down the image size when downscaling, which is different from the 360_v2 dataset. You can use nerfstudio to reprocess the data, or modify the downscale logic in the library as discussed in nerfstudio-project/nerfstudio#1438.

Fastest solution: change the two lines here to:

self.height = torch.floor(0.5 + (self.height * scaling_factor)).to(torch.int64)
self.width = torch.floor(0.5 + (self.width * scaling_factor)).to(torch.int64)

I modified the code here, and it resolved the problem:

dataparser=ColmapDataParserConfig(downscale_factor=4, orientation_method="up", center_method="poses", colmap_path="sparse/0"),

to

dataparser=ColmapDataParserConfig(downscale_factor=4, orientation_method="up", center_method="poses", colmap_path="sparse/0", downscale_rounding_mode="round"),
Jing1Ling commented 3 months ago

Hi @unanan, you're right. The PR I submitted to nerfstudio about the rounding mode has now been merged. I will submit a PR to update the README of this repo later. You can also specify the rounding mode directly in the training command: ns-train zipnerf --data path/to/data colmap --downscale_rounding_mode round

s1eeveW commented 1 month ago

@Jing1Ling Hello mate, do you have any solution for this issue?

NameError: name 'segment_coo' is not defined

The entire error info is:

(nerfstudio) E:\zipnerf-pytorch>ns-train zipnerf --data ./data/flowers colmap --colmap-path sparse/0 E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\tyro_fields.py:343: UserWarning: The field colmap_path is annotated with type <class 'pathlib.Path'>, but the default value sparse/0 has type <class 'str'>. We'll try to handle this gracefully, but it may cause unexpected behavior. warnings.warn( [03:08:49] Using --data alias for --data.pipeline.datamanager.data train.py:230 ──────────────────────────────────────────────────────── Config ──────────────────────────────────────────────────────── TrainerConfig( _target=<class 'nerfstudio.engine.trainer.Trainer'>, output_dir=WindowsPath('outputs'), method_name='zipnerf', experiment_name=None, project_name='nerfstudio-project', timestamp='2024-05-28_030849', machine=MachineConfig(seed=42, num_devices=1, num_machines=1, machine_rank=0, dist_url='auto', device_type='cuda'), logging=LoggingConfig( relative_log_dir=WindowsPath('.'), steps_per_log=10, max_buffer_size=20, local_writer=LocalWriterConfig( _target=<class 'nerfstudio.utils.writer.LocalWriter'>, enable=True, stats_to_track=( <EventName.ITER_TRAIN_TIME: 'Train Iter (time)'>, <EventName.TRAIN_RAYS_PER_SEC: 'Train Rays / Sec'>, <EventName.CURR_TEST_PSNR: 'Test PSNR'>, <EventName.VIS_RAYS_PER_SEC: 'Vis Rays / Sec'>, <EventName.TEST_RAYS_PER_SEC: 'Test Rays / Sec'>, <EventName.ETA: 'ETA (time)'> ), max_log_size=10 ), profiler='basic' ), viewer=ViewerConfig( relative_log_filename='viewer_log_filename.txt', websocket_port=None, websocket_port_default=7007, websocket_host='0.0.0.0', num_rays_per_chunk=32768, max_num_display_images=512, quit_on_train_completion=False, image_format='jpeg', jpeg_quality=75, make_share_url=False, camera_frustum_scale=0.1, default_composite_depth=True ), pipeline=ZipNerfPipelineConfig( _target=<class 'zipnerf_ns.zipnerf_pipeline.ZipNerfPipeline'>, datamanager=ZipNerfDataManagerConfig( _target=<class 'zipnerf_ns.zipnerf_datamanager.ZipNerfDataManager'>, data=WindowsPath('data/flowers'), masks_on_gpu=False, images_on_gpu=False, dataparser=ColmapDataParserConfig( _target=<class 'nerfstudio.data.dataparsers.colmap_dataparser.ColmapDataParser'>, data=WindowsPath('.'), scale_factor=1.0, downscale_factor=4, downscale_rounding_mode='round', scene_scale=1.0, orientation_method='up', center_method='poses', auto_scale_poses=True, assume_colmap_world_coordinate_convention=True, eval_mode='interval', train_split_fraction=0.9, eval_interval=8, depth_unit_scale_factor=0.001, images_path=WindowsPath('images'), masks_path=None, depths_path=None, colmap_path=WindowsPath('sparse/0'), load_3D_points=True, max_2D_matches_per_3D_point=0 ), train_num_rays_per_batch=8192, train_num_images_to_sample_from=-1, train_num_times_to_repeat_images=-1, eval_num_rays_per_batch=8192, eval_num_images_to_sample_from=-1, eval_num_times_to_repeat_images=-1, eval_image_indices=(0,), collate_fn=<function nerfstudio_collate at 0x000001E2B32A3C10>, camera_res_scale_factor=1.0, patch_size=1, camera_optimizer=None, pixel_sampler=PixelSamplerConfig( _target=<class 'nerfstudio.data.pixel_samplers.PixelSampler'>, num_rays_per_batch=4096, keep_full_image=False, is_equirectangular=False, ignore_mask=False, fisheye_crop_radius=None, rejection_sample_mask=True, max_num_iterations=100 ) ), model=ZipNerfModelConfig( _target=<class 'zipnerf_ns.zipnerf_model.ZipNerfModel'>, enable_collider=True, collider_params={'near_plane': 2.0, 'far_plane': 6.0}, loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0}, 
eval_num_rays_per_chunk=32768, prompt=None, gin_file=['configs/360.gin'], compute_extras=True, proposal_weights_anneal_max_num_iters=1000, rand=True, zero_glo=False ) ), optimizers={ 'model': { 'optimizer': AdamOptimizerConfig( _target=<class 'torch.optim.adam.Adam'>, lr=0.008, eps=1e-15, max_norm=None, weight_decay=0 ), 'scheduler': ExponentialDecaySchedulerConfig( _target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>, lr_pre_warmup=1e-08, lr_final=0.001, warmup_steps=1000, max_steps=25000, ramp='cosine' ) } }, vis='viewer', data=WindowsPath('data/flowers'), prompt=None, relative_model_dir=WindowsPath('nerfstudio_models'), load_scheduler=True, steps_per_save=5000, steps_per_eval_batch=1000, steps_per_eval_image=5000, steps_per_eval_all_images=25000, max_num_iterations=25000, mixed_precision=True, use_grad_scaler=False, save_only_latest_checkpoint=True, load_dir=None, load_step=None, load_config=None, load_checkpoint=None, log_gradients=False, gradient_accumulation_steps={} ) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Saving config to: outputs\flowers\zipnerf\2024-05-28_030849\config.yml experiment_config.py:136 Saving checkpoints to: outputs\flowers\zipnerf\2024-05-28_030849\nerfstudio_models trainer.py:137 Setting up training dataset... Caching all 151 images. Setting up evaluation dataset... Caching all 22 images. E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torchmetrics\utilities\prints.py:62: FutureWarning: Importing PeakSignalNoiseRatio from torchmetrics was deprecated and will be removed in 2.0. Import PeakSignalNoiseRatio from torchmetrics.image instead. _future_warning( ╭─────────────── viser ───────────────╮ │ ╷ │ │ HTTP │ http://0.0.0.0:7007 │ │ Websocket │ ws://0.0.0.0:7007 │ │ ╵ │ ╰─────────────────────────────────────╯ [NOTE] Not running eval iterations since only viewer is enabled. Use --vis {wandb, tensorboard, viewer+wandb, viewer+tensorboard} to run with eval. No Nerfstudio checkpoint to load, so training from scratch. 
Disabled comet/tensorboard/wandb event writers
Printing profiling stats, from longest to shortest duration in seconds
VanillaPipeline.get_train_loss_dict: 0.2297
Trainer.train_iteration: 0.2297
Traceback (most recent call last):
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\Programming\Anaconda\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 261, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\utils\profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 496, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\utils\profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\pipelines\base_pipeline.py", line 301, in get_train_loss_dict
    model_outputs = self._model(ray_bundle)  # train distributed data parallel model if world_size > 1
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\models\base_model.py", line 143, in forward
    return self.get_outputs(ray_bundle)
  File "E:\zipnerf-pytorch\zipnerf_ns\zipnerf_model.py", line 94, in get_outputs
    renderings, ray_history = self.zipnerf(
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\zipnerf-pytorch\internal\models.py", line 307, in forward
    loss_hash_decay = segment_coo(param ** 2,
NameError: name 'segment_coo' is not defined

Jing1Ling commented 1 month ago

Hi @s1eeveW! 'segment_coo' is a function from the pytorch_scatter package; you can install pytorch_scatter in your Python environment. Alternatively, you can simply comment out these lines and use this line to calculate 'loss_hash_decay'. The two only differ slightly, and I don't think the replacement will affect much.
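
For reference, a rough sketch of both options, assuming the hash-decay loss is a per-level reduction of squared grid parameters (the names, shapes, and index layout below are illustrative, not the exact ones used in internal/models.py):

import torch

# Illustrative flattened hash-grid parameters and a sorted per-level index
# (made-up shapes; internal/models.py builds these from the actual grid).
param = torch.randn(12)
level_idx = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])

try:
    # Option 1: install torch_scatter (pip install torch-scatter) and keep a
    # per-level reduction of the squared parameters.
    from torch_scatter import segment_coo
    loss_hash_decay = segment_coo(param ** 2, level_idx, reduce="mean").mean()
except ImportError:
    # Option 2: drop the per-level segmentation and use a plain mean, which is
    # close when the levels hold similar numbers of parameters.
    loss_hash_decay = (param ** 2).mean()

print(loss_hash_decay)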