OPEN-AIR-SUN / mars

MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving
Apache License 2.0
681 stars 64 forks

Why are there no vehicles in the rendering results? #33

Closed Jinming-Su closed 1 year ago

Jinming-Su commented 1 year ago

Hi, @xBeho1der

Great work.

Following your rendering script,

python scripts/cicai_render.py --load-config outputs/nvs75fullseq/nsg-vkitti-car-depth-nvs/2023-06-21_135412/config.yml --output-path renders/

I get the result:

https://github.com/OPEN-AIR-SUN/mars/assets/16068384/9027b408-1c74-48de-920e-fc47d3b2a8b1

Why are there no vehicles in the rendering results?

JiantengChen commented 1 year ago

@Jinming-Su Hi! Thanks for your feedback! I just ran the render script locally on the master branch of our repo, using the config and checkpoint I uploaded. My command was python scripts/cicai_render.py --load-config outputs/nvs75fullseq/nsg-vkitti-car-depth-nvs/2023-06-21_135412/config.yml --output-path renders/ and below is one frame of the render result. [image] You are actually not the first to point out this issue, but we have not been able to reproduce it locally. I suggest you change the output_format in cicai_render.py to images and try again. [image]
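
For reference, the knob in question is the output_format field on the RenderTrajectory dataclass in cicai_render.py (the full script is pasted later in this thread); a trimmed sketch of the change, keeping only the relevant fields:

```python
from dataclasses import dataclass
from pathlib import Path

from typing_extensions import Literal


@dataclass
class RenderTrajectory:  # trimmed sketch; see the full script below
    load_config: Path
    # Name of the output file (or directory stem when rendering images).
    output_path: Path = Path("renders/output.mp4")
    # Switching this from "video" to "images" writes one PNG per frame
    # instead of encoding an MP4 (it can also be hard-coded at the
    # _render_trajectory_video call site, as in the script below).
    output_format: Literal["images", "video"] = "images"
```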

JiantengChen commented 1 year ago

The following is the rendered video I just generated. Would you please provide your cicai_config.py and cicai_render.py for a further check?

https://github.com/OPEN-AIR-SUN/mars/assets/107318439/3e5e8555-729f-49c2-b70e-099b1285bbd4

Jinming-Su commented 1 year ago

The following is the rendered video I just generated. Would you please provide your cicai_config.py and cicai_render.py for a further check?

output.mp4

Thanks for your reply.
I didn't modify cicai_config.py or cicai_render.py; I only modified the data path in the config.yml you provided under outputs.

The content of config.yml is as follows:


``` !!python/object:nerfstudio.engine.trainer.TrainerConfig _target: !!python/name:nerfstudio.engine.trainer.Trainer '' data: &id003 !!python/object/apply:pathlib.PosixPath - / - mnt - data - public_dataset - nerf - vkitti - Scene06 - clone experiment_name: nvs75fullseq load_checkpoint: null load_config: null load_dir: null load_step: null log_gradients: true logging: !!python/object:nerfstudio.configs.base_config.LoggingConfig local_writer: !!python/object:nerfstudio.configs.base_config.LocalWriterConfig _target: !!python/name:nerfstudio.utils.writer.LocalWriter '' enable: true max_log_size: 10 stats_to_track: !!python/tuple - !!python/object/apply:nerfstudio.utils.writer.EventName - Train Iter (time) - !!python/object/apply:nerfstudio.utils.writer.EventName - Train Rays / Sec - !!python/object/apply:nerfstudio.utils.writer.EventName - Test PSNR - !!python/object/apply:nerfstudio.utils.writer.EventName - Vis Rays / Sec - !!python/object/apply:nerfstudio.utils.writer.EventName - Test Rays / Sec - !!python/object/apply:nerfstudio.utils.writer.EventName - ETA (time) max_buffer_size: 20 profiler: basic relative_log_dir: !!python/object/apply:pathlib.PosixPath [] steps_per_log: 10 machine: !!python/object:nerfstudio.configs.base_config.MachineConfig dist_url: auto machine_rank: 0 num_gpus: 1 num_machines: 1 seed: 42 max_num_iterations: 600000 method_name: nsg-vkitti-car-depth-nvs mixed_precision: false optimizers: background_model: optimizer: !!python/object:nerfstudio.engine.optimizers.RAdamOptimizerConfig _target: &id001 !!python/name:torch.optim.radam.RAdam '' eps: 1.0e-15 lr: 0.001 max_norm: null weight_decay: 0 scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialDecaySchedulerConfig _target: &id002 !!python/name:nerfstudio.engine.schedulers.ExponentialDecayScheduler '' lr_final: 1.0e-05 lr_pre_warmup: 1.0e-08 max_steps: 200000 ramp: cosine warmup_steps: 0 object_model: optimizer: !!python/object:nerfstudio.engine.optimizers.RAdamOptimizerConfig _target: *id001 eps: 1.0e-15 lr: 0.005 max_norm: null weight_decay: 0 scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialDecaySchedulerConfig _target: *id002 lr_final: 1.0e-05 lr_pre_warmup: 1.0e-08 max_steps: 200000 ramp: cosine warmup_steps: 0 output_dir: !!python/object/apply:pathlib.PosixPath - outputs pipeline: !!python/object:nsg.nsg_pipeline.NSGPipelineConfig _target: !!python/name:nsg.nsg_pipeline.NSGPipeline '' datamanager: !!python/object:nsg.data.nsg_datamanager.NSGkittiDataManagerConfig _target: !!python/name:nsg.data.nsg_datamanager.NSGkittiDataManager '' camera_optimizer: !!python/object:nerfstudio.cameras.camera_optimizers.CameraOptimizerConfig _target: !!python/name:nerfstudio.cameras.camera_optimizers.CameraOptimizer '' mode: 'off' optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig _target: !!python/name:torch.optim.adam.Adam '' eps: 1.0e-15 lr: 0.0006 max_norm: null weight_decay: 0 orientation_noise_std: 0.0 param_group: camera_opt position_noise_std: 0.0 scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialDecaySchedulerConfig _target: *id002 lr_final: null lr_pre_warmup: 1.0e-08 max_steps: 10000 ramp: cosine warmup_steps: 0 camera_res_scale_factor: 1.0 data: *id003 dataparser: !!python/object:nsg.data.nsg_vkitti_dataparser.NSGvkittiDataParserConfig _target: !!python/name:nsg.data.nsg_vkitti_dataparser.NSGvkitti '' add_input_rows: -1 alpha_color: white bckg_only: false box_scale: 1.5 car_nerf_state_dict_path: !!python/object/apply:pathlib.PosixPath - / - mnt - 
model - 24_car_nerf - mars - vkitti - car-nerf-state-dict - epoch_805.ckpt car_object_latents_path: !!python/object/apply:pathlib.PosixPath - / - mnt - model - 24_car_nerf - mars - vkitti - car-object-latents - latent_codes06.pt chunk: 32768 data: !!python/object/apply:pathlib.PosixPath - / - mnt - data - public_dataset - nerf - vkitti - Scene06 - clone dataset_type: vkitti far_plane: 150.0 first_frame: 0 last_frame: 237 max_input_objects: -1 near_plane: 0.5 netchunk: 65536 novel_view: left obj_only: false obj_opaque: true object_setting: 0 render_only: false scale_factor: 0.1 scene_scale: 2.0 semantic_mask_classes: [] semantic_path: !!python/object/apply:pathlib.PosixPath [] split_setting: nvs-75 use_car_latents: true use_depth: true use_obj: true use_object_properties: true use_semantic: false eval_image_indices: !!python/tuple - 0 eval_num_images_to_sample_from: -1 eval_num_rays_per_batch: 4096 eval_num_times_to_repeat_images: -1 patch_size: 1 train_num_images_to_sample_from: -1 train_num_rays_per_batch: 4096 train_num_times_to_repeat_images: -1 model: !!python/object:nsg.models.scene_graph.SceneGraphModelConfig _target: !!python/name:nsg.models.scene_graph.SceneGraphModel '' background_color: last_sample background_model: !!python/object:nsg.models.nerfacto.NerfactoModelConfig _target: !!python/name:nsg.models.nerfacto.NerfactoModel '' background_color: last_sample collider_params: far_plane: 6.0 near_plane: 2.0 disable_scene_contraction: false distortion_loss_mult: 0.002 enable_collider: true eval_num_rays_per_chunk: 4096 far_plane: 150.0 hidden_dim: 64 hidden_dim_color: 64 hidden_dim_transient: 64 log2_hashmap_size: 19 loss_coefficients: rgb_loss_coarse: 1.0 rgb_loss_fine: 1.0 max_res: 2048 near_plane: 0.05 num_coarse_samples: 24 num_levels: 16 num_nerf_samples_per_ray: 97 num_proposal_iterations: 2 num_proposal_samples_per_ray: !!python/tuple - 256 - 128 obj_feat_dim: 0 orientation_loss_mult: 0.0001 pred_normal_loss_mult: 0.001 predict_normals: false proposal_net_args_list: - hidden_dim: 16 log2_hashmap_size: 17 max_res: 128 num_levels: 5 use_linear: false - hidden_dim: 16 log2_hashmap_size: 17 max_res: 256 num_levels: 5 use_linear: false proposal_update_every: 5 proposal_warmup: 5000 proposal_weights_anneal_max_num_iters: 1000 proposal_weights_anneal_slope: 10.0 sampler: proposal use_average_appearance_embedding: true use_gradient_scaling: false use_proposal_weight_anneal: true use_same_proposal_network: false use_single_jitter: true collider_params: far_plane: 6.0 near_plane: 2.0 debug_object_pose: false depth_loss_mult: 0.01 depth_loss_type: !!python/object/apply:nerfstudio.model_components.losses.DepthLossType - 1 depth_sigma: 0.05 distortion_loss_mult: 0.002 enable_collider: true eval_num_rays_per_chunk: 4096 far_plane: 1000.0 interlevel_loss_mult: 1.0 is_euclidean_depth: false latent_size: 256 loss_coefficients: rgb_loss_coarse: 1.0 rgb_loss_fine: 1.0 max_num_obj: -1 near_plane: 0.05 object_model_template: !!python/object:nsg.models.car_nerf.CarNeRFModelConfig _target: !!python/name:nsg.models.car_nerf.CarNeRF '' background_color: black collider_params: far_plane: 6.0 near_plane: 2.0 enable_collider: true eval_num_rays_per_chunk: 4096 loss_coefficients: rgb_loss_coarse: 1.0 rgb_loss_fine: 1.0 num_coarse_samples: 32 num_fine_samples: 97 optimize_latents: false object_ray_sample_strategy: remove-bg object_representation: class-wise object_warmup_steps: 1000 orientation_loss_mult: 0.0001 pred_normal_loss_mult: 0.001 predict_normals: false ray_add_input_rows: -1 
should_decay_sigma: false sigma_decay_rate: 0.9998 starting_depth_sigma: 4.0 use_interlevel_loss: true project_name: nerfstudio-project relative_model_dir: !!python/object/apply:pathlib.PosixPath - nerfstudio_models save_only_latest_checkpoint: false steps_per_eval_all_images: 5000 steps_per_eval_batch: 500 steps_per_eval_image: 500 steps_per_save: 2000 timestamp: 2023-06-21_135412 use_grad_scaler: true viewer: !!python/object:nerfstudio.configs.base_config.ViewerConfig image_format: jpeg jpeg_quality: 90 max_num_display_images: 512 num_rays_per_chunk: 32768 quit_on_train_completion: false relative_log_filename: viewer_log_filename.txt websocket_host: 0.0.0.0 websocket_port: null websocket_port_default: 7007 vis: wandb ```

cicai_config.py is:


``` from __future__ import annotations from pathlib import Path from typing import Dict import tyro from nsg.models.semantic_nerfw import SemanticNerfWModelConfig from nerfstudio.cameras.camera_optimizers import CameraOptimizerConfig from nerfstudio.engine.optimizers import RAdamOptimizerConfig from nerfstudio.engine.schedulers import ExponentialDecaySchedulerConfig from nerfstudio.engine.trainer import TrainerConfig from nerfstudio.plugins.types import MethodSpecification from nsg.data.nsg_datamanager import NSGkittiDataManagerConfig from nsg.data.nsg_dataparser import NSGkittiDataParserConfig from nsg.data.nsg_vkitti_dataparser import NSGvkittiDataParserConfig from nsg.models.car_nerf import CarNeRF, CarNeRFModelConfig from nsg.models.mipnerf import MipNerfModel from nsg.models.nerfacto import NerfactoModelConfig from nsg.models.scene_graph import SceneGraphModelConfig from nsg.models.vanilla_nerf import NeRFModel, VanillaModelConfig from nsg.nsg_pipeline import NSGPipelineConfig MAX_NUM_ITERATIONS = 600000 STEPS_PER_SAVE = 2000 STEPS_PER_EVAL_IMAGE = 500 STEPS_PER_EVAL_ALL_IMAGES = 5000 VKITTI_Recon_NSG_Car_Depth_Semantic = MethodSpecification( config=TrainerConfig( method_name="nsg-vkitti-car-depth-recon-semantic", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, max_num_iterations=MAX_NUM_ITERATIONS, save_only_latest_checkpoint=True, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, use_semantic=True, semantic_mask_classes=['Van', 'Undefined'], car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes02.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), semantic_path=Path("/data22/DISCOVER_summer2023/xiaohm2306/Scene02/clone/frames/classSegmentation") ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=SemanticNerfWModelConfig( num_proposal_iterations=1, num_proposal_samples_per_ray=[48], num_nerf_samples_per_ray=97, use_single_jitter=False, semantic_loss_weight=0.1 ), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph with semantic learning for the backgruond model.", ) KITTI_Recon_NSG_Car_Depth = MethodSpecification( config=TrainerConfig( method_name="nsg-kitti-car-depth-recon", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, max_num_iterations=MAX_NUM_ITERATIONS, save_only_latest_checkpoint=False, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( 
"/data1/chenjt/datasets/ckpts/pretrain/car_nerf/latent_codes_car_van_truck.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/data1/chenjt/datasets/ckpts/pretrain/car_nerf/epoch_670.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "learnable_global": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_no_depth_kitti = MethodSpecification( config=TrainerConfig( method_name="ablation-no-depth-kitti", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, max_num_iterations=MAX_NUM_ITERATIONS, save_only_latest_checkpoint=False, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGkittiDataParserConfig( use_car_latents=True, use_depth=False, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/latent_codes_car_van_truck.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/epoch_670.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "learnable_global": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) KITTI_NVS_NSG_Car_Depth = MethodSpecification( config=TrainerConfig( method_name="nsg-kitti-car-depth-nvs", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, max_num_iterations=MAX_NUM_ITERATIONS, save_only_latest_checkpoint=False, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGkittiDataParserConfig( use_car_latents=True, use_depth=False, car_object_latents_path=Path( "/data41/luoly/kitti_mot/latents/latent_codes06.pt" ), 
split_setting="nvs-75", car_nerf_state_dict_path=Path("/data1/chenjt/datasets/ckpts/pretrain/car_nerf/epoch_670.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=NerfactoModelConfig(), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "learnable_global": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) VKITTI_Recon_NSG_Car_Depth = MethodSpecification( config=TrainerConfig( method_name="nsg-vkitti-car-depth-recon", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" # "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes02.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) VKITTI_NVS_NSG_Car_Depth = MethodSpecification( config=TrainerConfig( method_name="nsg-vkitti-car-depth-nvs", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="nvs-75", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, 
eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_BG_NeRF = MethodSpecification( config=TrainerConfig( method_name="ablation-background-nerf", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=VanillaModelConfig( _target=NeRFModel, num_coarse_samples=32, num_importance_samples=64, ), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_BG_MipNeRF = MethodSpecification( config=TrainerConfig( method_name="ablation-background-mip-nerf", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=VanillaModelConfig( _target=MipNerfModel, num_coarse_samples=48, num_importance_samples=96, ), object_model_template=CarNeRFModelConfig(_target=CarNeRF), 
object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_object_wise_NeRF = MethodSpecification( config=TrainerConfig( method_name="ablation-object-wise-nerf", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=VanillaModelConfig( _target=NeRFModel, num_coarse_samples=32, num_importance_samples=64, ), object_representation="object-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_object_wise_MipNeRF = MethodSpecification( config=TrainerConfig( method_name="ablation-object-wise-mip-nerf", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=VanillaModelConfig( _target=MipNerfModel, num_coarse_samples=48, num_importance_samples=96, ), object_representation="object-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, 
max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_object_wise_NeRFacto = MethodSpecification( config=TrainerConfig( method_name="ablation-object-wise-nerfacto", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=False, use_depth=True, split_setting="reconstruction", ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=NerfactoModelConfig(), object_representation="object-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_class_wise_NeRF = MethodSpecification( config=TrainerConfig( method_name="ablation-class-wise-nerf", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=VanillaModelConfig( _target=NeRFModel, num_coarse_samples=32, num_importance_samples=64, ), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_class_wise_MipNeRF = MethodSpecification( config=TrainerConfig( method_name="ablation-class-wise-mip-nerf", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, 
steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=VanillaModelConfig( _target=MipNerfModel, num_coarse_samples=48, num_importance_samples=96, ), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_class_wise_NeRFacto = MethodSpecification( config=TrainerConfig( method_name="ablation-class-wise-nerfacto", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=NerfactoModelConfig(), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_warmup = MethodSpecification( config=TrainerConfig( method_name="ablation-warmup", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( 
use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="warmup", object_warmup_steps=5000, ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_none_ray_sample = MethodSpecification( config=TrainerConfig( method_name="ablation-none-ray-sample", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=True, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="none", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) Ablation_no_depth = MethodSpecification( config=TrainerConfig( method_name="ablation-no-depth", steps_per_eval_image=STEPS_PER_EVAL_IMAGE, steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES, steps_per_save=STEPS_PER_SAVE, save_only_latest_checkpoint=False, max_num_iterations=MAX_NUM_ITERATIONS, mixed_precision=False, use_grad_scaler=True, log_gradients=True, pipeline=NSGPipelineConfig( datamanager=NSGkittiDataManagerConfig( dataparser=NSGvkittiDataParserConfig( use_car_latents=True, use_depth=False, car_object_latents_path=Path( "/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/latents/latent_codes06.pt" ), split_setting="reconstruction", car_nerf_state_dict_path=Path("/DATA_EDS/liuty/ckpts/pretrain/car_nerf/vkitti/epoch_805.ckpt"), ), train_num_rays_per_batch=4096, eval_num_rays_per_batch=4096, 
camera_optimizer=CameraOptimizerConfig(mode="off"), ), model=SceneGraphModelConfig( background_model=NerfactoModelConfig(), object_model_template=CarNeRFModelConfig(_target=CarNeRF), object_representation="class-wise", object_ray_sample_strategy="remove-bg", ), ), optimizers={ "background_model": { "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, "object_model": { "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15), "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000), }, }, # viewer=ViewerConfig(num_rays_per_chunk=1 << 15), vis="wandb", ), description="Neural Scene Graph implementation with vanilla-NeRF model for backgruond and object models.", ) ```

cicai_render.py is:


``` #!/usr/bin/env python """ render.py """ from __future__ import annotations import json import os from nerfstudio.utils import colormaps import struct import sys from contextlib import ExitStack from dataclasses import dataclass, field from pathlib import Path from typing import List, Optional import cv2 import mediapy as media import numpy as np import torch import tyro from rich.console import Console from rich.progress import ( BarColumn, Progress, TaskProgressColumn, TextColumn, TimeRemainingColumn, ) from typing_extensions import Literal, assert_never from nerfstudio.model_components.losses import normalized_depth_scale_and_shift from nerfstudio.cameras.camera_paths import get_path_from_json, get_spiral_path from nerfstudio.cameras.cameras import Cameras, CameraType from nerfstudio.pipelines.base_pipeline import Pipeline from nerfstudio.utils import install_checks from nerfstudio.utils.eval_utils import eval_setup from nerfstudio.utils.rich_utils import ItersPerSecColumn from nerfstudio.viewer.server.utils import three_js_perspective_camera_focal_length from nerfstudio.data.utils.data_utils import ( get_depth_image_from_path, ) CONSOLE = Console(width=120) def _render_trajectory_video( pipeline: Pipeline, cameras: Cameras, output_filename: Path, rendered_output_names: List[str], render_width: int, render_height: int, rendered_resolution_scaling_factor: float = 1.0, seconds: float = 5.0, output_format: Literal["images", "video"] = "video", camera_type: CameraType = CameraType.PERSPECTIVE, ) -> None: """Helper function to create a video of the spiral trajectory. Args: pipeline: Pipeline to evaluate with. cameras: Cameras to render. output_filename: Name of the output file. rendered_output_names: List of outputs to visualise. render_width: Video width to render. render_height: Video height to render. rendered_resolution_scaling_factor: Scaling factor to apply to the camera image resolution. seconds: Length of output video. output_format: How to save output data. camera_type: Camera projection format type. """ CONSOLE.print("[bold green]Creating trajectory " + output_format) cameras.rescale_output_resolution(rendered_resolution_scaling_factor) cameras = cameras.to(pipeline.device) fps = len(cameras) / seconds progress = Progress( TextColumn(":movie_camera: Rendering :movie_camera:"), BarColumn(), TaskProgressColumn(show_speed=True), ItersPerSecColumn(suffix="fps"), TimeRemainingColumn(elapsed_when_finished=True, compact=True), ) if output_format == "images": output_image_dir = output_filename.parent / output_filename.stem output_image_dir.mkdir(parents=True, exist_ok=True) if output_format == "video": # make the folder if it doesn't exist output_filename.parent.mkdir(parents=True, exist_ok=True) # NOTE: # we could use ffmpeg_args "-movflags faststart" for progressive download, # which would force moov atom into known position before mdat, # but then we would have to move all of mdat to insert metadata atom # (unless we reserve enough space to overwrite with our uuid tag, # but we don't know how big the video file will be, so it's not certain!) 
with ExitStack() as stack: writer = ( stack.enter_context( media.VideoWriter( path=output_filename, shape=( int(render_height * rendered_resolution_scaling_factor), int(render_width * rendered_resolution_scaling_factor) * len(rendered_output_names), ), fps=fps, ) ) if output_format == "video" else None ) with progress: for camera_idx in progress.track(range(cameras.size), description=""): objdata = pipeline.datamanager.train_dataset.metadata["obj_info"][camera_idx].to(pipeline.device) obj_metadata = pipeline.datamanager.eval_dataset.metadata["obj_metadata"].to(pipeline.device) camera_ray_bundle = cameras.generate_rays(camera_indices=camera_idx) camera_ray_bundle.metadata["object_rays_info"] = objdata # camera_ray_bundle.metadata["object_rays_metadata"] = obj_metadata # camera_ray_bundle = cameras.generate_rays( # camera_indices=camera_idx, # keep_shape=True, # objdata=objdata, # objmetadata=obj_metadata, # ) # batch_obj_rays = camera_ray_bundle.metadata["object_rays_info"].reshape( # camera_ray_bundle.metadata["object_rays_info"].shape[0], # camera_ray_bundle.metadata["object_rays_info"].shape[1], # int(camera_ray_bundle.metadata["object_rays_info"].shape[2] / 3), # 3, # ) batch_obj_dyn = camera_ray_bundle.metadata["object_rays_info"].view( camera_ray_bundle.metadata["object_rays_info"].shape[0], camera_ray_bundle.metadata["object_rays_info"].shape[1], pipeline.model.config.max_num_obj, pipeline.model.config.ray_add_input_rows * 3, ) norm_sh = camera_ray_bundle.metadata["directions_norm"].shape camera_ray_bundle.metadata["directions_norm"] = camera_ray_bundle.metadata["directions_norm"].reshape( norm_sh[0] * norm_sh[1], norm_sh[2] ) pose = batch_obj_dyn[..., :3] rotation = batch_obj_dyn[..., 3] pose[:, :, 0, 2] = pose[:, :, 0, 2] rotation[:, :, 0] = rotation[:, :, 0] batch_obj_dyn[..., :3] = pose batch_obj_dyn[..., 3] = rotation camera_ray_bundle.metadata["object_rays_info"] = batch_obj_dyn.reshape( batch_obj_dyn.shape[0] * batch_obj_dyn.shape[1], batch_obj_dyn.shape[2] * batch_obj_dyn.shape[3] ) # meta_sh = camera_ray_bundle.metadata["object_rays_metadata"].shape # camera_ray_bundle.metadata["object_rays_metadata"] = camera_ray_bundle.metadata[ # "object_rays_metadata" # ].reshape(meta_sh[0] * meta_sh[1], meta_sh[2]) with torch.no_grad(): outputs = pipeline.model.get_outputs_for_camera_ray_bundle_render(camera_ray_bundle) render_image = [] for rendered_output_name in rendered_output_names: if rendered_output_name not in outputs: CONSOLE.rule("Error", style="red") CONSOLE.print(f"Could not find {rendered_output_name} in the model outputs", justify="center") CONSOLE.print( f"Please set --rendered_output_name to one of: {outputs.keys()}", justify="center" ) sys.exit(1) if rendered_output_name == "depth": depth = outputs["depth"] filepath = pipeline.datamanager.train_dataparser_outputs.metadata["depth_filenames"][camera_idx] scale_factor = pipeline.datamanager.train_dataparser_outputs.dataparser_scale * 0.01 depth_img_gt = get_depth_image_from_path( filepath=filepath, height=render_height, width=render_width, scale_factor=scale_factor ) depth_mask = torch.abs(depth_img_gt / scale_factor - 65535) > 1e-6 depth_gt = depth_img_gt.to(depth) depth_gt = depth_gt * outputs["directions_norm"] depth[~depth_mask] = 0.0 max_depth = depth_img_gt.max() if pipeline.config.model.mono_depth_loss_mult > 1e-8: scale, shift = normalized_depth_scale_and_shift( outputs["depth"][None, ...], depth_gt[None, ...], depth_gt[None, ...] 
> 0.0 ) depth = depth * scale + shift depth[depth > max_depth] = max_depth outputs["depth"] = colormaps.apply_depth_colormap(depth) if rendered_output_name == "semantics": semantic_labels = torch.argmax( torch.nn.functional.softmax(outputs["semantics"], dim=-1), dim=-1 ) colormap = ( pipeline.model.object_meta["semantics"] .colors.clone() .detach() .to(outputs["semantics"].device) ) semantic_colormap = colormap[semantic_labels] outputs["semantics"] = semantic_colormap / 255.0 output_image = outputs[rendered_output_name].cpu().numpy() if output_image.shape[-1] == 1: output_image = np.concatenate((output_image,) * 3, axis=-1) render_image.append(output_image) render_image = np.concatenate(render_image, axis=1) if output_format == "images": media.write_image(output_image_dir / f"{camera_idx:05d}.png", render_image) if output_format == "video" and writer is not None: writer.add_image(render_image) if output_format == "video": if camera_type == CameraType.EQUIRECTANGULAR: insert_spherical_metadata_into_file(output_filename) def insert_spherical_metadata_into_file( output_filename: Path, ) -> None: """Inserts spherical metadata into MP4 video file in-place. Args: output_filename: Name of the (input and) output file. """ # NOTE: # because we didn't use faststart, the moov atom will be at the end; # to insert our metadata, we need to find (skip atoms until we get to) moov. # we should have 0x00000020 ftyp, then 0x00000008 free, then variable mdat. spherical_uuid = b"\xff\xcc\x82\x63\xf8\x55\x4a\x93\x88\x14\x58\x7a\x02\x52\x1f\xdd" spherical_metadata = bytes( """ equirectangular True True nerfstudio """, "utf-8", ) insert_size = len(spherical_metadata) + 8 + 16 with open(output_filename, mode="r+b") as mp4file: try: # get file size mp4file_size = os.stat(output_filename).st_size # find moov container (probably after ftyp, free, mdat) while True: pos = mp4file.tell() size, tag = struct.unpack(">I4s", mp4file.read(8)) if tag == b"moov": break mp4file.seek(pos + size) # if moov isn't at end, bail if pos + size != mp4file_size: # TODO: to support faststart, rewrite all stco offsets raise Exception("moov container not at end of file") # go back and write inserted size mp4file.seek(pos) mp4file.write(struct.pack(">I", size + insert_size)) # go inside moov mp4file.seek(pos + 8) # find trak container (probably after mvhd) while True: pos = mp4file.tell() size, tag = struct.unpack(">I4s", mp4file.read(8)) if tag == b"trak": break mp4file.seek(pos + size) # go back and write inserted size mp4file.seek(pos) mp4file.write(struct.pack(">I", size + insert_size)) # we need to read everything from end of trak to end of file in order to insert # TODO: to support faststart, make more efficient (may load nearly all data) mp4file.seek(pos + size) rest_of_file = mp4file.read(mp4file_size - pos - size) # go to end of trak (again) mp4file.seek(pos + size) # insert our uuid atom with spherical metadata mp4file.write(struct.pack(">I4s16s", insert_size, b"uuid", spherical_uuid)) mp4file.write(spherical_metadata) # write rest of file mp4file.write(rest_of_file) finally: mp4file.close() @dataclass class RenderTrajectory: """Load a checkpoint, render a trajectory, and save to a video file.""" # Path to config YAML file. load_config: Path # Name of the renderer outputs to use. rgb, depth, semantics etc. concatenates them along y axis rendered_output_names: List[str] = field(default_factory=lambda: ["rgb", "depth"]) # Trajectory to render. 
traj: Literal["spiral", "filename"] = "spiral" # Scaling factor to apply to the camera image resolution. downscale_factor: int = 1 # Filename of the camera path to render. camera_path_filename: Path = Path("camera_path.json") # Name of the output file. output_path: Path = Path("renders/output.mp4") # How long the video should be. seconds: float = 5.0 # How to save output data. output_format: Literal["images", "video"] = "video" # Specifies number of rays per chunk during eval. eval_num_rays_per_chunk: Optional[int] = None def main(self) -> None: """Main function.""" _, pipeline, _, _ = eval_setup( self.load_config, eval_num_rays_per_chunk=self.eval_num_rays_per_chunk, test_mode="inference", ) install_checks.check_ffmpeg_installed() seconds = self.seconds # TODO(ethan): use camera information from parsing args # if self.traj == "spiral": # camera_start = pipeline.datamanager.eval_dataloader.get_camera(image_idx=0).flatten() # # TODO(ethan): pass in the up direction of the camera # camera_type = CameraType.PERSPECTIVE # render_width = 952 # render_height = 736 # camera_path = get_spiral_path(camera_start, steps=30, radius=0.1) # elif self.traj == "filename": # with open(self.camera_path_filename, "r", encoding="utf-8") as f: # camera_path = json.load(f) # seconds = camera_path["seconds"] # if "camera_type" not in camera_path: # camera_type = CameraType.PERSPECTIVE # elif camera_path["camera_type"] == "fisheye": # camera_type = CameraType.FISHEYE # elif camera_path["camera_type"] == "equirectangular": # camera_type = CameraType.EQUIRECTANGULAR # else: # camera_type = CameraType.PERSPECTIVE # render_width = camera_path["render_width"] # render_height = camera_path["render_height"] # camera_path = get_path_from_json(camera_path) # else: # assert_never(self.traj) FOV = torch.tensor(([30, 26, 22]), dtype=torch.float32) # camera_path = pipeline.datamanager.eval_dataset.cameras camera_path = pipeline.datamanager.train_dataset.cameras render_width = int(camera_path.cx[0] * 2) render_height = int(camera_path.cy[0] * 2) seconds = 13 camera_type = CameraType.PERSPECTIVE # for i, fov in enumerate(FOV): # focal_length = three_js_perspective_camera_focal_length(fov, render_height) # camera_path.fx[i] = focal_length # camera_path.fy[i] = focal_length # cameras_a=Cameras( # fx=camera_path.fx[select_frame], # fy=camera_path.fy[select_frame], # cx=camera_path.image_width[select_frame] / 2, # cy=camera_path.image_height[select_frame] / 2, # camera_to_worlds=camera_to_worlds, # camera_type=camera_type, # # times=times, # ) _render_trajectory_video( pipeline, camera_path, output_filename=self.output_path, rendered_output_names=self.rendered_output_names, rendered_resolution_scaling_factor=1.0 / self.downscale_factor, seconds=seconds, output_format="images", # output_format=self.output_format, camera_type=camera_type, render_width=render_width, render_height=render_height, ) def entrypoint(): """Entrypoint for use with pyproject scripts.""" tyro.extras.set_accent_color("bright_yellow") tyro.cli(RenderTrajectory).main() if __name__ == "__main__": entrypoint() # For sphinx docs get_parser_fn = lambda: tyro.extras.get_parser(RenderTrajectory) # noqa ```

zwlvd commented 1 year ago

Another thing confuses me: why does the video repeat twice, with each part only about 5 to 6 seconds? The whole video should be over 10 seconds.

JiantengChen commented 1 year ago

Another thing confuses me: why does the video repeat twice, with each part only about 5 to 6 seconds? The whole video should be over 10 seconds.

Hi! KITTI and VKITTI are captured with a binocular (stereo) camera, so the camera list contains both views. You can set the output format to images and select the frames you want. By the way, you can also change the highlighted line below to specify which images to render. [image]
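
If it helps, one way to do that selection in cicai_render.py is to slice the camera list before rendering. A minimal sketch, assuming nerfstudio's Cameras supports tensor-style indexing and that you have checked how the two stereo views are ordered for your scene:

```python
# Hypothetical sketch: render only a subset of the training cameras so both
# stereo views don't end up in the same output. Whether left/right frames
# are interleaved or stacked back-to-back depends on the dataparser, so
# inspect camera_to_worlds before choosing the slice.
camera_path = pipeline.datamanager.train_dataset.cameras
camera_path = camera_path[: len(camera_path) // 2]  # e.g. first view only
```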

Jinming-Su commented 1 year ago

@xBeho1der I used the same environment and config that you provided, but the rendering result is only the background, without cars.

Jinming-Su commented 1 year ago

@xBeho1der I find that

z_ray_in_o, z_ray_out_o, intersection_map = ray_box_intersection(rays_o_o, dirs_o)

the output is [None, None, None].
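
For context, ray_box_intersection is a ray vs. axis-aligned-box test, and an all-None return means that no sampled ray hits any object box, so only the background gets composited. A minimal slab-test sketch (my own illustration, assuming a unit box in the object's local frame, not the repo's exact code) showing where [None, None, None] comes from:

```python
import torch

def ray_box_intersection(rays_o, rays_d, box_min=-1.0, box_max=1.0):
    """Slab test for rays against an axis-aligned unit box (illustrative).

    rays_o, rays_d: [N, 3] origins/directions in the object's local frame.
    Returns (z_in, z_out, intersection_map); all three are None when no ray
    hits the box -- the [None, None, None] case reported above.
    """
    inv_d = 1.0 / rays_d
    t0 = (box_min - rays_o) * inv_d
    t1 = (box_max - rays_o) * inv_d
    t_near = torch.minimum(t0, t1).max(dim=-1).values  # latest entry plane
    t_far = torch.maximum(t0, t1).min(dim=-1).values   # earliest exit plane
    hit = (t_far > t_near) & (t_far > 0)
    if not hit.any():
        return None, None, None
    intersection_map = hit.nonzero(as_tuple=False)
    return t_near[hit], t_far[hit], intersection_map
```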

Jinming-Su commented 1 year ago

Hi @xBeho1der, what is the function of this code

data: !!python/object/apply:pathlib.PosixPath

and how to set the path?

wuzirui commented 1 year ago

Hi @xBeho1der, what is the function of this code

data: !!python/object/apply:pathlib.PosixPath

and how to set the path?

You can set it by specifying --data /path/to/your/data on the command line.
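
For example, assuming the methods from cicai_config.py are registered with nerfstudio's ns-train entry point via MethodSpecification (my assumption about the launch command; adjust to your setup):

ns-train nsg-vkitti-car-depth-nvs --data /path/to/vkitti/Scene06/clone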

wuzirui commented 1 year ago

@xBeho1der I find that

z_ray_in_o, z_ray_out_o, intersection_map = ray_box_intersection(rays_o_o, dirs_o)

the output is [None, None, None].

Do you mean that all three return values are None in your experiments?

JiantengChen commented 1 year ago

Hi @xBeho1der, what is the function of this code

data: !!python/object/apply:pathlib.PosixPath

and how to set the path?

Hi, the paths are serialized with the pathlib library.
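
In other words, the !!python/object/apply:pathlib.PosixPath entries in config.yml are just PyYAML's serialization of a PosixPath, with each path segment stored as a constructor argument. A small round-trip demonstration (assuming PyYAML's default, non-safe Dumper and unsafe_load, which is what produces and consumes these tags):

```python
from pathlib import PosixPath

import yaml

p = PosixPath("/mnt/data/public_dataset/nerf/vkitti/Scene06/clone")
text = yaml.dump(p)  # the default Dumper emits python/object/apply tags
print(text)          # !!python/object/apply:pathlib.PosixPath with the
                     # segments ('/', 'mnt', 'data', ...) as list items
print(yaml.unsafe_load(text) == p)  # round-trips to the same path: True
```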

Jinming-Su commented 1 year ago

@xBeho1der I find that

z_ray_in_o, z_ray_out_o, intersection_map = ray_box_intersection(rays_o_o, dirs_o)

the output is [None, None, None].

do you mean that the 3 return values are all None in your experiments?

Yes

Jinming-Su commented 1 year ago

Also, regarding this code:

# ./nsg/models/scene_graph.py
obj_pose = self.batchify_object_pose(ray_bundle).to(self.device)
# [x, y, z, yaw, track_id, length, width, height, class_id]

When I run cicai_render.py, the track_id above takes the values 0 and -1. Is it normal for track_id to be -1?
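
Not an authoritative answer, but a common convention in batched scene-graph code like this is to pad each ray's object slots up to max_num_obj and mark empty slots with a track_id of -1; a hypothetical illustration of what such padding looks like and how it is filtered:

```python
import torch

# Hypothetical padded pose tensor: one real object plus one empty slot.
obj_pose = torch.tensor([
    # x,   y,   z,  yaw, track_id, length, width, height, class_id
    [2.0, 0.0, 8.0, 0.1,      0.0,    4.2,   1.8,    1.5,      1.0],
    [0.0, 0.0, 0.0, 0.0,     -1.0,    0.0,   0.0,    0.0,     -1.0],  # padding
])
real = obj_pose[obj_pose[:, 4] >= 0]  # drop rows whose track_id is -1
print(real.shape)  # torch.Size([1, 9])
```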

Jinming-Su commented 1 year ago

Hi @xBeho1der,

Thanks for your recent replies. I have solved this problem.

I found the error I made. I had modified the code

batch_obj_metadata = torch.index_select(obj_meta_tensor, 0, obj_idx.reshape(-1)).reshape(

to

batch_obj_metadata = torch.index_select(obj_meta_tensor.to(obj_idx), 0, obj_idx.reshape(-1)).reshape(

which is a wrong operation, because obj_idx has dtype int64 while obj_meta_tensor is float32, so .to(obj_idx) casts the metadata to integers.

The correct operation is

batch_obj_metadata = torch.index_select(obj_meta_tensor.to(self.device), 0, obj_idx.reshape(-1)).reshape(
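
For anyone else hitting this: Tensor.to(other) matches both the device and the dtype of other, so .to(obj_idx) silently truncates the float32 metadata (box sizes, positions) to int64, which is why no ray ever intersected an object box. A small demonstration:

```python
import torch

obj_meta = torch.tensor([[0.0, 0.0], [3.7, 1.5]])  # float32 metadata rows
obj_idx = torch.tensor([1], dtype=torch.int64)     # int64 row indices

# Wrong: Tensor.to(other) adopts BOTH the device and dtype of `other`,
# so the float32 metadata is truncated to integers before the lookup.
bad = torch.index_select(obj_meta.to(obj_idx), 0, obj_idx)
print(bad)   # tensor([[3, 1]]) -- box dimensions destroyed

# Right: move the metadata to the target device only, keeping float32.
good = torch.index_select(obj_meta.to(obj_idx.device), 0, obj_idx)
print(good)  # tensor([[3.7000, 1.5000]])
```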

JiantengChen commented 1 year ago

Thanks for your attention and interest in our project. You can also try the change shown in the image below, which is mentioned in the comments of scene_graph.py. [image]