OPEN-AIR-SUN / mars

MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving

how to train a model with nerfacto using depth supervision #38

Closed: sonnefred closed this issue 1 year ago

sonnefred commented 1 year ago

Hi, I'd like to train a model from scratch using depth supervision generated by a monocular depth estimation model, and my cicai_config.py is shown below. Is it right? Thanks!

sonnefred commented 1 year ago

Sorry for the confusing layout above; my cicai_configs.py is as follows. Thanks.

```python
KITTI_Recon_NSG_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="nsg-kitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        max_num_iterations=MAX_NUM_ITERATIONS,
        save_only_latest_checkpoint=False,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=NSGPipelineConfig(
            datamanager=NSGkittiDataManagerConfig(
                dataparser=NSGkittiDataParserConfig(
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)
```
JiantengChen commented 1 year ago

If you want to use monocular depth estimation for KITTI, please add mono_depth_loss_mult to the SceneGraphModelConfig. You can also tune the parameters yourself.


```python
KITTI_Recon_NSG_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="nsg-kitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        max_num_iterations=MAX_NUM_ITERATIONS,
        save_only_latest_checkpoint=False,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=NSGPipelineConfig(
            datamanager=NSGkittiDataManagerConfig(
                dataparser=NSGkittiDataParserConfig(
                    scale_factor=0.01,
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                mono_depth_loss_mult=0.05,
                depth_loss_mult=0,
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)
```
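
For context on why depth_loss_mult is set to 0 here: monocular predictions are only defined up to an unknown scale and shift, so a monocular depth loss typically aligns the prediction to the rendered depth before penalizing the difference, in the spirit of the MiDaS/MonoSDF scale-and-shift-invariant loss. The sketch below only illustrates that general idea; the exact formulation used by MARS may differ.

```python
import torch

def scale_shift_invariant_depth_loss(rendered: torch.Tensor, mono: torch.Tensor) -> torch.Tensor:
    """Align the monocular depth to the rendered depth with a least-squares
    scale and shift, then penalize the remaining difference (MiDaS/MonoSDF style).

    rendered, mono: (N,) depths for a batch of rays.
    """
    ones = torch.ones_like(mono)
    A = torch.stack([mono, ones], dim=-1)                         # (N, 2)
    sol = torch.linalg.lstsq(A, rendered.unsqueeze(-1)).solution  # (2, 1): scale, shift
    aligned = sol[0] * mono + sol[1]
    return torch.mean((aligned - rendered) ** 2)

# Example: the loss is ~0 when the monocular depth is just a scaled/shifted
# copy of the rendered depth, which is exactly the ambiguity we want to ignore.
rendered = torch.rand(4096) * 50.0
mono = 0.02 * rendered + 0.3
print(scale_shift_invariant_depth_loss(rendered, mono))
```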

sonnefred commented 1 year ago

Thanks for your quick reply, and I also have a question: when I use cicai_render.py to render images or videos, what should I modify if I only want to render the background (remove the objects)? Thanks!

JiantengChen commented 1 year ago

> Thanks for your quick reply, and I also have a question: when I use cicai_render.py to render images or videos, what should I modify if I only want to render the background (remove the objects)? Thanks!

Hi! You can refer to #33, maybe it's helpful to you.

sonnefred commented 1 year ago

OK. By the way, I generated depth maps using a monocular depth estimation model and put the visualization images, which are 3-channel, into the completion_02 folder. Is any processing required before putting them in the folder? Thanks.

JiantengChen commented 1 year ago

Below is an example image that we generated with a monocular depth estimation model. [image: example monocular depth map]

sonnefred commented 1 year ago

So your depth map is single-channel? Did you convert the 3-channel depth map generated by the model to a single channel?

sonnefred commented 1 year ago

But the depth map I generated is a color map, not black and white. Should I convert it to grayscale?

JiantengChen commented 1 year ago

Hi! You can refer to the code below, which reads our depth from the image. And you need to convert your image to grayscale. [image: screenshot of the depth-loading code]
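
The code screenshot referenced above is not reproduced here. As a rough, unofficial sketch of the kind of loading and conversion being discussed (the file path and the 1/256 scaling are assumptions following the KITTI depth-completion convention, not the exact MARS loader):

```python
import numpy as np
from PIL import Image

depth_path = "completion_02/000000.png"  # placeholder path

img = Image.open(depth_path)

# If the file is a 3-channel image whose channels are identical grayscale copies,
# converting to single-channel ("L") recovers the values. A colorized depth map
# (e.g. viridis/magma) cannot be recovered this way; save the raw depth instead.
if img.mode not in ("L", "I", "I;16"):
    img = img.convert("L")

depth = np.asarray(img).astype(np.float32)

# For 16-bit KITTI-style depth PNGs, dividing by 256 gives depth in meters;
# 8-bit monocular visualizations only carry relative depth.
if depth.max() > 255:
    depth = depth / 256.0

print(depth.shape, depth.min(), depth.max())
```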

zwlvd commented 1 year ago

I trained KITTI 0006 without depth, and the objects in the scene are just shadows. Does it mean that without depth the result will be bad? [image: 00106 rendering]

sonnefred commented 1 year ago

> Hi! You can refer to the code below, which reads our depth from the image. And you need to convert your image to grayscale.

OK, thanks a lot, I will have a look.

JiantengChen commented 1 year ago

> I trained KITTI 0006 without depth, and the objects in the scene are just shadows. Does it mean that without depth the result will be bad? [image: 00106 rendering]

You can try our proposed category-level car model. That will help decouple the objects and the background.

> Does it mean that without depth the result will be bad?

Sure.

zwlvd commented 1 year ago

> > I trained KITTI 0006 without depth, and the objects in the scene are just shadows. Does it mean that without depth the result will be bad? [image: 00106 rendering]
>
> You can try our proposed category-level car model. That will help decouple the objects and the background.
>
> > Does it mean that without depth the result will be bad?
>
> Sure.

Thank you for your reply. Is the model below the category-level car model?

```python
model=SceneGraphModelConfig(
    background_model=NerfactoModelConfig(),
    object_model_template=CarNeRFModelConfig(_target=CarNeRF),
    object_representation="class-wise",
    object_ray_sample_strategy="remove-bg",
),
```

zwlvd commented 1 year ago

> Below is an example image that we generated with a monocular depth estimation model. [image: example monocular depth map]

I followed omnidata to generate the depth, and I notice the channel is set to 1, while the result I got is still 3-channel. [image: 000151 depth map] What's the problem?

JiantengChen commented 1 year ago

> Thank you for your reply. Is the model below the category-level car model?
>
> ```python
> model=SceneGraphModelConfig(
>     background_model=NerfactoModelConfig(),
>     object_model_template=CarNeRFModelConfig(_target=CarNeRF),
>     object_representation="class-wise",
>     object_ray_sample_strategy="remove-bg",
> ),
> ```

Sure.

JiantengChen commented 1 year ago

@zwlvd Hi, thanks for your reply. You can make the change shown in the image below and try again. [image: screenshot of the suggested code change]
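
The screenshot of the suggested change is not reproduced here. As an unofficial sketch of the idea being discussed (saving the omnidata depth prediction as a single-channel 16-bit PNG instead of an RGB visualization); the variable names below are placeholders, not the exact omnidata demo script:

```python
import cv2
import numpy as np
import torch

# `pred` stands in for the omnidata depth output, e.g. shape (1, H, W) with values in [0, 1].
pred = torch.rand(1, 384, 384)

depth = pred.squeeze().clamp(0, 1).cpu().numpy()

# Write a single-channel 16-bit PNG so the saved file has exactly one channel
# and keeps more precision than an 8-bit RGB visualization.
cv2.imwrite("000151.png", (depth * 65535.0).astype(np.uint16))
```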

JiantengChen commented 1 year ago

For more information about KITTI depth maps, you all can refer to #18.

zwlvd commented 1 year ago

> For more information about KITTI depth maps, you all can refer to #18.

Thank you for your valuable suggestions; they are very useful.

sonnefred commented 1 year ago

Hi, I trained the model using monocular depth, but the depth loss did not decrease steadily, and the eval depth image looks like the following. Could you please point out what the problem may be? Thanks. [images: depth loss curve and eval depth rendering]

wuzirui commented 1 year ago

> Hi, I trained the model using monocular depth, but the depth loss did not decrease steadily, and the eval depth image looks like the following. Could you please point out what the problem may be? Thanks. [images: depth loss curve and eval depth rendering]

Hi! What multipliers do you apply to the mono_depth_loss and the general depth_loss?

sonnefred commented 1 year ago

> > Hi, I trained the model using monocular depth, but the depth loss did not decrease steadily, and the eval depth image looks like the following. Could you please point out what the problem may be?
>
> Hi! What multipliers do you apply to the mono_depth_loss and the general depth_loss?

I trained the model using this config.

```python
KITTI_Recon_NSG_Car_Depth = MethodSpecification(
    config=TrainerConfig(
        method_name="nsg-kitti-car-depth-recon",
        steps_per_eval_image=STEPS_PER_EVAL_IMAGE,
        steps_per_eval_all_images=STEPS_PER_EVAL_ALL_IMAGES,
        steps_per_save=STEPS_PER_SAVE,
        max_num_iterations=MAX_NUM_ITERATIONS,
        save_only_latest_checkpoint=False,
        mixed_precision=False,
        use_grad_scaler=True,
        log_gradients=True,
        pipeline=NSGPipelineConfig(
            datamanager=NSGkittiDataManagerConfig(
                dataparser=NSGkittiDataParserConfig(
                    scale_factor=0.01,
                    use_car_latents=False,
                    use_depth=True,
                    split_setting="reconstruction",
                ),
                train_num_rays_per_batch=4096,
                eval_num_rays_per_batch=4096,
                camera_optimizer=CameraOptimizerConfig(mode="off"),
            ),
            model=SceneGraphModelConfig(
                mono_depth_loss_mult=0.05,
                depth_loss_mult=0,
                background_model=NerfactoModelConfig(),
                object_model_template=NerfactoModelConfig(),
                object_representation="class-wise",
                object_ray_sample_strategy="remove-bg",
            ),
        ),
        optimizers={
            "background_model": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "learnable_global": {
                "optimizer": RAdamOptimizerConfig(lr=1e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
            "object_model": {
                "optimizer": RAdamOptimizerConfig(lr=5e-3, eps=1e-15),
                "scheduler": ExponentialDecaySchedulerConfig(lr_final=1e-5, max_steps=200000),
            },
        },
        # viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
        vis="wandb",
    ),
    description="Neural Scene Graph implementation with vanilla-NeRF model for background and object models.",
)
```
sonnefred commented 1 year ago

> > > Hi, I trained the model using monocular depth, but the depth loss did not decrease steadily, and the eval depth image looks like the following. Could you please point out what the problem may be?
> >
> > Hi! What multipliers do you apply to the mono_depth_loss and the general depth_loss?
>
> I trained the model using this config.

Hi, do you have any suggestions about this problem? I'm a bit confused: when I use the monocular depth loss, the training result is even worse than without depth supervision ... Thanks in advance.

wuzirui commented 1 year ago

Hi, we think there's a visualization problem with the depth colormap. Could you please check the values of the predicted depths?

sonnefred commented 1 year ago

> Hi, we think there's a visualization problem with the depth colormap. Could you please check the values of the predicted depths?

Hi, I checked the values of the predicted depths. I generated the depth map following this code, and the generated map is a 3-channel black-and-white picture whose pixel values are between 0 and 255. Is that right? [image: depth-generation code screenshot]
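
For reference, a quick way to check what is actually stored in a saved depth file (channel count, dtype, value range); the path below is a placeholder:

```python
import numpy as np
from PIL import Image

arr = np.asarray(Image.open("completion_02/000000.png"))  # placeholder path

print("shape:", arr.shape)               # (H, W) = single channel, (H, W, 3) = 3 channels
print("dtype:", arr.dtype)               # uint8 -> 0..255, uint16 -> 0..65535
print("min/max:", arr.min(), arr.max())

# A 3-channel uint8 image in 0-255 is a visualization rather than a metric depth
# map; depth supervision expects values that can be scaled into scene units.
```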

sonnefred commented 1 year ago

Hi, I saw this part in the code. Does this mean the depth image should be a single-channel image? And what should the range of the pixel values be? Thanks. [image: code screenshot]

wuzirui commented 1 year ago

> Hi, I saw this part in the code. Does this mean the depth image should be a single-channel image? And what should the range of the pixel values be? Thanks. [image: code screenshot]

Whatever format you load the depth maps in, they will be transformed into a single-channel float tensor.

sonnefred commented 1 year ago

> > Hi, I saw this part in the code. Does this mean the depth image should be a single-channel image? And what should the range of the pixel values be? Thanks. [image: code screenshot]
>
> Whatever format you load the depth maps in, they will be transformed into a single-channel float tensor.

OK, thanks a lot.

AIBUWAN commented 1 year ago

@sonnefred Hi, I loaded the depth map as a single-channel float tensor, but I still have the same problem: the mono depth loss won't go down. Do you have any solution for this?

amoghskanda commented 7 months ago

> > Below is an example image that we generated with a monocular depth estimation model. [image: example monocular depth map]
>
> I followed omnidata to generate the depth, and I notice the channel is set to 1, while the result I got is still 3-channel. [image: 000151 depth map] What's the problem?

Same case here; my depth maps are 3-channel as well. Do we need to change the code anywhere, or can training start with 3-channel depth maps?
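
Not an official answer, but one workaround discussed above is to collapse the maps to a single channel offline before training. A minimal sketch, assuming the three channels are identical grayscale copies (directory names are placeholders):

```python
import glob
import os

import cv2

src_dir = "depth_rgb"        # placeholder: 3-channel depth maps from the estimator
dst_dir = "completion_02"    # placeholder: folder the dataparser reads from
os.makedirs(dst_dir, exist_ok=True)

for path in glob.glob(os.path.join(src_dir, "*.png")):
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    if img is not None and img.ndim == 3:
        # Identical channels: grayscale conversion recovers the original values.
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imwrite(os.path.join(dst_dir, os.path.basename(path)), img)
```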