PJLab-ADG / neuralsim

neuralsim: 3D surface reconstruction and simulation based on 3D neural rendering.
MIT License

Train on my own datasets #9

Closed: npcdna closed this issue 11 months ago

npcdna commented 12 months ago

Thanks for your awesome work. I want to apply it to my own street datasets to generate camera images and lidar data. My datasets contain lidar data, images, and poses, and I want to generate lidar data and camera images with different sensor parameters. Which data format should I use, and which modules and scripts do I need?

ventusff commented 12 months ago

Hi, some of the format descriptions are introduced here; you can start by processing your data into this format.

Our preprocessing of the Waymo Open Dataset also follows the format above, with only a few dataset-specific details. You can refer to this script if you run into any problems, but it might not cover everything, since many of its procedures are designed specifically for WOD.
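
For a quick sanity check (not required), something like the following minimal Python sketch can be used to peek at the processed files once you have converted your data. It only prints whatever keys your preprocessing actually wrote, so it does not assume any particular schema, and the path below is a placeholder:

```python
import numpy as np
import torch

root = "/path/to/your/processed_sequence"  # placeholder path

# scenario.pt is a torch-saved object describing the scene / sensors
scenario = torch.load(f"{root}/scenario.pt", map_location="cpu")
print(type(scenario))
if isinstance(scenario, dict):
    print(list(scenario.keys()))

# Each lidar frame is an .npz archive; list its arrays, shapes and dtypes
with np.load(f"{root}/lidars/lidar_TOP/00000000.npz") as frame:
    for key in frame.files:
        print(key, frame[key].shape, frame[key].dtype)
```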

Feel free to ask questions here, and I'm also working on a tutorial for training on custom datasets.

npcdna commented 11 months ago

Hi, friend. I ran the Waymo data successfully. Afterwards, I converted my dataset to your format according to the instructions and adjusted my configuration based on the Waymo config, but training fails, whether I add lidar or use depth maps. Here is the problem:

2023-09-01 10:11:51,646-rk0-train.py#959:=> Start loading data, for experiment: logs/streetsurf/owndata_2
2023-09-01 10:11:51,646-rk0-train.py#962:=> Done loading data.
2023-09-01 10:11:51,647-rk0-checkpoint.py#74:=> Found ckpts: ['logs/streetsurf/owndata_2/ckpts/0.pt']
2023-09-01 10:11:51,647-rk0-checkpoint.py#78:=> Loading checkpoint from local file: logs/streetsurf/owndata_2/ckpts/0.pt
2023-09-01 10:11:51,797-rk0-train.py#182:=> Start initialize prepcess...
2023-09-01 10:11:51,797-rk0-train.py#204:=> Done initialize prepcess.
2023-09-01 10:11:51,797-rk0-checkpoint.py#41:=> Saving ckpt to logs/streetsurf/owndata_2/ckpts/0.pt
2023-09-01 10:11:52,170-rk0-checkpoint.py#46:Done.
2023-09-01 10:11:52,170-rk0-train.py#1057:=> Start [train], it=0, lr=1e-05, in logs/streetsurf/owndata_2
  0%|          | 0/12000 [00:00<?, ?it/s]
Error occurred in: logs/streetsurf/owndata_2
  0%|          | 0/12000 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "code_single/tools/train.py", line 1303, in <module>
    main_function(bc.parse(print_config=is_master()))
  File "code_single/tools/train.py", line 1286, in main_function
    raise e
  File "code_single/tools/train.py", line 1278, in main_function
    train_step()
  File "code_single/tools/train.py", line 1109, in train_step
    ret, losses = trainer('pixel', sample, ground_truth, local_it, logger=logger)
  File "/home/tjh/.install/anaconda3/envs/nr3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "code_single/tools/train.py", line 244, in forward
    ret, losses = self.train_step_pixel(sample, ground_truth, it, logger=logger)
  File "code_single/tools/train.py", line 414, in train_step_pixel
    losses.update(self.eikonal_loss.forward_code_single(
  File "/home/tjh/Workspace/github/neuralsim/app/loss/eikonal.py", line 141, in forward_code_single
    occ_samples = obj.model.uniform_sample_on_occ(nablas[...,0].numel())
  File "/home/tjh/Workspace/github/neuralsim/nr3d_lib/models/fields/neus/renderer_mixin.py", line 138, in uniform_sample_on_occ
    x = self.accel.occ.sample_pts_in_occupied(num_samples) # [-1,1]
  File "/home/tjh/.install/anaconda3/envs/nr3d/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/tjh/Workspace/github/neuralsim/nr3d_lib/models/spatial_accel/occgrid.py", line 233, in sample_pts_in_occupied
    assert gidx_nonempty.numel() > 0, "Occupancy grid becomes empty during training. Your model/algorithm/training settings might be incorrect. Please check configs and tensorboard."
AssertionError: Occupancy grid becomes empty during training. Your model/algorithm/training settings might be incorrect. Please check configs and tensorboard.

It seems like a config setting error, but I don't know how to solve it. My image size is 1920x1280. Apart from the file paths and the camera/lidar types, are there any other parameters that need to be adjusted? Here is my dataset layout:

├── depths
│   ├── camera_FRONT
│   │   ├── 00000000.npz
│   │   ├── ...
│   │   └── 00000187.npz
│   └── camera_REAR
│       ├── 00000000.npz
│       ├── ...
│       └── 00000187.npz
├── images
│   ├── camera_FRONT
│   │   ├── 00000000.jpg
│   │   ├── ...
│   │   └── 00000187.jpg
│   └── camera_REAR
│       ├── 00000000.jpg
│       ├── ...
│       └── 00000187.jpg
├── lidars
│   └── lidar_TOP
│       ├── 00000000.npz
│       ├── ...
│       └── 00000187.npz
├── masks
│   ├── camera_FRONT
│   │   ├── 00000000.npz
│   │   ├── ...
│   │   └── 00000187.npz
│   └── camera_REAR
│       ├── 00000000.npz
│       ├── ...
│       └── 00000187.npz
├── normals
│   ├── camera_FRONT
│   │   ├── 00000000.jpg
│   │   ├── ...
│   │   └── 00000187.jpg
│   └── camera_REAR
│       ├── 00000000.jpg
│       ├── ...
│       └── 00000187.jpg
└── scenario.pt

Here is my config using lidar:

#------------------------------------------------------------
#------------    Some shortcut configs
#------------------------------------------------------------

device_ids: -1

num_rays_pixel: 4096
num_rays_lidar: 4096

near: 0.1
far: 200.0
depth_max: 120.0 # To visualize / colorize depth when render/eval
extend_size: 60.0
num_coarse: 128 # Number of coarse samples on each ray
step_size: 0.2 # Ray-marching step size
upsample_inv_s: 64.0
upsample_inv_s_factors: [1., 4., 16.]
num_fine: [8,8,32] # [8,8,8] # Number of samples of 3 upsample stages
radius_scale_min: 1 # Nearest sampling shell of NeRF++ background (Distant-view model)
radius_scale_max: 1000 # Furthest sampling shell of NeRF++ background (Distant-view model)
distant_interval_type: inverse_proportional
distant_mode: fixed_cuboid_shells
distant_nsample: 64

sdf_scale: 25.0 # The real-world length represented by one unit of SDF

rgb_fn: l1
rgb_fn_param: {}

lidar_fn: l1
lidar_fn_param: {}
w_lidar: 0.02
w_los: 0.1
# eps_los: annal1.5_0.75_0.5

# w_mask: 0.3

num_uniform: ${eval:"2**16"}

w_eikonal: 0.01
on_render_ratio: 0.1
on_occ_ratio: 1.0
on_render_type: both
safe_mse: true
errlim: 5

w_sparsity: 0.002
sparsity_anneal_for: 1000
sparsity_enable_after: 0

clbeta: 10.0
clw: 0.2
clearance_sdf: 0.02 # 0.02 * (sdf_scale=25) = 0.5m

num_iters: 15000
warmup_steps: 2000
min_factor: 0.06
fglr: 1.0e-2
bglr: 1.0e-2
# skylr: 1.0e-3
emblr: 2.0e-2
image_embedding_dim: 4

start_it: 0
start_level: 2
stop_it: 4000
final_inv_s: 2400.
ctrl_start: 3000
lnini: 0.3 # !!! NOTE: A higher initial inv_s helps with disentanglement of cr/dv, especially for no mask settings

use_estimate_alpha: false

geo_init_method: pretrain_after_zero_out # pretrain

camera_list: [camera_FRONT]
# camera_list: [camera_SIDE_LEFT, camera_FRONT_LEFT, camera_FRONT, camera_FRONT_RIGHT, camera_SIDE_RIGHT]
lidar_list: [lidar_TOP]
lidar_weight: [0.1] # Will be normalized when used

#------------------------------------------------------------
#------------    Full configs
#------------------------------------------------------------
# exp_dir: logs/streetsurf_refactor/dbgfix4_nomask_withlidar_seg134763_${lidar_fn}=${w_lidar}_lnini=${lnini}_invs=${final_inv_s}_${ctrl_start}_sdfscale=${sdf_scale}_wsp=${w_sparsity}_for=${sparsity_anneal_for}_wlos=${w_los}_eps=${eps_los}_weik=${w_eikonal}_on=${on_render_type}_onocc=${on_occ_ratio}_a=${on_render_ratio}_stlv=${start_level}_ini262144_softplus_stop=${stop_it}_cl=${clw}_${clbeta}_${clearance_sdf}_ego2.0
# exp_parent_dir: logs/final_waymo_multiseq_exp4.36_withmask_withlidar_15k_cuboid_half_ext${extend_size}_${rgb_fn}_${lidar_fn}=${w_lidar}_med=${discard_median}_1_02_2_joint
exp_dir: logs/streetsurf/owndata_1

dataset_cfg:
  target: dataio.autonomous_driving.WaymoDataset
  param:
    # root: /nvme/guojianfei/waymo/processed/
    root: /home/tjh/Workspace/tjh/neuralsim/testData/
    # root: /home/ventus/datasets/waymo/processed/
    # root: ./data/waymo/processed/
    rgb_dirname: images
    lidar_dirname: lidars
    mask_dirname: masks

scenebank_cfg:
  # NOTE: scene_id[,start_frame[,n_frames]]
  scenarios:
    - hd24319, 0, 186

  observer_cfgs: 
    Camera:
      list: ${camera_list}
    RaysLidar:
      list: ${lidar_list}
  on_load:
    no_objects: true # Set to true to skip loading foreground objects into scene graph
    joint_camlidar: true # !!! Convenient for NVS
    align_orientation: true
    consider_distortion: true
    joint_camlidar_equivalent_extr: true

assetbank_cfg:
  Street:
    model_class: app.models.single.LoTDNeuSStreet
    model_params:
      dtype: half
      var_ctrl_cfg:
        ln_inv_s_init: ${lnini}
        ln_inv_s_factor: 10.0
        ctrl_type: mix_linear
        start_it: ${ctrl_start}
        stop_it: ${training.num_iters}
        final_inv_s: ${final_inv_s}
      cos_anneal_cfg: null
      surface_cfg:
        sdf_scale: ${sdf_scale}
        encoding_cfg:
          lotd_use_cuboid: true
          lotd_auto_compute_cfg:
            type: ngp
            target_num_params: ${eval:"32*(2**20)"} # 64 MiB float16 params -> 32 Mi params
            min_res: 16
            n_feats: 2
            log2_hashmap_size: 20
            max_num_levels: null
          param_init_cfg:
            method: uniform_to_type
            bound: 1.0e-4
          anneal_cfg:
            type: hardmask
            start_it: ${start_it}
            start_level: ${start_level} # (needs to be small so that training is stable, but not too small, so that the initialization pretraining is still valid)
            stop_it: ${stop_it} # Should not last too many iters; it should end soon so as not to hinder quality
        decoder_cfg: 
          type: mlp
          D: 1
          W: 64
          # select_n_levels: 14
          activation:
            type: softplus
            beta: 100.0
        n_rgb_used_output: 0
        geo_init_method: ${geo_init_method}
      radiance_cfg:
        use_pos: true
        use_view_dirs: true
        dir_embed_cfg: 
          type: spherical
          degree: 4
        D: 2
        W: 64
        n_appear_embedding: ${image_embedding_dim}
      use_tcnn_backend: false
      accel_cfg:
        type: occ_grid
        vox_size: 1.0
        # resolution: [64,64,64]
        occ_val_fn_cfg:
          type: sdf
          inv_s: 256.0 # => +- 0.01 sdf @ 0.3 thre
        occ_thre: 0.3
        ema_decay: 0.95
        init_cfg:
          mode: from_net
          num_steps: 4
          num_pts: ${eval:"2**20"}
        acquire_from_net_cfg:
          num_steps: 4
          num_pts: ${eval:"2**20"}
        acquire_from_samples_cfg: {}
        n_steps_between_update: 16
        n_steps_warmup: 256
      ray_query_cfg:
        query_mode: march_occ_multi_upsample_compressed
        # query_mode: march_occ_multi_upsample
        query_param:
          nablas_has_grad: true
          num_coarse: ${num_coarse}
          num_fine: ${num_fine}
          coarse_step_cfg:
            step_mode: linear
          march_cfg:
            step_size: ${step_size} # Typical value: (far-near) / 4000
            max_steps: 4096
          upsample_inv_s: ${upsample_inv_s}
          upsample_inv_s_factors: ${upsample_inv_s_factors}
          upsample_use_estimate_alpha: ${use_estimate_alpha}
    asset_params:
      initialize_cfg: 
        target_shape: road_surface
        obs_ref: camera_FRONT # Reference observer. Its trajectory will be used for initialization.
        lr: 1.0e-3
        num_iters: 1000
        num_points: 262144
        w_eikonal: 3.0e-3
        floor_dim: z
        floor_up_sign: 1
        ego_height: 2.0
      preload_cfg: {}
      populate_cfg:
        extend_size: ${extend_size}
  Distant:
    model_class: app.models.single.LoTDNeRFDistant
    model_params:
      dtype: half
      encoding_cfg:
        input_ch: 4
        lotd_use_cuboid: true
        lotd_auto_compute_cfg:
          type: ngp4d
          target_num_params: ${eval:"16*(2**20)"} # 16 Mi params
          min_res_xyz: 16
          min_res_w: 4
          n_feats: 2
          log2_hashmap_size: 19
          per_level_scale: 1.382
        param_init_cfg:
          method: uniform_to_type
          bound: 1.0e-4
        # anneal_cfg:
        #   type: hardmask
        #   start_it: ${start_it}
        #   start_level: ${bg_start_level} # (need to be small: so the training is stable; not too small, so there's still valid initialize pretraining.)
        #   stop_it: ${stop_it} # Not for too much iters; should end very soon to not hinder quality
      extra_pos_embed_cfg:
        type: identity
      sigma_decoder_cfg: 
        type: mlp
        D: 1
        W: 64
        output_activation: softplus
      radiance_decoder_cfg:
        use_pos: false
        # pos_embed_cfg:
        #   type: identity
        use_view_dirs: false
        # dir_embed_cfg:
        #   type: spherical
        #   degree: 4
        use_nablas: false
        D: 2
        W: 64
        n_appear_embedding: ${image_embedding_dim}
      n_rgb_used_output: 0
      use_tcnn_backend: false
      include_inf_distance: true # !!! no sky
      radius_scale_min: ${radius_scale_min}
      radius_scale_max: ${radius_scale_max}
      ray_query_cfg:
        query_mode: march_occ
        query_param:
          march_cfg:
            interval_type: ${distant_interval_type}
            sample_mode: ${distant_mode}
            max_steps: ${distant_nsample}
    asset_params:
      populate_cfg:
        cr_obj_classname: Street
  # Sky:
  #   model_class: app.models.env.SimpleSky
  #   model_params: 
  #     dir_embed_cfg:
  #       type: sinusoidal
  #       n_frequencies: 10
  #       use_tcnn_backend: false
  #     D: 2
  #     W: 256
  #     use_tcnn_backend: false
  #     n_appear_embedding: ${image_embedding_dim}
  ImageEmbeddings:
    model_class: app.models.scene.ImageEmbeddings
    model_params:
      dims: ${image_embedding_dim}
      weight_init: uniform
      weight_init_std: 1.0e-4
  #--- Pose refine related
  LearnableParams:
    model_class: app.models.scene.LearnableParams
    model_params:
      refine_ego_motion: true
      # ego_node_id: ego_car
      ego_class_name: Camera
      refine_camera_intr: false
      refine_camera_extr: false
      enable_after: 500

renderer:
  common:
    with_env: false # !!! no sky
    with_rgb: true
    with_normal: true
    near: ${near} # NOTE: Critical to scene scale!
    far: ${far}
  train:
    depth_use_normalized_vw: false # For meaningful depth supervision (if any)
    perturb: true
  val:
    depth_use_normalized_vw: true # For correct depth rendering
    perturb: false
    rayschunk: 4096

training:
  #---------- Dataset and sampling
  dataloader:
    preload: true
    preload_on_gpu: false
    tags:
      camera:
        downscale: 1
        list: ${camera_list}
      # rgb_mask: {}
      # rgb_human_mask: {}
      # rgb_ignore_mask:
      #   ignore_not_occupied: false
      #   ignore_dynamic: false
      #   ignore_human: true
      lidar:
        list: ${lidar_list}
        multi_lidar_merge: true
        filter_when_preload: true
        filter_kwargs:
          filter_in_cams: true
    pixel_dataset:
      #---------- Frame and pixel dataloader
      joint: false
      equal_mode: ray_batch
      num_rays: ${num_rays_pixel}
      frame_sample_mode: uniform
      pixel_sample_mode: error_map
      error_map_res: [32,32]
      uniform_sampling_fraction: 0.5
      #---------- Joint frame-pixel dataloader
      # joint: true
      # equal_mode: ray_batch
      # num_rays: ${num_rays_pixel}
      # error_map_res: [32,32]
      # uniform_sampling_fraction: 0.5
    lidar_dataset:
      equal_mode: ray_batch
      num_rays: ${num_rays_lidar}
      frame_sample_mode: uniform
      lidar_sample_mode: merged_weighted
      multi_lidar_weight: ${lidar_weight} # Will be normalized when used
  val_dataloader:
    preload: false
    tags:
      camera:
        downscale: 4
        list: ${camera_list}
      # rgb_mask: {}
      # rgb_human_mask: {}
      # rgb_ignore_mask:
      #   ignore_not_occupied: false
      #   ignore_dynamic: false
      #   ignore_human: true
      lidar: ${training.dataloader.tags.lidar}
    image_dataset:
      camera_sample_mode: all_list # !!!
      frame_sample_mode: uniform

  #---------- Training losses
  uniform_sample: ${num_uniform}
  losses:
    rgb: 
      fn_type: ${rgb_fn}
      fn_param: ${rgb_fn_param}
      # respect_ignore_mask: true
    # occupancy_mask:
    #   w: ${w_mask}
    #   w_on_errmap: 0
    #   safe_bce: true
    #   pred_clip: 0
    mask_entropy:
      w: 0.005
      mode: crisp_cr
      enable_after: 2000
      anneal:
        type: linear
        start_it: 2000
        stop_it: 5000
        start_val: 0
        stop_val: 0.005
        update_every: 100
    lidar:
      discard_outliers: 0
      discard_outliers_median: 100.0
      discard_toofar: 80.0
      depth: 
        w: ${w_lidar}
        fn_type: ${lidar_fn}
        fn_param: ${lidar_fn_param}
      line_of_sight:
        w: ${w_los}
        fn_type: neus_unisim
        fn_param:
          # epsilon: ${eps_los}
          epsilon_anneal: 
            type: milestones
            milestones: [5000, 10000]
            vals: [1.5, 0.75, 0.5]
    eikonal:
      safe_mse: ${safe_mse}
      safe_mse_err_limit: ${errlim}
      alpha_reg_zero: 0
      on_occ_ratio: ${on_occ_ratio}
      on_render_type: ${on_render_type}
      on_render_ratio: ${on_render_ratio}
      class_name_cfgs:
        Street:
          w: ${w_eikonal}
    sparsity:
      enable_after: ${sparsity_enable_after}
      class_name_cfgs:
        Street:
          key: sdf
          type: normalized_logistic_density
          inv_scale: 16.0
          w: ${w_sparsity}
          anneal:
            type: linear
            start_it: ${sparsity_enable_after}
            start_val: 0
            stop_it: ${eval:"${sparsity_anneal_for}+${sparsity_enable_after}"}
            stop_val: ${w_sparsity}
            update_every: 100
    clearance:
      class_name_cfgs:
        Street:
          w: ${clw}
          beta: ${clbeta}
          thresh: ${clearance_sdf}
    weight_reg:
      class_name_cfgs:
        Street:
          norm_type: 2.0
          w: 1.0e-6
        Distant:
          norm_type: 2.0
          w: 1.0e-6

  optim:
    default: 1.0e-3
    # Sky: ${skylr}
    Distant:
      lr: ${bglr}
      eps: 1.0e-15
      betas: [0.9, 0.99]
    Street: 
      lr: ${fglr}
      eps: 1.0e-15
      betas: [0.9, 0.991]
      invs_betas: [0.9, 0.999]
    ImageEmbeddings: ${emblr}
    #--- Pose refine related
    LearnableParams: 
      ego_motion:
        lr: 0.001
        alpha_lr_rotation: 0.05

  num_iters: ${num_iters}
  scheduler:
    #---------- exponential
    type: exponential_step
    num_iters: ${training.num_iters}
    min_factor: ${min_factor}
    warmup_steps: ${warmup_steps}
    #---------- cosine
    # type: warmupcosine
    # num_iters: ${training.num_iters}
    # min_factor: ${min_factor}
    # warmup_steps: ${warmup_steps}
    #---------- milestone
    # type: multistep
    # milestones: [20000, 30000]
    # gamma: 0.33

  #---------- Logging and validation
  i_val: 1500      # unit: iters
  i_backup: -1 # unit: iters
  i_save: 900     # unit: seconds
  i_log: 20
  log_grad: false
  log_param: false

  ckpt_file: null
  ckpt_ignore_keys: []
  ckpt_only_use_keys: null

ventusff commented 11 months ago

Hi, the occupancy grid becoming completely empty at the very first training iteration is most likely due to the dataset being incorrectly configured. This can involve many things; the most likely culprits are the world scaling or the extend_size.

I have just added a debug tool for this. To use it, pass --debug_scene=true to train.py. Remember to run git pull and git submodule update --init --recursive first to update the repo.

After hitting "play" and then "pause" again, a window like the one below will pop up, showing the lidar points (colored point clouds), the extracted occupancy grid (grey voxels), the street's AABB (the large bold green box), the street's local coordinate axes (the large RGB arrows attached to a corner of the green box), and the camera frustums (colored frustums that move if you hit "play" again).

https://github.com/PJLab-ADG/neuralsim/assets/25529198/fd0a374d-d2d9-4721-be84-8cab913701ad

You can check whether the AABB is created correctly, whether the ego_car and cameras are above the occupancy grid surface, etc. You can also hit "play" again to check whether all the lidar frames are loaded correctly and whether the street's AABB contains all the lidar points (within the camera viewports).
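
If you would rather do a rough numeric check outside the viewer, the idea is the same: see whether your lidar points plausibly fall inside a cuboid on the order of extend_size around the street frame's origin. A minimal sketch follows; the key name and the way the points are brought into the street frame are placeholders that depend on your own preprocessing, not an actual neuralsim API:

```python
import numpy as np

extend_size = 60.0  # should match `extend_size` in your config

# Placeholder: load lidar points that you have already transformed into the
# street object's frame; the key name here is hypothetical.
with np.load("lidars/lidar_TOP/00000000.npz") as f:
    pts = f["points_in_street_frame"]  # expected shape (N, 3)

# Fraction of points inside a cuboid with half-extent `extend_size`
inside = np.all(np.abs(pts) <= extend_size, axis=-1)
print(f"{inside.mean():.1%} of points inside +/-{extend_size} m")
print("min:", pts.min(axis=0), "max:", pts.max(axis=0))
# If almost nothing lands inside (or min/max look wildly off), the poses or
# world scaling written into scenario.pt are probably wrong, which is exactly
# the kind of error that leaves the occupancy grid empty right after init.
```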

Apart from the above, you can also zoom in to check whether the camera views are correct, e.g. whether they are mistakenly upside-down, etc.

https://github.com/PJLab-ADG/neuralsim/assets/25529198/d39562dc-971d-49e4-9c96-363a4287f429

npcdna commented 11 months ago

Thanks!

npcdna commented 11 months ago

I found that I had input an erroneous pose in scenario.pt.

MaRongbo commented 11 months ago

@ventusff Dear author, how can I use the visualization tools inside Docker on a remote server? When I add --debug_scene=true to train.py, I get a core dump. (image attached)

blackmrb commented 4 months ago

> @ventusff Dear author, how can I use the visualization tools inside Docker on a remote server? When I add --debug_scene=true to train.py, I get a core dump.

Hello, could you please explain the remote visualization method? Many thanks! I need it to debug my data. @ventusff