facebookresearch / localrf

An algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video.
MIT License

Errors occurring during the execution of train.py #52

Closed curryandklay closed 2 months ago

curryandklay commented 3 months ago

When I entered the command, the message from the terminal was as follows:

Namespace(L1_weight=0.01, N_voxel_final=262144000, N_voxel_init=262144, TV_weight_app=0.0, TV_weight_density=0.0, add_frames_every=100, alpha_mask_thre=0.0001, batch_size=4096, ckpt=None, config=None, data_dim_color=27, datadir='/data/liuchen/localrf/data/katwijk', density_shift=-5, device='cuda:0', distance_scale=25, downsampling=-1, fea2denseAct='softplus', fea_pe=0, featureC=128, fov=66.0, frame_step=1, logdir='/data/liuchen/localrf/log/katwijk', loss_depth_weight_inital=0.1, loss_flow_weight_inital=1, lr_R_init=0.005, lr_basis=0.001, lr_decay_target_ratio=0.1, lr_exposure_init=0.001, lr_i_init=0, lr_init=0.02, lr_t_init=0.0005, lr_upsample_reset=1, max_drift=1, model_name='TensorVMSplit', nSamples=1000000.0, n_init_frames=5, n_iters_per_frame=600, n_iters_reg=100, n_lamb_sh=[24, 24, 24], n_lamb_sigma=[8, 8, 8], n_max_frames=100, n_overlap=30, pos_pe=0, prog_speedup_factor=1.0, progress_refresh_rate=200, refinement_speedup_factor=1.0, render_from_file='', render_only=0, render_path=1, render_test=1, rm_weight_mask_thre=0.001, shadingMode='MLP_Fea_late_view', skip_TB_images=False, skip_saving_video=False, step_ratio=0.5, subsequence=[0, -1], test_frame_every=10, update_AlphaMask_list=[100, 200, 300], upsamp_list=[100, 150, 200, 250, 300], view_pe=0, vis_every=10000, with_preprocessed_poses=0)
lc_min: tensor([-2., -2., -2.], device='cuda:0')
lc_max: tensor([2., 2., 2.], device='cuda:0')
n_novels: 262144
xyz_max: tensor([2., 2., 2.], device='cuda:0')
xyz_min: tensor([-2., -2., -2.], device='cuda:0')
n_voxels: 262144
Traceback (most recent call last):
  File "localTensoRF/train.py", line 661, in <module>
    reconstruction(args)
  File "localTensoRF/train.py", line 265, in reconstruction
    reso_cur = N_to_reso(args.N_voxel_init, aabb)
  File "/data/liuchen/localrf/localTensoRF/utils/utils.py", line 205, in N_to_reso
    voxel_size = ((xyz_max - xyz_min).prod() / n_voxels).pow(1 / 3)

The above is followed by a very long RuntimeError message in the terminal, which ends with the following:

extern "C" __launch_bounds__(512, 4) __global__ void reduction_prod_kernel(ReduceJitOp r){ r.run(); }
nvrtc: error: invalid value for --gpu-architecture (-arch)

I printed the values of xyz_max and xyz_min to see what the problem was, but both are valid tensors that can be computed on, so I can't find the cause. Can you explain what's happening here, please? BTW, my environment is PyTorch 1.12.0 + CUDA 11.3 and the GPU is an NVIDIA 4090. Any advice would be appreciated!

curryandklay commented 2 months ago

This bug was solved by installing the correct versions of CUDA and torch. The project needs to be run strictly with the versions stated by the author; the versions I had installed before were too old.
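
For anyone hitting the same nvrtc error, here is a minimal sketch of a version check, not an official diagnostic. It assumes the failure comes from the CUDA toolkit bundled with torch being older than the GPU: the reduction kernel in the traceback is compiled at runtime by NVRTC, and a 4090 has compute capability 8.9, which CUDA toolkits before 11.8 do not recognize. The table and the helper names (MIN_CUDA_FOR_CAPABILITY, parse_version, toolkit_knows_gpu) are my own and only cover a few recent architectures.

```python
# Minimum CUDA toolkit release that recognizes each compute capability.
# Hypothetical helper table: only a few recent architectures are listed.
MIN_CUDA_FOR_CAPABILITY = {
    (8, 0): (11, 0),  # e.g. A100
    (8, 6): (11, 1),  # e.g. RTX 30 series
    (8, 9): (11, 8),  # e.g. RTX 4090
    (9, 0): (11, 8),  # e.g. H100
}


def parse_version(text):
    """Turn a version string such as '11.3' into the tuple (11, 3)."""
    return tuple(int(part) for part in text.split(".")[:2])


def toolkit_knows_gpu(capability, cuda_version):
    """Return True or False when the capability is in the table, else None."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if minimum is None:
        return None  # not covered by this small table
    return parse_version(cuda_version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    # torch.version.cuda is None on CPU-only builds, so guard against that.
    if torch is not None and torch.cuda.is_available() and torch.version.cuda:
        cap = torch.cuda.get_device_capability(0)
        print("torch:", torch.__version__, "built with CUDA:", torch.version.cuda)
        print("GPU capability:", cap)
        print("architectures in this build:", torch.cuda.get_arch_list())
        print("toolkit knows this GPU:", toolkit_knows_gpu(cap, torch.version.cuda))
    else:
        print("torch with a usable CUDA device was not found")
```

With my old setup (CUDA 11.3 and a capability 8.9 card) the check returns False, which matches the "invalid value for --gpu-architecture" message. It does not tell you which versions the project needs; for that, follow the versions stated by the author.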