Tuo-Liang commented 8 months ago

I tried to run the script "bash dev_scripts/ete/dtu_dgt_d012_img0123_conf_color_dir_agg2.sh", but it produced the output below. Why does this happen, and how can I fix it? Any help would be appreciated.

/home/lab710/anaconda3/envs/pointnerf/lib/python3.8/site-packages/numpy/core/shape_base.py:420: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arrays = [asanyarray(arr) for arr in arrays]
dataset total: train 30184
dataset [DtuDataset] was created

training images = 30184

No previous checkpoints, start from scratch!!!!
opt.resume_dir ../checkpoints/init/init/dtu_dgt_d012_img0123_conf_color_dir_agg2 None
opt.act_type!!!!!!!!! LeakyReLU
querier device cuda:0 0
no neural points as nn.Parameter
model [MvsPointsVolumetricModel] was created
opt.resume_iter!!!!!!!!! None
loading mvs from ../checkpoints/init/init/dtu_dgt_d012_img0123_conf_color_dir_agg2/None_net_mvs.pth
cannot load ../checkpoints/init/init/dtu_dgt_d012_img0123_conf_color_dir_agg2/None_net_mvs.pth
loading ray_marching from ../checkpoints/init/init/dtu_dgt_d012_img0123_conf_color_dir_agg2/None_net_ray_marching.pth
cannot load ../checkpoints/init/init/dtu_dgt_d012_img0123_conf_color_dir_agg2/None_net_ray_marching.pth
------------------- Networks -------------------
[Network mvs] Total number of parameters: 0.041M
[Network ray_marching] Total number of parameters: 0.397M

/home/lab710/anaconda3/envs/pointnerf/lib/python3.8/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
points_dirs torch.Size([1, 0, 3]) 1 0

Traceback (most recent call last):
  File "train.py", line 366, in <module>
    main()
  File "train.py", line 264, in main
    model.optimize_parameters(total_steps=total_steps)
  File "/home/lab710/pointnerf/run/../models/neural_points_volumetric_model.py", line 215, in optimize_parameters
    self.forward()
  File "/home/lab710/pointnerf/run/../models/mvs_points_volumetric_model.py", line 123, in forward
    points_xyz, points_embedding, points_colors, points_dirs, points_conf = self.net_mvs(self.input)
  File "/home/lab710/anaconda3/envs/pointnerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/lab710/anaconda3/envs/pointnerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lab710/pointnerf/run/../models/mvs/mvs_points_model.py", line 372, in forward
    points_features_lst = [self.query_embedding(HDWD, torch.as_tensor(cam_xyz_lst[i][None, ...], device="cuda", dtype=torch.float32), photometric_confidence_lst[i][None, ..., None], img_feats, data_mvs['c2ws'], data_mvs['w2cs'], batch["intrinsics"], int(self.args.depth_vid[i]), pointdir_w=False) for i in range(len(cam_xyz_lst))]
  File "/home/lab710/pointnerf/run/../models/mvs/mvs_points_model.py", line 372, in <listcomp>
    points_features_lst = [self.query_embedding(HDWD, torch.as_tensor(cam_xyz_lst[i][None, ...], device="cuda", dtype=torch.float32), photometric_confidence_lst[i][None, ..., None], img_feats, data_mvs['c2ws'], data_mvs['w2cs'], batch["intrinsics"], int(self.args.depth_vid[i]), pointdir_w=False) for i in range(len(cam_xyz_lst))]
  File "/home/lab710/pointnerf/run/../models/mvs/mvs_points_model.py", line 253, in query_embedding
    points_dirs = points_dirs.view(cam_xyz.shape[0], cam_xyz.shape[1], -1)
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
end loading

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being cleaned up. At this point in our execution, CUDA may already have been deinitialized, so there is no way we can finish cleanly. The program will be aborted now. Use Context.pop() to avoid this problem.

dev_scripts/ete/dtu_dgt_d012_img0123_conf_color_dir_agg2.sh: line 223: 41130 Aborted (core dumped) python3 train.py --experiment $name --data_root $data_root --dataset_name $dataset_name --model $model --which_render_func $which_render_func --which_blend_func $which_blend_func --out_channels $out_channels --num_pos_freqs $num_pos_freqs --num_viewdir_freqs $num_viewdir_freqs --random_sample $random_sample --random_sample_size $random_sample_size --batch_size $batch_size --maximum_step $maximum_step --lr $lr --lr_policy $lr_policy --lr_decay_iters $lr_decay_iters --gpu_ids $gpu_ids --checkpoints_dir $checkpoints_dir --save_iter_freq $save_iter_freq --niter $niter --niter_decay $niter_decay --n_threads $n_threads --pin_data_in_memory $pin_data_in_memory --train_and_test $train_and_test --test_num $test_num --test_freq $test_freq --test_num_step $test_num_step --test_color_loss_items $test_color_loss_items --print_freq $print_freq --bg_color $bg_color --split $split --which_ray_generation $which_ray_generation --near_plane $near_plane --far_plane $far_plane --dir_norm $dir_norm --which_tonemap_func $which_tonemap_func --load_points $load_points --resume_dir $resume_dir --resume_iter $resume_iter --feature_init_method $feature_init_method --agg_axis_weight $agg_axis_weight --agg_distance_kernel $agg_distance_kernel --radius_limit_scale $radius_limit_scale --depth_limit_scale $depth_limit_scale --vscale $vscale --kernel_size $kernel_size --SR $SR --K $K --P $P --NN $NN --agg_feat_xyz_mode $agg_feat_xyz_mode --agg_alpha_xyz_mode $agg_alpha_xyz_mode --agg_color_xyz_mode $agg_color_xyz_mode --save_point_freq $save_point_freq --raydist_mode_unit $raydist_mode_unit --agg_dist_pers $agg_dist_pers --agg_intrp_order $agg_intrp_order --shading_feature_mlp_layer1 $shading_feature_mlp_layer1 --shading_feature_mlp_layer2 $shading_feature_mlp_layer2 --shading_feature_mlp_layer3 $shading_feature_mlp_layer3 --shading_feature_num $shading_feature_num --dist_xyz_freq $dist_xyz_freq 
--shpnt_jitter $shpnt_jitter --shading_alpha_mlp_layer $shading_alpha_mlp_layer --shading_color_mlp_layer $shading_color_mlp_layer --which_agg_model $which_agg_model --num_feat_freqs $num_feat_freqs --dist_xyz_deno $dist_xyz_deno --apply_pnt_mask $apply_pnt_mask --point_features_dim $point_features_dim --color_loss_items $color_loss_items --color_loss_weights $color_loss_weights --feedforward $feedforward --trgt_id $trgt_id --depth_vid $depth_vid --ref_vid $ref_vid --manual_depth_view $manual_depth_view --pre_d_est $pre_d_est --depth_occ $depth_occ --manual_std_depth $manual_std_depth --visual_items $visual_items --appr_feature_str0 $appr_feature_str0 --appr_feature_str1 $appr_feature_str1 --appr_feature_str2 $appr_feature_str2 --appr_feature_str3 $appr_feature_str3 --act_type $act_type --point_conf_mode $point_conf_mode --point_dir_mode $point_dir_mode --point_color_mode $point_color_mode --depth_conf_thresh $depth_conf_thresh --geo_cnsst_num $geo_cnsst_num --bgmodel $bgmodel --vox_res $vox_res --debug
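
For readers hitting the same RuntimeError: the log line "points_dirs torch.Size([1, 0, 3])" suggests that zero points reached query_embedding, and a -1 dimension cannot be inferred once the known dimensions multiply to zero. The sketch below is plain Python that mimics that inference rule. The helper infer_missing_dim is hypothetical and written only for illustration; it is not PyTorch's actual implementation.

```python
from math import prod


def infer_missing_dim(num_elements, shape):
    """Resolve the single -1 in `shape`, mimicking the rule behind view/reshape.

    Hypothetical helper for illustration only; not PyTorch code.
    """
    if shape.count(-1) != 1:
        raise ValueError("expected exactly one -1 in shape")
    # Product of all explicitly given dimensions.
    known = prod(d for d in shape if d != -1)
    if known == 0:
        # 0 * x == 0 holds for every x, so the -1 has no unique value.
        raise ValueError("ambiguous: known dimensions multiply to zero")
    if num_elements % known != 0:
        raise ValueError("shape is incompatible with the element count")
    return num_elements // known


# A non-empty point set resolves cleanly: 1 * 5 * x == 15 gives x == 3.
resolved = infer_missing_dim(15, [1, 5, -1])

# The failing case from the log: 0 elements viewed as [1, 0, -1].
try:
    infer_missing_dim(0, [1, 0, -1])
    ambiguous = False
except ValueError:
    ambiguous = True
```

If the last dimension is known in advance (3 for direction vectors, going by the logged shape), writing it explicitly instead of -1 would probably sidestep the reshape error, but it would not explain why no points were produced in the first place.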

iszhihao commented 8 months ago

I have the same issue

Tuo-Liang commented 8 months ago

Do you know how to solve it? I found that it might be caused by "cam_xyz" containing NaN values, but I don't know how to fix that.
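
One way a NaN could lead to an empty point set: any ordered comparison against NaN is false, so a threshold mask built from NaN values keeps nothing. The sketch below is plain Python under that assumption. The function keep_confident and the threshold 0.8 are hypothetical stand-ins, not the repository's code, and whether the pipeline filters points this way has not been confirmed in this thread.

```python
import math


def keep_confident(confidences, threshold):
    """Return indices whose confidence exceeds `threshold`.

    Hypothetical stand-in for a confidence filter; not the repository's code.
    """
    # `c > threshold` is False whenever c is NaN, so NaN entries are dropped.
    return [i for i, c in enumerate(confidences) if c > threshold]


# Healthy values: some points survive the filter.
healthy = keep_confident([0.9, 0.2, 0.95], threshold=0.8)

# All-NaN values: every comparison is False, so no point survives.
broken = keep_confident([math.nan, math.nan, math.nan], threshold=0.8)

# Counting NaN entries is a quick way to test the hypothesis on real data.
nan_count = sum(math.isnan(c) for c in [math.nan, 0.5, math.nan])
```

Counting NaN entries in the depth and confidence values before the filtering step would help confirm or rule out this explanation.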

iszhihao commented 8 months ago

The problem has not been resolved yet. When I run other .sh files, such as chair.sh and lego.sh, I also encounter a "segmentation fault" error, which is driving me to the brink of frustration.

iszhihao commented 8 months ago

Do you know how to solve it?

Xharlie / pointnerf

RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous #102
