NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Pending on xatlas_uvmap #75

Open aerok opened 2 years ago

aerok commented 2 years ago

Hello, I recently used nvdiffrec to train on my dataset, but the program has been hanging for two hours after computing MSE and PSNR. I found that it is stuck in the xatlas_uvmap() function. Is there any solution?

Hardware: RTX 3090 (24 GB), CUDA 11.6. OS: Ubuntu 20.04

nvidia-smi

Wed Sep 14 16:52:42 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   42C    P8    32W / 420W |  17259MiB / 24576MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1079      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      6186      G   /usr/lib/xorg/Xorg                113MiB |
|    0   N/A  N/A      6350      G   /usr/bin/gnome-shell               26MiB |
|    0   N/A  N/A    115614      G   /usr/lib/firefox/firefox           92MiB |
|    0   N/A  N/A    447877    C+G   python                          16947MiB |
+-----------------------------------------------------------------------------+

config

config configs/nerf_cup.json
iter 5000
batch 8
spp 1
layers 1
train_res [540, 960]
display_res [540, 960]
texture_res [2048, 2048]
display_interval 0
save_interval 100
learning_rate [0.03, 0.01]
min_roughness 0.08
custom_mip False
random_textures True
background white
loss logl1
out_dir out/nerf_cup
ref_mesh data/nerf_mlabs/cup
base_mesh None
validate True
mtl_override None
dmtet_grid 128
mesh_scale 2.1
env_scale 1.0
envmap None
display [{'latlong': True}, {'bsdf': 'kd'}, {'bsdf': 'ks'}, {'bsdf': 'normal'}]
camera_space_light False
lock_light False
lock_pos False
sdf_regularizer 0.2
laplace relative
laplace_scale 3000
pre_load True
kd_min [0.0, 0.0, 0.0, 0.0]
kd_max [1.0, 1.0, 1.0, 1.0]
ks_min [0, 0.08, 0.0]
ks_max [1.0, 1.0, 1.0]
nrm_min [-1.0, -1.0, 0.0]
nrm_max [1.0, 1.0, 1.0]
cam_near_far [0.1, 1000.0]
learn_light True
local_rank 0
multi_gpu False

log

iter= 4760, img_loss=0.023038, reg_loss=0.014464, lr=0.00335, time=552.8 ms, rem=2.21 m
iter= 4770, img_loss=0.022198, reg_loss=0.014459, lr=0.00333, time=553.6 ms, rem=2.12 m
iter= 4780, img_loss=0.024802, reg_loss=0.014467, lr=0.00332, time=551.4 ms, rem=2.02 m
iter= 4790, img_loss=0.025234, reg_loss=0.014462, lr=0.00330, time=555.5 ms, rem=1.94 m
iter= 4800, img_loss=0.022134, reg_loss=0.014464, lr=0.00329, time=557.8 ms, rem=1.86 m
iter= 4810, img_loss=0.017345, reg_loss=0.014459, lr=0.00327, time=559.3 ms, rem=1.77 m
iter= 4820, img_loss=0.020818, reg_loss=0.014459, lr=0.00326, time=558.2 ms, rem=1.67 m
iter= 4830, img_loss=0.025523, reg_loss=0.014463, lr=0.00324, time=556.3 ms, rem=1.58 m
iter= 4840, img_loss=0.020200, reg_loss=0.014454, lr=0.00323, time=558.9 ms, rem=1.49 m
iter= 4850, img_loss=0.022266, reg_loss=0.014466, lr=0.00321, time=556.3 ms, rem=1.39 m
iter= 4860, img_loss=0.022666, reg_loss=0.014460, lr=0.00320, time=560.0 ms, rem=1.31 m
iter= 4870, img_loss=0.019655, reg_loss=0.014456, lr=0.00318, time=556.0 ms, rem=1.20 m
iter= 4880, img_loss=0.021475, reg_loss=0.014452, lr=0.00317, time=557.4 ms, rem=1.11 m
iter= 4890, img_loss=0.018814, reg_loss=0.014459, lr=0.00315, time=554.4 ms, rem=1.02 m
iter= 4900, img_loss=0.022151, reg_loss=0.014457, lr=0.00314, time=559.0 ms, rem=55.90 s
iter= 4910, img_loss=0.020598, reg_loss=0.014457, lr=0.00313, time=559.4 ms, rem=50.35 s
iter= 4920, img_loss=0.019100, reg_loss=0.014457, lr=0.00311, time=576.2 ms, rem=46.10 s
iter= 4930, img_loss=0.021694, reg_loss=0.014452, lr=0.00310, time=564.4 ms, rem=39.50 s
iter= 4940, img_loss=0.023046, reg_loss=0.014453, lr=0.00308, time=565.6 ms, rem=33.94 s
iter= 4950, img_loss=0.019468, reg_loss=0.014452, lr=0.00307, time=558.5 ms, rem=27.93 s
iter= 4960, img_loss=0.016885, reg_loss=0.014446, lr=0.00305, time=559.0 ms, rem=22.36 s
iter= 4970, img_loss=0.021190, reg_loss=0.014451, lr=0.00304, time=558.1 ms, rem=16.74 s
iter= 4980, img_loss=0.019300, reg_loss=0.014449, lr=0.00303, time=561.8 ms, rem=11.24 s
iter= 4990, img_loss=0.021410, reg_loss=0.014449, lr=0.00301, time=557.4 ms, rem=5.57 s
iter= 5000, img_loss=0.023075, reg_loss=0.014454, lr=0.00300, time=551.6 ms, rem=0.00 s
Running validation
MSE,      PSNR
0.04814737, 13.382

py-spy

Total Samples 18400
GIL: 100.00%, Active: 100.00%, Threads: 2

  %Own   %Total  OwnTime  TotalTime  Function (filename)
100.00% 100.00%   184.0s    184.0s   xatlas_uvmap (train.py)
  0.00% 100.00%   0.000s    184.0s   decorate_context (torch/autograd/grad_mode.py)
  0.00% 100.00%   0.000s    184.0s   <module> (train.py)
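
(Trace captured by attaching py-spy to the training process, e.g. with `py-spy top --pid <PID>`; essentially all samples land in xatlas_uvmap (train.py), so the hang is inside that call rather than elsewhere in training.)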
sanskar107 commented 2 years ago

I am facing exactly the same issue. It was working a few days ago, and now it suddenly hangs after printing MSE and PSNR.

sadexcavator commented 2 years ago

I am facing exactly the same issue. It was working a few days ago, and now it suddenly hangs after printing MSE and PSNR.

Same here.

jmunkberg commented 2 years ago

Hello,

After the first pass, we run xatlas to create a UV parameterization of the triangle mesh. If the first pass failed to produce a reasonable mesh, this step can take quite some time or even fail. How does the mesh look in your case at the end of the first pass, in the images dumped to the training folder? If the mesh is a triangle soup, it is very hard to compute a good UV parameterization.

In the log above, a PSNR of 13.38 dB indicates a very low-quality reconstruction. If possible, increase the number of training images, and double-check that the camera poses and foreground segmentation masks are accurate, to improve the quality of the reconstruction.

See also issue #13
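
As a sanity check on those numbers: for images normalized to [0, 1], PSNR = 10 * log10(1 / MSE), so the reported MSE of 0.048 indeed corresponds to roughly 13 dB.

If you want to confirm that xatlas itself is where the time goes, one option is to run the unwrap on the first-pass mesh outside of training. A minimal sketch, assuming the xatlas and trimesh Python packages are installed; the mesh path below is a guess at where the run exports the pass-1 mesh:

```python
import time

import trimesh
import xatlas

# Path is an assumption -- point it at the mesh exported after the first pass.
mesh = trimesh.load("out/nerf_cup/dmtet_mesh/mesh.obj", force="mesh", process=False)

start = time.time()
# xatlas.parametrize returns (vertex remapping, new face indices, per-vertex UVs).
vmapping, indices, uvs = xatlas.parametrize(mesh.vertices, mesh.faces)
print(f"Unwrapped {len(mesh.faces)} faces in {time.time() - start:.1f} s")
```

If this alone takes many minutes, the mesh itself is the likely culprit rather than anything in the training loop.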

PerspectivesLab commented 1 year ago

Another option would be to test the same mesh with UVAtlas (https://github.com/microsoft/UVAtlas) to see whether the generated mesh has holes or incompatible duplicate vertices, e.g. with a quick check like the one below.
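
A rough sketch of such a check in Python, assuming the trimesh package (the mesh path is again a guess):

```python
import trimesh

# process=False keeps the mesh exactly as exported, without automatic cleanup.
mesh = trimesh.load("out/nerf_cup/dmtet_mesh/mesh.obj", force="mesh", process=False)

print("watertight:", mesh.is_watertight)  # False suggests holes in the surface
print("vertices before merge:", len(mesh.vertices))

# Merging vertices that share the same position exposes duplicates.
mesh.merge_vertices()
print("vertices after merge:", len(mesh.vertices))
```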