NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Torch tensor size mismatch in optimize_mesh func in train.py #83

Closed: iraj465 closed this issue 1 year ago

iraj465 commented 2 years ago

Hi, I have been stuck on this error for some time now; the optimize_mesh function is not working properly on CUDA. I'm running a custom dataset with camera transforms generated by the colmap2nerf script from instant-ngp.

Any help is appreciated.

Resources used:

```
---------
config configs/nerf_shoe.json
iter 5000
batch 8
spp 1
layers 1
train_res [800, 800]
display_res [800, 800]
texture_res [2048, 2048]
display_interval 0
save_interval 100
learning_rate [0.03, 0.01]
min_roughness 0.08
custom_mip False
random_textures True
background white
loss logl1
out_dir out/nerf_shoe
ref_mesh data/shoe-2-nvdiff
base_mesh None
validate True
mtl_override None
dmtet_grid 128
mesh_scale 2.1
env_scale 1.0
envmap None
display [{'latlong': True}, {'bsdf': 'kd'}, {'bsdf': 'ks'}, {'bsdf': 'normal'}]
camera_space_light False
lock_light False
lock_pos False
sdf_regularizer 0.2
laplace relative
laplace_scale 3000
pre_load True
kd_min [0.0, 0.0, 0.0, 0.0]
kd_max [1.0, 1.0, 1.0, 1.0]
ks_min [0, 0.08, 0.0]
ks_max [1.0, 1.0, 1.0]
nrm_min [-1.0, -1.0, 0.0]
nrm_max [1.0, 1.0, 1.0]
cam_near_far [0.1, 1000.0]
learn_light True
local_rank 0
multi_gpu False
---------
NERF dataset path:  data/shoe-2-nvdiff/transforms_train.json
DatasetNERF: 143 images with shape [3024, 4032]
DatasetNERF: 143 images with shape [3024, 4032]
Encoder output: 32 dims
Traceback (most recent call last):
  File "train.py", line 595, in <module>
    geometry, mat = optimize_mesh(glctx, geometry, mat, lgt, dataset_train, dataset_validate, 
  File "train.py", line 384, in optimize_mesh
    target = prepare_batch(target, 'random')
  File "/opt/users/saptarshi.majumder/tmp/miniconda3/envs/nvdiffrec/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "train.py", line 89, in prepare_batch
    target['img'] = torch.cat((torch.lerp(background, target['img'][..., 0:3], target['img'][..., 3:4]), target['img'][..., 3:4]), dim=-1)
RuntimeError: The size of tensor a (3) must match the size of tensor b (0) at non-singleton dimension 3
```
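For reference, the failing line in prepare_batch reduces to this standalone snippet, which reproduces the same RuntimeError (a sketch with made-up tensor shapes, outside nvdiffrec):

```python
import torch

# Sketch: batch of 8 images in NHWC layout with only 3 channels.
background = torch.ones(8, 800, 800, 3)
img = torch.rand(8, 800, 800, 3)

# The slice of a 4th channel is empty, since there is no 4th channel.
print(img[..., 3:4].shape)  # torch.Size([8, 800, 800, 0])

# Same call as in prepare_batch; fails with:
# RuntimeError: The size of tensor a (3) must match the size of
# tensor b (0) at non-singleton dimension 3
torch.lerp(background, img[..., 0:3], img[..., 3:4])
```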
jmunkberg commented 2 years ago

Hello, your train_res [800, 800] parameter must match the dataset image size (here, DatasetNERF: 143 images with shape [3024, 4032]), so I would recommend either increasing train_res in the config or scaling down your training images, for example by a factor of 3x or 4x, to save memory.

With a batch size of 8, training at the native resolution of [3024, 4032] will consume a lot of memory, so I would recommend scaling down your training data.
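If it helps, here is a minimal downscaling sketch using Pillow (the folder names and the 4x factor are placeholders; adjust them to your dataset layout):

```python
# Sketch: batch-downscale all training images by 4x with Pillow.
# The folder names below are placeholders, not part of nvdiffrec.
from pathlib import Path
from PIL import Image

src = Path("data/shoe-2-nvdiff/train")     # assumed input folder
dst = Path("data/shoe-2-nvdiff/train_4x")  # assumed output folder
dst.mkdir(parents=True, exist_ok=True)

for p in src.glob("*.png"):
    with Image.open(p) as img:
        w, h = img.size
        img.resize((w // 4, h // 4), Image.LANCZOS).save(dst / p.name)
```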

iraj465 commented 2 years ago

Thanks @jmunkberg for this. However, I'm still getting the same error after I downscaled my images by 4x. What could be the reason for this?

jmunkberg commented 2 years ago

@iraj465 Just to double check, are you using the exact same resolution in train_res as your image size?

If you downscaled [3024, 4032] to [756, 1008], then set "train_res": [756, 1008] in the config.
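That is, the relevant entry in your config (e.g., configs/nerf_shoe.json) would read (partial excerpt, assuming the 4x downscale):

```json
{
    "train_res": [756, 1008]
}
```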

JHnvidia commented 2 years ago

@iraj465,

From the error message it also looks like your images are missing the alpha component. We assume that the dataset loader generates a 4-component output image (r,g,b,a) where alpha is supposed to contain the coverage mask (0 for background and 1 for object).

The dataloaders achieve this slightly differently: for the NeRF dataset, reference images are (r,g,b,a) and are simply loaded; for NeRD, masks are separate images and are concatenated to the (r,g,b) color images.
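A quick way to verify this (a sketch; the path is a placeholder for your dataset folder):

```python
# Sketch: flag any training image that is missing the alpha channel.
from pathlib import Path
from PIL import Image

for p in sorted(Path("data/shoe-2-nvdiff/train").glob("*.png")):
    with Image.open(p) as img:
        if img.mode != "RGBA":
            print(f"{p.name}: mode={img.mode} (no alpha channel)")
```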

iraj465 commented 2 years ago

Ah, gotcha! It is missing the alpha channel, thanks! How can I convert my images to images with an alpha channel? Can you point me to any leads or scripts for that? It would be really helpful.

JHnvidia commented 2 years ago

There is no trivial solution. We require a segmentation mask that is 1 for all pixels of the object and 0 for background.

For rendered datasets you can configure, e.g., Blender to generate this mask, but for photographs it requires manual work or relying on an image segmentation AI such as rembg. See issue #58 for more information.
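As a starting point, here is a minimal sketch using rembg to predict a mask and write it into the alpha channel (folder names are placeholders; mask quality depends on the segmentation model):

```python
# Sketch: use rembg to add a predicted alpha mask to RGB photographs.
from pathlib import Path
from PIL import Image
from rembg import remove

src = Path("photos")       # placeholder: input RGB images
dst = Path("photos_rgba")  # placeholder: output RGBA images
dst.mkdir(exist_ok=True)

for p in src.glob("*.jpg"):
    with Image.open(p) as img:
        out = remove(img)                  # returns an RGBA image, background removed
        out.save(dst / (p.stem + ".png"))  # PNG keeps the alpha channel
```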