NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Train resolution complain #88

Closed iraj465 closed 1 year ago

iraj465 commented 1 year ago

Hi, I have resized all my images to 768 x 1024 and set train_res to match, yet training still complains about a dimension mismatch:

config configs/manual/shoe-1.json                                                                                                                                                                                                                    
iter 500                                                                                                                                                                                                                                             
batch 8                                                                                                                                                                                                                                              
spp 1                                                                                                                                                                                                                                                
layers 1                                                                                                                                                                                                                                             
train_res [768, 1024]                                                                                                                                                                                                                                
display_res [1024, 1024]                                                                                                                                                                                                                              
texture_res [2048, 2048]                                                                                                                                                                                                                             
display_interval 0                                                                                                                                                                                                                                   
save_interval 100                                                                                                                                                                                                                                    
learning_rate [0.03, 0.03]                                                                                                                                                                                                                           
min_roughness 0.08                                                                                                                                                                                                                                   
custom_mip False                                                                                                                                                                                                                                     
random_textures True                                                                                                                                                                                                                                 
background white                                                                                                                                                                                                                                     
loss logl1                                                                                                                                                                                                                                           
out_dir out/manual_shoe-1                                                                                                                                                                                                                            
ref_mesh /opt/users/saptarshi.majumder/tmp/camera-ds/colmap-1024/shoe-1                                                                                                                                                                              
base_mesh None                                                                                                                                                                                                                                       
validate True                                                                                                                                                                                                                                        
mtl_override None                                                                                                                                                                                                                                    
dmtet_grid 64                                                                                                                                                                                                                                        
mesh_scale 2.8                                                                                                                                                                                                                                       
env_scale 1.0                                                                                                                                                                                                                                        
envmap None                                                                                                                                                                                                                                          
display [{'bsdf': 'kd'}, {'bsdf': 'ks'}, {'bsdf': 'normal'}]                                                                                                                                                                                         
camera_space_light True                                                                                                                                                                                                                              
lock_light False                                                                                                                                                                                                                                     
lock_pos False                                                                                                                                                                                                                                       
sdf_regularizer 0.2                                                                                                                                                                                                                                  
laplace relative                                                                                                                                                                                                                                     
laplace_scale 10000.0                                                                                                                                                                                                                                
pre_load True                                                                                                                                                                                                                                        
kd_min [0.03, 0.03, 0.03]                                                                                                                                                                                                                            
kd_max [0.8, 0.8, 0.8]                                                                                                                                                                                                                               
ks_min [0, 0.08, 0]                                                                                                                                                                                                                                  
ks_max [0, 1.0, 1.0]                                                                                                                                                                                                                                 
nrm_min [-1.0, -1.0, 0.0]                                                                                                                                                                                                                            
nrm_max [1.0, 1.0, 1.0]                                                                                                                                                                                                                              
cam_near_far [0.1, 1000.0]                                                                                                                                                                                                                           
learn_light True                                                                                                                                                                                                                                     
local_rank 0                                                                                                                                                                                                                                         
multi_gpu False                                                                                                                                                                                                                                      
---------    
DatasetLLFF: 125 images with shape [1024, 768]                                                                                                                                                                                                       
DatasetLLFF: auto-centering at [ 2.2330067  -0.27440363  2.6534197 ]                                                                                                                                                                                 
DatasetLLFF: 125 images with shape [1024, 768]                                                                                                                                                                                                       
DatasetLLFF: auto-centering at [ 2.2330067  -0.27440363  2.6534197 ]                                                                                                                                                                                 
Encoder output: 32 dims                                                                                                                                                                                                                              
Using /opt/users/saptarshi.majumder/tmp/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...                                                                                                                                             
Detected CUDA files, patching ldflags                                                                                                                                                                                                                
Emitting ninja build file /opt/users/saptarshi.majumder/tmp/.cache/torch_extensions/py38_cu113/renderutils_plugin/build.ninja...                                                                                                                     
Building extension module renderutils_plugin...                                                                                                                                                                                                      
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)                                                                                                                                    
ninja: no work to do.                                                                                                                                                                                                                                
Loading extension module renderutils_plugin...                                                                                                                                                                                                       
iter=    0, img_loss=0.320162, reg_loss=0.334069, lr=0.02999, time=836.7 ms, rem=6.97 m                                                                                                                                                              
Traceback (most recent call last):                                                                                                                                                                                                                   
  File "train.py", line 599, in <module>                                                                                                                                                                                                             
    geometry, mat = optimize_mesh(glctx, geometry, mat, lgt, dataset_train, dataset_validate,                                                                                                                                                        
  File "train.py", line 381, in optimize_mesh                                                                                                                                                                                                        
    for it, target in enumerate(dataloader_train):                                                                                                                                                                                                   
  File "/opt/users/saptarshi.majumder/tmp/miniconda3/envs/nvdiffrec/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__                                                                                               
    data = self._next_data()                                                                                                                                                                                                                         
  File "/opt/users/saptarshi.majumder/tmp/miniconda3/envs/nvdiffrec/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data                                                                                             
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/users/saptarshi.majumder/tmp/miniconda3/envs/nvdiffrec/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/opt/users/saptarshi.majumder/tmp/camera-ds/nvdiffrec/dataset/dataset.py", line 33, in collate
    'img' : torch.cat(list([item['img'] for item in batch]), dim=0)
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 1024 but got size 768 for tensor number 1 in the list.              

Also, is there a way to ensure that all images have the resolution expected by the train_res parameter (keeping the aspect ratio intact)?

JHnvidia commented 1 year ago

Hi, it seems you've flipped the order for the display_res parameter (used to render validation images). I think that's the issue.

train_res [1024, 768]                                                                                                                                                                                                                                
display_res [768, 1024]                                                                                                                                                                                                                              

Those parameters are pretty clunky, and a leftover from when we always used to render our reference images. It might be possible to override them in the dataset loader (DatasetLLFF in this case) by just setting them to the resolution of the dataset.
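
A minimal sketch of that idea, not code from the repository: take the resolution from the loaded images instead of the config, and fail early if the images disagree with each other or with train_res. The helper names `infer_train_res` and `check_train_res` are hypothetical, and the sketch assumes shapes are given as (height, width), which is how the `DatasetLLFF` log line above appears to report them.

```python
from collections import Counter


def infer_train_res(shapes):
    """Return the single [height, width] shared by all images, or raise.

    `shapes` is an iterable of (height, width) pairs (assumed ordering).
    """
    counts = Counter((int(h), int(w)) for h, w in shapes)
    if not counts:
        raise ValueError("no images found")
    if len(counts) > 1:
        raise ValueError(f"mixed image resolutions: {dict(counts)}")
    (h, w), _ = counts.most_common(1)[0]
    return [h, w]


def check_train_res(train_res, shapes):
    """Raise if the configured train_res differs from the dataset resolution."""
    expected = infer_train_res(shapes)
    if list(train_res) != expected:
        swapped = list(train_res) == expected[::-1]
        hint = " (height and width look swapped)" if swapped else ""
        raise ValueError(
            f"train_res {list(train_res)} != dataset {expected}{hint}"
        )
    return expected
```

With the values from this thread, `check_train_res([768, 1024], [(1024, 768)] * 125)` would raise before the data loader ever tries to concatenate a batch, while `check_train_res([1024, 768], ...)` passes.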

I would recommend against rescaling images to match, as this wouldn't be consistent with camera field-of-view and will break training.
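
To illustrate the field-of-view point, here is a rough sketch under a simple pinhole camera model. The model, the image size, the focal length, and the helper name `fov_deg` are all assumptions for illustration, not values from this dataset. A uniform resize keeps the field of view as long as the focal length is scaled by the same factor. Stretching one axis to hit a target resolution while keeping the stored focal length leaves the camera describing a different field of view than the image actually covers.

```python
import math


def fov_deg(size_px, focal_px):
    """Field of view in degrees along one axis of a pinhole camera."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))


# Hypothetical camera: image 768 px wide, focal length 1000 px.
width, focal = 768, 1000.0
fov_x = fov_deg(width, focal)

# Uniform resize by 0.5, focal length scaled by the same factor:
# the field of view is unchanged.
fov_x_uniform = fov_deg(width * 0.5, focal * 0.5)

# Width stretched to 1024 px while the stored focal length stays at 1000 px:
# the camera model now implies a wider field of view than the image covers.
fov_x_stretched = fov_deg(1024, focal)

print(f"original:  {fov_x:.2f} deg")
print(f"uniform:   {fov_x_uniform:.2f} deg")
print(f"stretched: {fov_x_stretched:.2f} deg")
```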

iraj465 commented 1 year ago

Okay, I understand. Makes sense.

As for the resizing: if I don't resize and use the original resolution, the COLMAP reconstruction is very poor or fewer images get registered. Is there a workaround for this, i.e. a way to use high-resolution images without getting stuck at the COLMAP reconstruction? As far as training is concerned, I find the model can train at fairly high resolutions.