chenhsuanlin / photometric-mesh-optim

Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction (CVPR 2019)
MIT License
208 stars, 25 forks

Runtime Error when encoding a Video Frame #11

Closed CILT closed 3 years ago

CILT commented 3 years ago

Hi, I've tested your work with a trained model, as shown in the examples. Everything worked fine with the sequences provided in data/sequences.

Now I'm trying to see what happens if I give it a sequence I created myself from a real video. I encoded the video in RGB and then created the .npy file from it. When I run the project with the following parameters, I get a runtime error: RuntimeError: mat1 dim 1 must match mat2 dim 0.

The execution parameters are:

model="pretrained/02958343_atl25.npz"
seq_path="../Khashkar_videos/sequences"
name="khashkars"

python3 main.py \
    --load=${model} --code=5e-2 --scale=2e-2 --lr-pmo=3e-3 --noise=0.1 --video \
    --init-idx=0 --seq-path=${seq_path} --name=${name}

Here is the full output, including the stack trace:

=======================================================
main.py (photometric mesh optimization)
=======================================================
setting configurations...
H : 224
W : 224
aug_transl : None
avg_frame : False
batch_size : 32
batch_size_pmo : -1
category : None
code : 0.05
cpu : False
device : cuda:0
eval : False
from_epoch : 0
gpu : 0
group : 0
imagenet_enc : False
init_idx : 0
load : pretrained/02958343_atl25.npz
log_tb : False
log_visdom : False
lr_decay : 1.0
lr_pmo : 0.003
lr_pretrain : 0.0001
lr_step : 100
name : khashkars_seed0
noise : 0.1
num_meshgrid : 5
num_points : 100
num_points_all : 2500
num_prim : 25
num_workers : 8
pointcloud_path : data/customShapeNet
pretrained_dec : None
rendering_path : data/rendering
scale : 0.02
seed : 0
seq_path : ../Khashkar_videos/sequences
sfm : False
size : 224x224
sphere : False
sphere_densify : 3
sun360_path : data/background
to_epoch : 500
to_it : 100
video : True
vis_port : 8097
vis_server : http://localhost

reading list of sequences...
building AtlasNet...
loading checkpoint pretrained/02958343_atl25.npz...
======= OPTIMIZATION START =======
loading sequence...
reading RGB .npy file...
reading ground-truth camera .npz file...
Traceback (most recent call last):
  File "/home/cllullt/.../Photometric_Mesh_Optim/photometric-mesh-optim/main.py", line 29, in <module>
    pmo.setup_variables(opt)
  File "/home/cllullt/.../Photometric_Mesh_Optim/photometric-mesh-optim/model.py", line 44, in setup_variables
    self.code_init = self.network.encoder.forward(input_image[None]).detach()
  File "/home/cllullt/.../Photometric_Mesh_Optim/photometric-mesh-optim/atlasnet.py", line 192, in forward
    x = self.fc(x)
  File "/home/cllullt/.../Photometric_Mesh_Optim/photometric-mesh-optim/PMO/lib64/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cllullt/.../Photometric_Mesh_Optim/photometric-mesh-optim/PMO/lib64/python3.9/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/cllullt/.../Photometric_Mesh_Optim/photometric-mesh-optim/PMO/lib64/python3.9/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 dim 1 must match mat2 dim 0

I'd appreciate any feedback, thanks.

CILT commented 3 years ago

I should add that the error seems to be raised when the frame is encoded in atlasnet.py.

chenhsuanlin commented 3 years ago

Hi @CILT, this looks like a shape mismatch error. You may want to make sure the size of input_image in this line is 3xHxW.
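
To illustrate why a wrongly sized frame can surface as a matrix-shape error in the final linear layer, here is a minimal NumPy sketch. The encoder geometry used here (512 channels, a total stride of 32, 7x7 pooling, 1024 outputs) is a hypothetical stand-in and is not read from atlasnet.py. The only point is that the flattened feature size depends on the input height and width, while the linear layer's weight has a fixed size.

```python
import numpy as np

# Hypothetical encoder geometry (NOT taken from atlasnet.py): the conv stack
# shrinks each spatial side by 32, a 7x7 pooling follows, then a flatten.
CHANNELS, STRIDE, POOL, OUT_DIM = 512, 32, 7, 1024

def flattened_size(height, width):
    """Number of features reaching the linear layer for a height x width input."""
    h = height // STRIDE // POOL
    w = width // STRIDE // POOL
    return CHANNELS * h * w

# The linear layer is sized once, for 224x224 inputs.
fc_in = flattened_size(224, 224)
weight = np.zeros((fc_in, OUT_DIM), dtype=np.float32)

def encode(height, width):
    """Run the stand-in linear layer on features from a height x width input."""
    features = np.zeros((1, flattened_size(height, width)), dtype=np.float32)
    return features @ weight  # raises if the feature size differs from fc_in

print(flattened_size(224, 224), flattened_size(1440, 1080))
```

Under these assumptions, `encode(224, 224)` succeeds, while `encode(1440, 1080)` fails inside the matrix product. That is the NumPy analogue of the `mat1 dim 1 must match mat2 dim 0` message in the trace.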

CILT commented 3 years ago

Hi! Thank you for the quick response.

I ran print(input_image.shape) and it outputs: torch.Size([3, 1440, 1080])

By 3xHxW, do you mean 3 x Height x Width? Currently, my video is 3 x Width x Height.

Should I transpose my RGB .npy array, then?

Thank you.
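
For reference, swapping the last two axes of such an array could be sketched with NumPy as follows. The N x 3 x W x H layout of the stored sequence and the frame count are assumptions made for illustration. The actual layout of the .npy file should be checked with `.shape` first.

```python
import numpy as np

# Stand-in for a loaded sequence; the N x 3 x W x H layout is an assumption.
sequence = np.zeros((4, 3, 1440, 1080), dtype=np.uint8)

# Swap the last two axes so that each frame becomes 3 x H x W.
sequence_hw = np.swapaxes(sequence, -1, -2)

# swapaxes returns a view; make it contiguous before saving it again.
sequence_hw = np.ascontiguousarray(sequence_hw)
print(sequence_hw.shape)
```

Transposing only changes the axis order, not the resolution. The frames would still not be 224x224, which is what H and W are set to in the configuration printed above.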

chenhsuanlin commented 3 years ago

Please make sure the input image size matches opt.H and opt.W (both 224 in the configuration printed above), as this looks like the main problem. And yes, it's probably a good idea to make it 3 x height x width. Please feel free to reopen if the issue persists.
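
One way to bring a frame to the expected size is sketched below with plain NumPy. The center crop and the nearest-neighbor resize are assumptions chosen to keep the example dependency-free. They are not the repository's own preprocessing, and the helper names are hypothetical. A proper image library with antialiasing would likely give better results.

```python
import numpy as np

def center_crop_square(frame):
    """Crop a C x H x W frame to its largest centered square."""
    _, h, w = frame.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return frame[:, top:top + side, left:left + side]

def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbor resize of a C x H x W frame to C x out_h x out_w."""
    _, h, w = frame.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return frame[:, rows[:, None], cols[None, :]]

H, W = 224, 224  # opt.H and opt.W from the configuration dump

# Stand-in for one frame that is already in 3 x height x width order.
frame = np.random.rand(3, 1080, 1440).astype(np.float32)

resized = resize_nearest(center_crop_square(frame), H, W)
print(resized.shape)
```

Cropping to a square before resizing avoids stretching the object. Whether a crop or a padded resize is more appropriate depends on where the object sits in the video.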