This is the command I'm using to start the training:
python snb/train.py --batch-size 4 --folder temp --num_workers 4 --resume --dataset scenenet --use_inv_z --accumulation alphacomposite --model_type zbuffer_pts --refine_model_type resnet_256W8UpDown64 --norm_G sync:spectral_batch --render_ids 1 --suffix '' --normalize_image --lr 0.0001 --use_gt_depth --W 240 --log-dir ../Runs/Training/Train01/%s
I wrote a DataLoader for SceneNet based on KittiDataLoader. The code is as follows:
import math
from pathlib import Path
import numpy
import skimage.io
import skimage.transform
import torch
import torch.utils.data as data
class SceneNetDataLoader(data.Dataset):
    def __init__(self, split_name, opts=None):
        super(SceneNetDataLoader, self).__init__()
        self.opt = opts
        self.dataroot = Path(opts.dataset_path) / split_name
        # Each scene contributes two samples: one starting at view 0 and one at view 3750
        self.scenes = []
        for scene_num in sorted(self.dataroot.iterdir()):
            self.scenes.append((scene_num.stem, 0))
            self.scenes.append((scene_num.stem, 3750))

    @staticmethod
    def get_image(path: Path):
        # Load the RGB frame and normalize it to [-1, 1]
        image = skimage.io.imread(path.as_posix()).astype(numpy.float32) / 255 * 2 - 1
        image = image[:, 40:280]  # Crop (240,320,3) to (240,240,3)
        image_tr = torch.from_numpy(image).permute((2, 0, 1))
        return image_tr

    @staticmethod
    def get_depth(path: Path):
        # Depth is stored in millimetres; convert to metres
        depth = skimage.io.imread(path.as_posix()) * 0.001
        depth = depth[:, 40:280]  # Crop (240,320) to (240,240)
        depth = depth[None]
        depth = depth.astype(numpy.float32)
        return depth

    def get_transformation(self, scene_num, view_num: int):
        # Poses are stored once every 25 views; return the relative transformation
        # between the poses of view_num and view_num + 25
        transformation_matrix_path = self.dataroot / scene_num / 'TransformationMatrix.txt'
        transformation_matrices = numpy.genfromtxt(transformation_matrix_path.as_posix(), delimiter=',')
        pose_index = view_num // 25
        pose1 = transformation_matrices[pose_index].reshape(4, 4)
        pose2 = transformation_matrices[pose_index + 1].reshape(4, 4)
        trans = numpy.matmul(pose2, numpy.linalg.inv(pose1)).astype(numpy.float32)
        return trans

    @staticmethod
    def camera_intrinsic_transform(vfov=45, hfov=60, pixel_width=320, pixel_height=240):
        """
        Copied from SceneNet
        """
        camera_intrinsics = numpy.zeros((3, 4))
        camera_intrinsics[2, 2] = 1
        camera_intrinsics[0, 0] = (pixel_width / 2.0) / math.tan(math.radians(hfov / 2.0))
        camera_intrinsics[0, 2] = pixel_width / 2.0
        camera_intrinsics[1, 1] = (pixel_height / 2.0) / math.tan(math.radians(vfov / 2.0))
        camera_intrinsics[1, 2] = pixel_height / 2.0
        return camera_intrinsics

    def __getitem__(self, index):
        scene_id = self.scenes[index]
        scene_num, view_num = scene_id
        frame1_path = self.dataroot / scene_num / f'photo/{view_num:04}.jpg'
        frame2_path = self.dataroot / scene_num / f'photo/{view_num + 25:04}.jpg'
        frame1 = self.get_image(frame1_path)
        frame2 = self.get_image(frame2_path)
        frame1_depth_path = self.dataroot / scene_num / f'depth/{view_num:04}.png'
        frame2_depth_path = self.dataroot / scene_num / f'depth/{view_num + 25:04}.png'
        frame1_depth = self.get_depth(frame1_depth_path)
        frame2_depth = self.get_depth(frame2_depth_path)
        trans = self.get_transformation(scene_num, view_num)
        trans_inv = numpy.linalg.inv(trans)
        identity = torch.eye(4)
        intrinsic = self.camera_intrinsic_transform(pixel_height=frame1.shape[1], pixel_width=frame1.shape[2])
        K = numpy.eye(4, dtype=numpy.float32)
        K[:3, :4] = intrinsic
        K_inv = numpy.linalg.inv(K)
        return {'images': [frame1, frame2],
                'depths': [frame1_depth, frame2_depth],
                'cameras': [{'Pinv': identity, 'P': identity, 'K': K, 'Kinv': K_inv},
                            {'Pinv': trans_inv, 'P': trans, 'K': K, 'Kinv': K_inv}]
                }

    def __len__(self):
        return len(self.scenes)

    def toval(self, epoch):
        pass

    def totrain(self, epoch):
        pass
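A minimal way to exercise this loader outside of training (assuming opts only needs to carry dataset_path here; the path below is just a placeholder) would be something like:

from types import SimpleNamespace

if __name__ == '__main__':
    # Hypothetical debug entry point: load one sample and inspect its shapes
    opts = SimpleNamespace(dataset_path='/path/to/SceneNetRGBD')  # placeholder path
    dataset = SceneNetDataLoader('train', opts)
    sample = dataset[0]
    print(len(dataset))                                           # two samples per scene
    print(sample['images'][0].shape, sample['depths'][0].shape)   # (3, 240, 240), (1, 240, 240)
    print(sample['cameras'][1]['P'])                              # relative transformation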
I think it's probably something with the camera setup -- when it first projects things, you should see that the noisy results somewhat align with the true images. You can try using the true depths in the code to see if the cameras are right (here: https://github.com/facebookresearch/synsin/blob/master/models/z_buffermodel.py#L89).
Thanks @oawiles. I'm already using the true depth. I'll check whether the warping of features is correct.
You can also try warping the RGB -- e.g. pass the RGB colours as features. This should be easier to check. Then these should precisely match the other image.
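If you want to check the cameras completely outside of the splatter, an inverse-warp consistency check is another quick test. The sketch below is only an illustration (not code from this repo) and assumes the K and P produced by your loader above, with P mapping camera-1 coordinates to camera-2 coordinates; it resamples the second frame at the projected locations, which should roughly reproduce the first frame when the cameras and depths are consistent.

import torch
import torch.nn.functional as F

def inverse_warp_check(frame1, frame2, depth1, K, P):
    """Unproject frame-1 pixels with their depth, move them into camera 2 with P,
    project with K, and sample frame 2 at those locations. If the cameras and
    depths are consistent, the resampled image should roughly match frame 1
    (up to occlusions and pixels that fall outside frame 2)."""
    _, H, W = frame1.shape
    K = torch.as_tensor(K, dtype=torch.float32)           # 4x4 pixel-space intrinsics
    P = torch.as_tensor(P, dtype=torch.float32)           # assumed camera-1 -> camera-2
    depth1 = torch.as_tensor(depth1, dtype=torch.float32)

    # Homogeneous pixel grid of frame 1
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing='ij')
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)   # (3, H*W)
    z = depth1.reshape(1, -1)                                             # (1, H*W)

    # Unproject into camera 1, transform into camera 2, project back to pixels
    cam1 = (torch.inverse(K)[:3, :3] @ pix) * z
    cam2 = P @ torch.cat([cam1, torch.ones_like(z)], dim=0)
    proj = K[:3, :3] @ cam2[:3]
    u2, v2 = proj[0] / proj[2], proj[1] / proj[2]

    # Normalise to [-1, 1] for grid_sample and resample frame 2
    grid = torch.stack([2 * u2 / (W - 1) - 1, 2 * v2 / (H - 1) - 1], dim=-1)
    grid = grid.reshape(1, H, W, 2)
    resampled = F.grid_sample(frame2[None], grid, align_corners=True)[0]

    # Mean absolute error over pixels that project inside frame 2
    valid = (grid.abs() <= 1).all(dim=-1).float()
    err = ((resampled - frame1).abs() * valid).sum() / (3 * valid.sum() + 1e-6)
    return resampled, err.item()

If the error stays large, it is also worth passing trans_inv in place of trans, since the direction of the relative transformation depends on the pose convention.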
@oawiles, you were right. The error is in the warping: the output of the splatter is just an array of zeros. I believe the problem is in the format of my transformation, camera matrices and depth map. Here are some of my findings:
When I replace the projection code (z_buffer_manipulator.py / PtsManipulator / project_pts()) with mine and use positive depth values for splatting, the splattered image looks good. It looks a little blurred and objects seem to be enlarged a bit, which I believe is due to splatting. Because of this, I'm not exactly sure whether my code is correct. Can you please tell me what changes I have to make to my transformation and other data so that they are in the format expected by SynSin? Thanks a lot
@oawiles, you were right. The error is in the warping: the output of the splatter is just an array of zeros, and the problem is the format of the camera matrix. By writing my own transformation code, I'm able to train the SynSin model, but I'm not able to get your transformation (warping) code to work correctly. I had the camera matrix in the form
With this camera matrix, the splatter output was zeros. I changed the camera matrix and removed the dependencies on the height and width of the frame as follows
With this, the splatter output is a warped frame, but the transformation doesn't match the ground truth. Can you suggest what changes I have to make to my camera matrix? In other words, in what format does your code expect the camera matrix to be?
Thanks a lot
What is the error? Sometimes seeing how the splattered image compares to the true image makes it make sense. One thing I notice is that you should use K to map the values to between -1 and 1, which I believe is not what you're doing. Another thing is that sometimes you have to flip the Y. Without being able to see the visual results, it's hard to guess at the precise problem.
Hi, I've attached the images below. This is the first frame (true)
This is the second frame (true)
This is the first frame warped to the view of the second frame (splattered)
As you can see, in the splattered image the green beam has moved down compared to the true second frame.
My camera matrix is as below, where hfov=60 and vfov=45.
Also, I had to crop the images from 320x240 to 240x240. Would it make any difference?
It could make a difference. I would recommend you first try to resize; otherwise I think the intrinsics would mess it up. It looks like it's zoomed in, which could be from the cropping. I'd recommend first resizing and then using a matrix to transform the intrinsics to [-1,1] for x/y, using an offset matrix O such that you get a new intrinsic matrix I = O K, where K was your old intrinsic matrix.
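Concretely, a minimal sketch of such an offset matrix could look like the following (just an illustration, assuming the 4x4 pixel-space K that your loader builds):

import numpy

def normalize_intrinsics(K, width, height, flip_y=False):
    """Compose an offset matrix O with the pixel-space intrinsics K so that
    projected x/y coordinates land in [-1, 1] instead of [0, W) / [0, H).
    K is assumed to be 4x4 with top-left 3x3 block
    [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] in pixel units."""
    O = numpy.eye(4, dtype=numpy.float32)
    O[0, 0] = 2.0 / width    # scale x from [0, W] to [0, 2]
    O[1, 1] = 2.0 / height   # scale y from [0, H] to [0, 2]
    O[0, 2] = -1.0           # shift x to [-1, 1]
    O[1, 2] = -1.0           # shift y to [-1, 1]
    if flip_y:               # optional Y flip, as mentioned above
        O[1, 1] *= -1.0
        O[1, 2] *= -1.0
    return O @ K             # new intrinsic matrix I = O K

Note that if you keep the 240x240 centre crop instead of resizing, the principal point also changes: cx becomes cx - 40 for a 40-pixel crop offset on a 320-wide image, whereas resizing only scales fx/cx and fy/cy by the resize factors.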
OK. I'll try that. Thanks!
Hi, I have similar issues to those described in the first message of this thread. I'm trying to train the code on my own dataset. I do save out the warped images using gt depth with the 'use_rgb_features' option set to True, and they do look good. However, the model doesn't really train, and I continue to get images that are mostly a single color. I tried debugging by using only the L1 loss, etc., but I observe the same pattern. Do you have any other pointers to what could be the issue?
Hi, I'm trying to train the SynSin model on the SceneNet database, but I'm not able to get it to train. I would really appreciate it if you could give me some tips. I'm training with the --use_gt_depth flag. D_Real and D_Fake have similar values in each batch (around 0.1 to 0.3), so the discriminator isn't training well either. I don't know what else to try. Can you kindly help me out here?