AugmentariumLab / omnisyn

OmniSyn: Synthesizing 360 Videos with Wide-baseline Panoramas (VRW 2022)
https://augmentariumlab.github.io/omnisyn/

How to get perspective views from panoramas? #15

Closed. EchoTHChen closed this issue 1 year ago.

EchoTHChen commented 1 year ago

Problem:

Sorry to bother you again! I want to split the panoramas into perspective views and run perspective-based methods on them.

Although cube maps and panoramas are already provided in this repo, I wonder how to split a panorama into cube maps (both images and poses) myself. I also have other panorama datasets on hand that I would like to split into perspective views.

Since you have rich experience with panoramas, may I ask how you convert a panorama into perspective views (such as cube maps)?

I tried to write the conversion of habitat-matterport3d panoramas into perspective views based on the previous code, but it does not work correctly. How can I split a panorama into cube maps (both images and poses) based on habitat-matterport3d? Thanks!

Code:

1. The main script to split panoramas into cube maps (python main.py):

import os
import cv2
import tqdm
import numpy as np
from PIL import Image
import my_e2p as E2P

class MY_ERP2PERS:
    def __init__(self, src_path, target_root):
        # cubemap settings
        self.dataset_name = "m3d"
        self.fov = 90
        self.per_height = 255
        self.per_width = 255

        # pinhole intrinsics for the given FOV
        W = self.per_width
        H = self.per_height
        f = 0.5 * W / np.tan(0.5 * self.fov / 180.0 * np.pi)  # focal length in pixels

        cx = (W - 1) / 2.0
        cy = (H - 1) / 2.0
        K = np.array([
                [f, 0, cx],
                [0, f, cy],
                [0, 0,  1],
            ], np.float32)
        self.K = K
        print("Finish loading src.")
        self.target_root = target_root
        self._load_data(src_path)

        if not os.path.exists(target_root):
            os.mkdir(self.target_root)
        os.makedirs(os.path.join(self.target_root, 'images'), exist_ok=True)
        os.makedirs(os.path.join(self.target_root, 'depths'), exist_ok=True)
        os.makedirs(os.path.join(self.target_root, 'poses'), exist_ok=True)

    def _load_data(self, src_path):
        data = np.load(src_path)
        data_size = data["panos"].shape[1]
        print('data_size:', data_size)
        info_dict = dict()
        os.makedirs(self.target_root+"/"+"panos", exist_ok=True)

        for i in range(data_size):
            img_name = str(i)
            pano = np.uint8(data["panos"][0, i]*255)
            depth = data["depths"][0 ,i]
            rot = data["rots"][0, i]
            tran = data["trans"][0, i]
            # tran = -tran 
            info_dict[img_name] = {"pano":pano, "tran":tran, "rot": rot, "depth": depth}
            cv2.imwrite(self.target_root+"/panos/"+img_name+".jpg", pano[..., ::-1])
        self.info_dict = info_dict

    def get_perspective_one_img(self, img, depth, Rc, deal_with_img=True):
        # For simplicity, build all six cubemap faces for this panorama.
        equ = E2P.Equirectangular(self.dataset_name, img, depth, has_depth=True)
        rgb_img_list = []
        depth_list = []
        R_list = []

        #to cubemaps
        theta_phi_list = [ (0, 0), (-90, 0), (90, 0), (180, 0), (0, 90), (0, -90)]#FRONT, LEFT, RIGHT, BACK, TOP, DOWN

        for theta_phi in theta_phi_list:
            theta = theta_phi[0]
            phi = theta_phi[1]
            if deal_with_img:
                perspective_img, pers_depth = equ.GetPerspective(self.fov, theta, phi, self.per_height, self.per_width)
                rgb_img = perspective_img[:, :, ::-1]  # RGB -> BGR
                rgb_img_list.append(rgb_img)
                # pers_depth is spherical (ray) depth; convert it to z-depth
                depth_list.append(self.dist_to_zdepth(pers_depth))
            # deal with Rotate
            y_axis = np.array([0.0, 1.0, 0.0], np.float32)
            x_axis = np.array([1.0, 0.0, 0.0], np.float32)
            R1, _ = cv2.Rodrigues(y_axis * np.radians(theta))
            R2, _ = cv2.Rodrigues(np.dot(R1, x_axis) * np.radians(phi))
            R = R2 @ R1
            # 1. Rc is the panorama's camera-to-world rotation; right-multiplying by R
            #    rotates the camera frame to the current cubemap face.
            Rnew = Rc @ R  # 2. Rnew = (R^T @ Rc^T)^T, with R^T = R^(-1)

            R_list.append(Rnew.T)  # world-to-camera rotation

        return rgb_img_list, depth_list, R_list

    #spherical depth to z-depth
    def dist_to_zdepth(self, perspective_depth):
        width_center = self.per_width / 2 - 0.5
        height_center = self.per_height / 2 - 0.5
        focal_len = (self.per_width / 2) / np.tan(self.fov/180 * np.pi/2)

        uv_int = np.stack(np.meshgrid(
            np.arange(self.per_width), np.arange(self.per_height)), axis=-1)
        diag_dist = np.sqrt((uv_int[:, :, 0] - width_center) ** 2 +
                        (uv_int[:, :, 1] - height_center) ** 2 + focal_len ** 2)

        z_depth = perspective_depth / diag_dist * focal_len
        return z_depth

    def process_all_images(self, deal_with_img=True):
        rgb_img_all_list = []
        # quat_all_list = []
        R_all_list = []
        depth_all_list = []
        t_all_list = []
        name_list = []
        for img_name in self.info_dict.keys():
            data_item = self.info_dict[img_name]
            img = data_item["pano"]
            depth = data_item["depth"]
            tran = data_item["tran"].reshape((3, 1))#w2c
            rot = data_item["rot"]#w2c

            #rot.T c2w
            rgb_img_list, depth_list, R_list = self.get_perspective_one_img(img, depth, rot.T, deal_with_img)
            #R_list:w2c

            rgb_img_all_list = rgb_img_all_list + rgb_img_list
            R_all_list = R_all_list + R_list
            depth_all_list = depth_all_list + depth_list
            for _ in range(len(R_list)):
                t_all_list.append(tran) #w2c
            name_list.append(img_name)

        return rgb_img_all_list, R_all_list, t_all_list, name_list, depth_all_list

    def prepare(self, deal_with_image=True):
        rgb_img_all_list, R_all_list, trans_all_list, name_list, depth_all_list = self.process_all_images(deal_with_img=deal_with_image)
        d_min = np.array(depth_all_list).min()
        d_max = np.array(depth_all_list).max()
        f_image = open(os.path.join(self.target_root, "images.txt"), 'w')
        for index in range(len(name_list)):
            for item_index in range(6):
                base_name = name_list[index]
                new_name = self._update_name(base_name, item_index)
                new_id = index * 6 + item_index
                if deal_with_image:
                    rgb_img = rgb_img_all_list[new_id]
                    rgb_img = Image.fromarray(rgb_img[:, :, ::-1])
                    rgb_img.save(os.path.join(self.target_root, 'images', new_name))
                f_image.write(new_name + '\n')

                R_w = R_all_list[new_id]  # world-to-camera rotation for this face

                tran_w = trans_all_list[new_id]
                w2c = np.concatenate([R_w, tran_w], axis=1)
                w2c = np.concatenate([w2c, np.array([[0, 0, 0, 1]])], axis=0)
                depth_curr = depth_all_list[new_id]
                cv2.imwrite(os.path.join(self.target_root, "depths", new_name), np.uint16(depth_curr*1000))
                d_norm = np.uint8((depth_curr-d_min)/(d_max-d_min)*255)
                d_color = cv2.applyColorMap(d_norm, cv2.COLORMAP_JET)
                cv2.imwrite(os.path.join(self.target_root, "depths", new_name[:-4]+".jpg"), d_color)
                np.savez(os.path.join(self.target_root, "depths", new_name[:-4]+".npz"), depth=depth_curr)
                np.savez(os.path.join(self.target_root, "poses", new_name[:-4]+".npz"), pose=w2c)

        np.savez(os.path.join(self.target_root, "K.npz"), K=self.K)
    def _update_name(self, name, p_id, img_format='.png'):
        new_name = name + '_pose{}'.format(str(p_id).zfill(3)) + img_format
        return new_name

if __name__ == '__main__':
    src_path="./test_data.npz"#data source
    target_path = "./save_pers" # data save-path
    os.makedirs(target_path, exist_ok=True)

    pipeline = MY_ERP2PERS(src_path=src_path, target_root=target_path)
    pipeline.prepare(deal_with_image=True)

2. The auxiliary module (my_e2p.py) that extracts a perspective view from a panorama for a given theta and phi:

"""
transfer to perspective images

in:
poses(w2c)
rgbs
K

out:
perspective rgbs and poses
"""

import os
import sys
import cv2
import numpy as np

# cartesian to spherical

def xyz2lonlat(dataset_name, xyz):
    if dataset_name == "m3d":
        norm = np.linalg.norm(xyz, axis=-1, keepdims=True)
        xyz_norm = xyz / norm
        x = xyz_norm[..., 0:1]
        y = xyz_norm[..., 1:2]
        z = xyz_norm[..., 2:]
        lon = np.arctan2(z, x)  # in [-pi, pi]; wrapped to [0, 2*pi) in lonlat2XY
        lat = np.arccos(y)      # in [0, pi]
    else:
        atan2 = np.arctan2
        asin = np.arcsin
        norm = np.linalg.norm(xyz, axis=-1, keepdims=True)
        xyz_norm = xyz / norm
        x = xyz_norm[..., 0:1]
        y = xyz_norm[..., 1:2]
        z = xyz_norm[..., 2:]
        lon = atan2(x, z)
        lat = asin(y)
    out = np.concatenate([lon, lat], axis=-1)
    return out

# spherical to equirectangular
def lonlat2XY(dataset_name, lonlat, shape):
    if dataset_name == "m3d":
        theta = lonlat[..., 0:1]
        phi = lonlat[..., 1:]
        theta = (theta + 0.5 * np.pi + 2 * np.pi) % (2 * np.pi)  # wrap to [0, 2*pi)
        X = theta / (2 * np.pi) * (shape[1] - 1)
        Y = phi / np.pi * (shape[0] - 1)

    else:
        X = (lonlat[..., 0:1] / (2 * np.pi) + 0.5) * (shape[1] - 1)
        Y = (lonlat[..., 1:] / (np.pi) + 0.5) * (shape[0] - 1)
    assert np.logical_and(X >= 0, X <= shape[1] - 1).all(), "X out of range!"
    assert np.logical_and(Y >= 0, Y <= shape[0] - 1).all(), "Y out of range!"
    out = np.concatenate([X, Y], axis=-1)
    return out

class Equirectangular:
    def __init__(self, dataset_name, img, depth=None, has_depth=False):
        self.dataset_name = dataset_name
        self._img = img
        if has_depth:
            self._depth = depth
        [self._height, self._width, _] = self._img.shape

    def GetPerspective(self, FOV, THETA, PHI, height, width):
        # THETA is the left/right angle, PHI is the up/down angle, both in degrees.
        # Build intrinsics mapping camera coordinates to pixels.
        f = 0.5 * width / np.tan(0.5 * FOV / 180.0 * np.pi)
        cx = (width - 1) / 2.0
        cy = (height - 1) / 2.0
        K = np.array([
                [f, 0, cx],
                [0, f, cy],
                [0, 0,  1],
            ], np.float32)
        K_inv = np.linalg.inv(K)

        x = np.arange(width)
        y = np.arange(height)

        x, y = np.meshgrid(x, y)
        z = np.ones_like(x)

        xyz = np.concatenate([x[..., None], y[..., None], z[..., None]], axis=-1)
        xyz = xyz @ K_inv.T
        y_axis = np.array([0.0, 1.0, 0.0], np.float32)
        x_axis = np.array([1.0, 0.0, 0.0], np.float32)

        R1, _ = cv2.Rodrigues(y_axis * np.radians(THETA))
        R2, _ = cv2.Rodrigues(np.dot(R1, x_axis) * np.radians(PHI))
        R = R2 @ R1
        xyz = xyz @ R.T

        lonlat = xyz2lonlat(self.dataset_name, xyz)        
        XY = lonlat2XY(self.dataset_name, lonlat, shape=self._img.shape).astype(np.float32)

        persp = cv2.remap(self._img, XY[..., 0], XY[..., 1], cv2.INTER_CUBIC, borderMode=cv2.BORDER_WRAP)
        persp_depth = cv2.remap(self._depth, XY[..., 0], XY[..., 1], cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)

        return persp, persp_depth
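
For reference, here is a minimal usage sketch of the Equirectangular class above on a synthetic panorama; the array sizes and random image are placeholders for illustration, not anything from the repo:

import numpy as np
import my_e2p as E2P

# synthetic equirectangular RGB panorama (H x W x 3) and matching depth (H x W)
pano = np.random.randint(0, 255, (256, 512, 3), dtype=np.uint8)
depth = np.ones((256, 512), dtype=np.float32)

equ = E2P.Equirectangular("m3d", pano, depth, has_depth=True)
# front face of a 90-degree-FOV cubemap, 255x255 as in main.py
face_rgb, face_depth = equ.GetPerspective(90, 0, 0, 255, 255)
print(face_rgb.shape, face_depth.shape)  # (255, 255, 3), (255, 255)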

3. Visualization code to check the splitting and the perspective projections:

import os 
import cv2
import numpy as np

data_dir="./save_pers"
img_names = os.listdir(data_dir+"/images")
current_img_name=data_dir+"/images/1_pose004.png"
current_rgb = cv2.imread(current_img_name, cv2.IMREAD_COLOR)
current_depth = np.load(data_dir+"/depths/1_pose004.npz")["depth"]
current_pose = np.load(data_dir+"/poses/1_pose004.npz")["pose"]#w2c
K = np.load(data_dir+"/K.npz")["K"]
print("K:",K)
H, W = current_rgb.shape[:2]
save_dir="./vis"
os.makedirs(save_dir, exist_ok=True)
cv2.imwrite(save_dir+"/src_orig.jpg", current_rgb)

#get keypoints in source views
gray = cv2.cvtColor(current_rgb, cv2.COLOR_BGR2GRAY) 
sift = cv2.xfeatures2d.SIFT_create(nfeatures=200, contrastThreshold=0.02, edgeThreshold=20)
kp = sift.detect(gray, None)

print("len(kp):", len(kp))
keypoints = [[int(kp[idx].pt[0]), int(kp[idx].pt[1])] for idx in range(len(kp))]
keypoints = np.array(keypoints)

z_depth = current_depth[keypoints[: , 1], keypoints[:, 0]]

valid_list = np.where(z_depth>0)[0]
z_depth = z_depth[valid_list]#n, 1
keypoints = keypoints[valid_list] #n ,2

#write keypoints in src view.
for i in range(len(keypoints)):
    coord_x, coord_y = keypoints[i, 0], keypoints[i, 1]
    cv2.circle(current_rgb, (int(coord_x), int(coord_y)), color = (0, 0, 255), thickness = 1, radius=1)
cv2.imwrite(save_dir+"/src.jpg", current_rgb)

img_names = os.listdir(data_dir+"/images")
num_views = len(img_names)

# keypoints:Nx2
N, _ = keypoints.shape
points_homo = np.concatenate([keypoints, np.ones((N, 1))], axis=1)#N, 3
points_cam_norm = np.linalg.inv(K)@points_homo.T #3x3, 3xN->3xN

points_cam = points_cam_norm*(z_depth.reshape(1, N)).repeat([3], axis=0)#3xN, 1,N->3xN
pose_name = "w2c"
if pose_name=="w2c":
    points_w = np.linalg.inv(current_pose) @ np.concatenate([points_cam, np.ones((1, points_cam.shape[1]))], axis=0)#4,N
else:
    points_w = current_pose@ np.concatenate([points_cam, np.ones((1, points_cam.shape[1]))], axis=0)#4,N

for view_i in range(num_views):
    print("view_i:", view_i)
    img_name = img_names[view_i]
    rgb_other = cv2.imread(data_dir+"/images/"+img_name, cv2.IMREAD_COLOR)
    pose_name = img_names[view_i][:-4]+".npz"
    pose_other = np.load(data_dir+"/poses/"+pose_name)["pose"]#w2c

    if pose_name == "w2c":
        points_c = pose_other @ points_w 
    else:
        points_c = np.linalg.inv(pose_other) @ points_w #4,N

    points_coords_h = K @ points_c[:3, :]  # (3,3) @ (3,N) -> (3,N)
    valid_list = np.where(points_c[2, :] > 0)[0]  # keep points in front of the camera
    points_coords_h = points_coords_h[:, valid_list]

    points_coords = (points_coords_h / points_coords_h[2:, :]).T  # -> (N,3); last column is 1

    cv2.imwrite(save_dir+"/other_"+img_name[:-4]+"_orig.jpg", rgb_other)

    for coord_i in range(len(points_coords)):
        coord_x, coord_y = points_coords[coord_i, 0], points_coords[coord_i, 1]
        if coord_x >=0 and coord_x < W and coord_y >=0 and coord_y < H:
            cv2.circle(rgb_other, (int(coord_x), int(coord_y)), color = (0, 0, 255), thickness = 1, radius=1)
    cv2.imwrite(save_dir+"/other_"+img_name[:-4]+".jpg", rgb_other)

Previous code (the base code this builds on):

1. create_rgb_dataset.py

# Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.
# Taken from https://github.com/facebookresearch/splitnet
import gzip
import os
from typing import Dict
import habitat
import habitat.datasets.pointnav.pointnav_dataset as mp3d_dataset
import numpy as np
import quaternion
import torch
import torchvision.transforms as transforms
import tqdm
from habitat.config.default import get_config
from habitat.datasets import make_dataset
from scipy.spatial.transform.rotation import Rotation

from data_readers.mhabitat import vector_env
from geometry.camera_transformations import get_camera_matrices
from helpers import my_helpers
from mutils.jitter import jitter_quaternions

def _load_datasets(config_keys, dataset, data_path, scenes_path, num_workers):
  # For each scene, create a new dataset which is added with the config
  # to the vector environment.

  print(len(dataset.episodes))
  datasets = []
  configs = []

  num_episodes_per_worker = len(dataset.episodes) / float(num_workers)

  for i in range(0, min(len(dataset.episodes), num_workers)):
    config = make_config(*config_keys)
    config.defrost()

    dataset_new = mp3d_dataset.PointNavDatasetV1()
    with gzip.open(data_path, "rt") as f:
      dataset_new.from_json(f.read())
      dataset_new.episodes = dataset_new.episodes[
                             int(i * num_episodes_per_worker): int(
                               (i + 1) * num_episodes_per_worker
                             )
                             ]

      for episode_id in range(0, len(dataset_new.episodes)):
        dataset_new.episodes[episode_id].scene_id = \
          dataset_new.episodes[episode_id].scene_id.replace(
            '/checkpoint/erikwijmans/data/mp3d/',
            scenes_path)

    config.SIMULATOR.SCENE = str(dataset_new.episodes[0].scene_id)
    config.freeze()

    datasets += [dataset_new]
    configs += [config]
  return configs, datasets

def make_config(
    config, gpu_id, split, data_path, sensors, resolution, scenes_dir
):
  config = get_config(config)
  config.defrost()
  config.TASK.NAME = "Nav-v0"
  config.TASK.MEASUREMENTS = []
  config.DATASET.SPLIT = split
  # config.DATASET.POINTNAVV1.DATA_PATH = data_path
  config.DATASET.DATA_PATH = data_path
  config.DATASET.SCENES_DIR = scenes_dir
  config.HEIGHT = resolution
  config.WIDTH = resolution
  for sensor in sensors:
    config.SIMULATOR[sensor]["HEIGHT"] = resolution
    config.SIMULATOR[sensor]["WIDTH"] = resolution
    config.SIMULATOR[sensor]["POSITION"] = np.array([0, 0, 0])

  config.TASK.HEIGHT = resolution
  config.TASK.WIDTH = resolution
  config.SIMULATOR.TURN_ANGLE = 15
  config.SIMULATOR.FORWARD_STEP_SIZE = 0.1  # in metres
  config.SIMULATOR.AGENT_0.SENSORS = sensors
  config.SIMULATOR.DEPTH_SENSOR.NORMALIZE_DEPTH = False

  config.SIMULATOR.DEPTH_SENSOR.HFOV = 90
  config.SIMULATOR.RGB_SENSOR.HFOV = 90

  config.ENVIRONMENT.MAX_EPISODE_STEPS = 2 ** 32
  config.SIMULATOR.HABITAT_SIM_V0.GPU_DEVICE_ID = gpu_id
  return config

class RandomImageGenerator(object):
  def __init__(self, split, gpu_id, opts, vectorize=False, seed=0,
               full_width=512, full_height=256,
               reference_idx=1, m3d_dist=1.0,
               use_rand=True) -> None:
    self.vectorize = vectorize

    print("gpu_id", gpu_id)
    resolution = opts.W
    self.resolution = resolution
    # import ipdb;ipdb.set_trace()
    if opts.use_semantics:
      sensors = ["RGB_SENSOR", "DEPTH_SENSOR", "SEMANTIC_SENSOR"]
    else:
      sensors = ["RGB_SENSOR", "DEPTH_SENSOR"]
    if split == "train":
      data_path = opts.train_data_path
    elif split == "val":
      data_path = opts.val_data_path
    elif split == "test":
      data_path = opts.test_data_path
    else:
      raise Exception("Invalid split")

    unique_dataset_name = opts.dataset

    self.num_parallel_envs = 2
    self.use_rand = use_rand

    self.images_before_reset = opts.images_before_reset
    config = make_config(
      opts.config,
      gpu_id,
      split,
      data_path,
      sensors,
      resolution,
      opts.scenes_dir,
    )
    self.config=config

    data_dir = os.path.join(
      "data_readers/scene_episodes/", unique_dataset_name + "_" + split
    )
    self.dataset_name = config.DATASET.TYPE
    print(data_dir)
    if not os.path.exists(data_dir):
      os.makedirs(data_dir)
    data_path = os.path.join(data_dir, "dataset_one_ep_per_scene.json.gz")
    # Creates a dataset where each episode is a random spawn point in each scene.
    print("One ep per scene", flush=True)
    if not (os.path.exists(data_path)):
      print("Creating dataset...", flush=True)
      dataset = make_dataset(config.DATASET.TYPE, config=config.DATASET)
      # Get one episode per scene in dataset
      scene_episodes = {}
      for episode in tqdm.tqdm(dataset.episodes):
        if episode.scene_id not in scene_episodes:
          scene_episodes[episode.scene_id] = episode

      scene_episodes = list(scene_episodes.values())
      dataset.episodes = scene_episodes
      if not os.path.exists(data_path):
        # Multiproc do check again before write.
        json = dataset.to_json().encode("utf-8")
        with gzip.GzipFile(data_path, "w") as fout:
          fout.write(json)
      print("Finished dataset...", flush=True)

    # Load in data and update the location to the proper location (else
    # get a weird, uninformative, error -- Affine2Dtransform())
    dataset = mp3d_dataset.PointNavDatasetV1()
    with gzip.open(data_path, "rt") as f:
      dataset.from_json(f.read())

      for i in range(0, len(dataset.episodes)):
        dataset.episodes[i].scene_id = dataset.episodes[i].scene_id.replace(
          '/checkpoint/erikwijmans/data/mp3d/',
          opts.scenes_dir + '/mp3d/')

    config.TASK.SENSORS = ["POINTGOAL_SENSOR"]

    config.freeze()

    self.rng = np.random.RandomState(seed)
    self.reference_idx = reference_idx

    # Now look at vector environments
    if self.vectorize:
      configs, datasets = _load_datasets(
        (
          opts.config,
          gpu_id,
          split,
          data_path,
          sensors,
          resolution,
          opts.scenes_dir,
        ),
        dataset,
        data_path,
        opts.scenes_dir + '/mp3d/',
        num_workers=self.num_parallel_envs,
      )
      num_envs = len(configs)
      env_fn_args = tuple(zip(configs, datasets, range(num_envs)))
      envs = vector_env.VectorEnv(
        env_fn_args=env_fn_args,
        multiprocessing_start_method="forkserver",
      )

      self.env = envs
      self.num_train_envs = int(0.9 * (self.num_parallel_envs))
      self.num_val_envs = self.num_parallel_envs - self.num_train_envs
    else:
      self.env = habitat.Env(config=config, dataset=dataset)
      self.env_sim = self.env.sim
      self.rng.shuffle(self.env.episodes)
      self.env_sim = self.env.sim

    self.num_samples = 0

    # Set up intrinsic parameters
    self.hfov = config.SIMULATOR.DEPTH_SENSOR.HFOV * np.pi / 180.0
    self.W = resolution
    self.K = np.array(
      [
        [1.0 / np.tan(self.hfov / 2.0), 0.0, 0.0, 0.0],
        [0, 1.0 / np.tan(self.hfov / 2.0), 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
      ],
      dtype=np.float32,
    )

    self.invK = np.linalg.inv(self.K)

    self.config = config
    self.opts = opts

    if self.opts.normalize_image:
      self.transform = transforms.Compose(
        [
          transforms.ToTensor(),
          transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
        ]
      )  # Using same normalization as BigGan
    else:
      self.transform = transforms.ToTensor()

    width = full_width #
    height = full_height
    theta, phi = np.meshgrid((np.arange(width) + 0.5) * (2 * np.pi / width),
                             (np.arange(height) + 0.5) * (np.pi / height))
    uvs, uv_sides = my_helpers.spherical_to_cubemap(theta.reshape(-1),
                                                    phi.reshape(-1))
    self.width = width
    self.height = height
    self.uvs = uvs.reshape(height, width, 2)
    self.uv_sides = uv_sides.reshape(height, width)
    self.depth_to_dist_cache = {}
    self.m3d_dist = m3d_dist

  def get_vector_sample(self, index, num_views, isTrain=True):
    if self.num_samples % self.images_before_reset == 0:
      self.env.reset()

    # Randomly choose an index of given environments
    if isTrain:
      index = index % self.num_train_envs
    else:
      index = (index % self.num_val_envs) + self.num_train_envs

    depths = []
    rgbs = []

    orig_location = np.array(self.env.sample_navigable_point(index))
    rand_angle = 0
    if not self.use_rand:
      rand_angle = self.rng.uniform(0, 2 * np.pi)

    orig_rotation = [0, np.sin(rand_angle / 2), 0, np.cos(rand_angle / 2)]#[0, 0, 0, 1]
    # obs = self.env.get_observations_at(
    #   index, position=orig_location, rotation=orig_rotation
    # )
    translations = []
    rotations = []
    #added
    depth_cubes = []
    rgb_cubes = []
    rots_cubes = []
    trans_cubes = []

    for i in range(0, num_views):
      rand_location = orig_location.copy()
      rand_rotation = orig_rotation.copy()
      if self.opts.image_type == "translation_z":
        movement_deltas = {
          0: self.m3d_dist,
          1: 0.0,
          2: -self.m3d_dist
        }
        rand_location[[2]] = (
            orig_location[[2]] + movement_deltas[i]
        )
      else:
        raise ValueError("Unknown image type")

      cubemap_rotations = [
        Rotation.from_euler('x', 90, degrees=True),  # Top
        Rotation.from_euler('y', 0, degrees=True),
        Rotation.from_euler('y', -90, degrees=True),
        Rotation.from_euler('y', -180, degrees=True),
        Rotation.from_euler('y', -270, degrees=True),
        Rotation.from_euler('x', -90, degrees=True)  # Bottom
      ]
      rgb_cubemap_sides = []
      depth_cubemap_sides = []
      rotations_cubemap_sides=[]#q_vec
      locations_cubemap_sides=[]#t_vec

      rand_location = rand_location + np.array([0, 1.25, 0]) #
      rand_rotation = Rotation.from_quat(rand_rotation) #
      for j in range(6):
        my_rotation = (rand_rotation * cubemap_rotations[j]).as_quat()
        obs = self.env.get_observations_at(
          index,
          position=rand_location,
          rotation=my_rotation.tolist()
        )
        normalized_rgb = obs["rgb"].astype(np.float32) / 255.0
        rgb_cubemap_sides.append(normalized_rgb)
        depth_cubemap_sides.append(obs["depth"])
        locations_cubemap_sides.append(rand_location)
        rotations_cubemap_sides.append((Rotation.from_quat(my_rotation)).as_matrix())

      locations_cubemap_sides = np.stack(locations_cubemap_sides, axis=0)
      rotations_cubemap_sides = np.stack(rotations_cubemap_sides, axis=0)      

      rgb_cubemap_sides = np.stack(rgb_cubemap_sides, axis=0)
      rgb_erp_image = self.stitch_cubemap(rgb_cubemap_sides, clip=True)

      depth_cubemap_sides = np.stack(depth_cubemap_sides, axis=0)

      depth_erp_image = self.stitch_cubemap(depth_cubemap_sides, clip=False)
      depths += [depth_erp_image]
      rgbs += [rgb_erp_image]

      rotations.append(rand_rotation.as_matrix())#
      translations.append(rand_location)#

      depth_cubes.append(depth_cubemap_sides)
      rgb_cubes.append(rgb_cubemap_sides)
      trans_cubes.append(locations_cubemap_sides)
      rots_cubes.append(rotations_cubemap_sides)

    trans_cubes = np.stack(trans_cubes, axis=0).astype(np.float32)
    rots_cubes = np.stack(rots_cubes, axis=0).astype(np.float32)
    depth_cubes = np.stack(depth_cubes, axis=0)#.astype(np.float32)
    rgb_cubes = np.stack(rgb_cubes, axis=0)#.astype(np.float32)

    translations = np.stack(translations, axis=0).astype(np.float32)
    rotations = np.stack(rotations, axis=0).astype(np.float32)

    reference_idx = self.reference_idx

    # rotation_offset = Rotation.from_euler('x', np.pi, degrees=False).as_matrix()

    for i in range(translations.shape[0]):
      for j in range(6):
        trans_cubes[i, j] = np.linalg.inv(rotations[reference_idx]) @ (
          trans_cubes[i,j] - translations[reference_idx]
        )
        # if j>=1 and j<5:
        #   # Rotate cameras 180 degrees    
        #   rots_cubes[i, j] = rots_cubes[i, j]@rotation_off

        # trans_cubes[i, j] = -rots_cubes[i, j]@trans_cubes[i,j]
    # rots_cubes[ :, 1:5] = np.einsum("...ij,jk->...ik", rots_cubes[ :, 1:5], rotation_offset)

    # Rotate cameras 180 degrees
    rotation_offset = Rotation.from_euler('x', np.pi, degrees=False).as_matrix()
    rots_cubes[ :, 1:5] = np.einsum("...ij,jk->...ik", rots_cubes[:, 1:5], rotation_offset)
    # points_cam_other = rots_other@(points_w[:3, :]-trans_other)
    # import ipdb;ipdb.set_trace()
    trans_cubes = trans_cubes[..., np.newaxis]
    trans_cubes = np.einsum("...ij,...jk->...ik", rots_cubes, -trans_cubes)

    # rots_cubes[i,j] = rotations[reference_idx] @ np.linalg.inv(rots_cubes[i, j])#
    # rots_other@(points_w[:3, :]-trans_other)

    for i in range(translations.shape[0]):
      if i != reference_idx:
        translations[i] = -np.linalg.inv(rotations[reference_idx]) @ (
            translations[i] - translations[reference_idx])
        rotations[i] = rotations[reference_idx] @ np.linalg.inv(rotations[i])#
    translations[reference_idx] = 0.0 * translations[reference_idx]
    rotations[reference_idx] = np.eye(3)

    self.num_samples += 1

    rgbs = np.stack(rgbs, axis=0)
    depths = np.stack(depths, axis=0)
    depths = depths[..., 0:1]
    depths = self.zdepth_to_distance(depths)
    #revised
    sample = {
      "rgb_panos": rgbs[:, :, :, :3],
      "rots": rotations,
      "trans": translations,
      "depth_panos": depths[:, :, :, 0],
      "rgb_cubes": rgb_cubes,
      "depth_cubes": depth_cubes,
      "rots_cubes": rots_cubes,
      "trans_cubes": trans_cubes,
    }
    return sample

  def get_singleenv_sample(self, num_views) -> Dict[str, np.ndarray]:

    if self.num_samples % self.images_before_reset == 0:
      old_env = self.env._current_episode_index
      self.env.reset()
      print(
        "RESETTING %d to %d \n"
        % (old_env, self.env._current_episode_index),
        flush=True,
      )

    depths = []
    rgbs = []
    cameras = []
    semantics = []

    rand_location = self.env_sim.sample_navigable_point()
    if self.opts.image_type == "fixedRT_baseline":
      rand_angle = self.angle_rng.uniform(0, 2 * np.pi)
    else:
      rand_angle = self.rng.uniform(0, 2 * np.pi)
    rand_rotation = [0, np.sin(rand_angle / 2), 0, np.cos(rand_angle / 2)]
    obs = self.env_sim.get_observations_at(
      position=rand_location,
      rotation=rand_rotation,
      keep_agent_at_new_pose=True,
    )

    for i in range(0, num_views):
      position = rand_location.copy()
      rotation = rand_rotation.copy()
      if self.opts.image_type == "translation":
        position[0] = position[0] + self.rng.rand() * 0.2 - 0.1
      elif self.opts.image_type == "outpaint":
        rotation = quaternion.as_float_array(
          jitter_quaternions(
            quaternion.from_float_array(rand_rotation),
            self.rng,
            angle=10,
          )
        ).tolist()
      elif self.opts.image_type == "fixedRT_baseline":
        rand_location = self.rand_location
        rotation = self.rand_rotation

      else:
        position[0] = position[0] + self.rng.rand() * 0.3 - 0.15
        rotation = quaternion.as_float_array(
          jitter_quaternions(
            quaternion.from_float_array(rand_rotation),
            self.rng,
            angle=10,
          )
        ).tolist()

      obs = self.env_sim.get_observations_at(
        position=position,
        rotation=rotation,
        keep_agent_at_new_pose=True,
      )

      depths += [torch.Tensor(obs["depth"][..., 0]).unsqueeze(0)]
      rgbs += [self.transform(obs["rgb"].astype(np.float32) / 255.0)]

      if "semantic" in obs.keys():
        instance_semantic = torch.Tensor(
          obs["semantic"].astype(np.int32)
        ).unsqueeze(0)
        class_semantic = torch.zeros(instance_semantic.size()).long()

        id_to_label = {
          int(obj.id.split("_")[-1]): obj.category.index()
          for obj in self.env.sim.semantic_annotations().objects
        }

        for id_scene in np.unique(instance_semantic.numpy()):
          class_semantic[instance_semantic == id_scene] = id_to_label[
            id_scene
          ]

        semantics += [class_semantic]

      agent_state = self.env_sim.get_agent_state().sensor_states["depth"]
      rotation = quaternion.as_rotation_matrix(agent_state.rotation)
      position = agent_state.position
      P, Pinv = get_camera_matrices(position=position, rotation=rotation)
      cameras += [{"P": P, "Pinv": Pinv, "K": self.K, "Kinv": self.invK}]

    self.num_samples += 1
    if len(semantics) > 0:
      return {
        "images": rgbs,
        "depths": depths,
        "cameras": cameras,
        "semantics": semantics,
      }

    return {"images": rgbs, "depths": depths, "cameras": cameras}

  def get_sample(self, index, num_views, isTrain):
    if self.vectorize:
      return self.get_vector_sample(index, num_views, isTrain)
    else:
      return self.get_singleenv_sample(num_views)

  def stitch_cubemap(self, cubemap, clip=True):
    """Stitches a single cubemap into an equirectangular image.
    Args:
      cubemap: Cubemap images as 6xHxWx3 arrays.
      clip: Clip values to [0, 1].
    Returns:
      Single equirectangular image as HxWx3 image.
    """
    cube_height, cube_width = cubemap.shape[1:3]

    uvs = self.uvs
    uv_sides = self.uv_sides
    height = self.height
    width = self.width

    skybox_uvs = np.stack(
      (uvs[:, :, 0] * (cube_width - 1), uvs[:, :, 1] * (cube_height - 1)),
      axis=-1)
    final_image = np.zeros((height, width, 3), dtype=np.float32)
    for i in range(0, 6):
      # Grabs a transformed side of the cubemap.
      my_side_indices = np.equal(uv_sides, i)
      final_image[my_side_indices] = my_helpers.bilinear_interpolate(
        cubemap[i, :, :, :], skybox_uvs[my_side_indices, 0],
        skybox_uvs[my_side_indices, 1])
    if clip:
      final_image = np.clip(final_image, 0, 1)
    return final_image

  def zdepth_to_distance(self, depth_image):
    """Converts a depth (z-depth) image to a euclidean distance image.

    Args:
      depth_image: Equirectangular depth image as BxHxWx1 array.

    Returns: Equirectangular distance image.

    """
    batch_size, height, width, channels = depth_image.shape
    cache_key = "_".join((str(height), str(width)))
    self.cache_depth_to_dist(height, width)
    ratio = self.depth_to_dist_cache[cache_key]
    new_depth_image = depth_image * ratio[np.newaxis, :, :, np.newaxis]
    return new_depth_image

  def cache_depth_to_dist(self, height, width):
    """Caches a depth to dist ratio"""
    cache_key = "_".join((str(height), str(width)))
    if cache_key not in self.depth_to_dist_cache:
      cubemap_height = self.resolution #128 #256
      cubemap_width = self.resolution #128 #256
      # Distance to image plane
      theta, phi = np.meshgrid(
        (np.arange(width) + 0.5) * (2 * np.pi / width),
        (np.arange(height) + 0.5) * (np.pi / height))
      uvs, uv_sides = my_helpers.spherical_to_cubemap(theta.reshape(-1),
                                                      phi.reshape(-1))
      cubemap_uvs = uvs.reshape(height, width, 2)
      uv_int = np.stack(
        (cubemap_uvs[:, :, 0] * (cubemap_width - 1),
         cubemap_uvs[:, :, 1] *
         (cubemap_height - 1)),
        axis=-1)

      width_center = cubemap_width / 2 - 0.5
      height_center = cubemap_height / 2 - 0.5
      focal_len = (cubemap_height / 2) / np.tan(self.config.SIMULATOR.DEPTH_SENSOR.HFOV * np.pi / 180.0/2)

      diag_dist = np.sqrt((uv_int[:, :, 0] - width_center) ** 2 +
                          (uv_int[:, :,
                           1] - height_center) ** 2 + focal_len ** 2)
      self.depth_to_dist_cache[cache_key] = diag_dist / focal_len
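
As a side note on the depth conventions above, here is a minimal standalone sketch (with hypothetical sizes and values) showing that the z-depth-to-distance ratio cached in cache_depth_to_dist / zdepth_to_distance and the dist_to_zdepth conversion in main.py are per-pixel inverses of each other:

import numpy as np

H = W = 255                      # perspective face size, matching main.py
f = (W / 2) / np.tan(np.pi / 4)  # focal length for a 90-degree FOV

u, v = np.meshgrid(np.arange(W), np.arange(H))
diag = np.sqrt((u - (W / 2 - 0.5)) ** 2 + (v - (H / 2 - 0.5)) ** 2 + f ** 2)

z = np.full((H, W), 2.0)       # hypothetical z-depth map, in metres
dist = z * diag / f            # z-depth -> euclidean ray distance (zdepth_to_distance)
z_back = dist / diag * f       # ray distance -> z-depth (dist_to_zdepth)
print(np.allclose(z, z_back))  # True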

2. Generate test_data.npz:

from torch.utils.data import DataLoader
from data_readers.habitat_data import HabitatImageGenerator

def load_data(mode, full_width, full_height, m3d_dist, seq_len=3, reference_idx=1):
    # args.dataset_name == "m3d":
    train_data = HabitatImageGenerator(
        split=mode,
        full_width=full_width,
        full_height=full_height,
        m3d_dist=m3d_dist,
        seq_len=seq_len,
        reference_idx=reference_idx,
    )

    train_dataloader = DataLoader(
        dataset=train_data,
        num_workers=0,
        batch_size=1,
        shuffle=False,
        drop_last=True,
        pin_memory=True,
    )
    return train_data, train_dataloader

# mode='train'
mode='test'
train_data, train_loader = load_data(mode, 512, 256, 0.5, seq_len=3, reference_idx=1)

def vis_data():
    loader = train_loader
    data_idx = 0

    # data = train_data.__getitem__(data_idx)
    for i, data in enumerate(loader):
        print("i:", i)
        if i>= data_idx:
            panos = data["rgb_panos"]#.to(args.device)
            depths = data["depth_panos"]#.to(args.device)
            rots = data["rots"]#.to(args.device)
            trans = data["trans"]#.to(args.device)
            rgb_cubes = data["rgb_cubes"]
            depth_cubes = data["depth_cubes"]
            trans_cubes = data["trans_cubes"]
            rots_cubes = data["rots_cubes"]
            import numpy as np
            import cv2

            np.savez("./test_data.npz", panos=panos, depths=depths, rots=rots, trans=trans,
                     rgb_cubes=rgb_cubes, depth_cubes=depth_cubes, trans_cubes=trans_cubes,
                     rots_cubes=rots_cubes)

            # save the first panorama as a quick visual check
            pano_np = panos[0, 0].data.cpu().numpy() * 255
            out = np.zeros_like(pano_np)
            cv2.normalize(pano_np, out, 0, 255, cv2.NORM_MINMAX)
            rgb_out = np.array(out, dtype='uint8')
            cv2.imwrite("pano_np_" + str(data_idx) + ".jpg", rgb_out)

            break

vis_data()
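
A quick sanity check (assuming test_data.npz was written by vis_data above) is to print the shapes of the saved arrays:

import numpy as np

data = np.load("./test_data.npz")
for key in data.files:
    print(key, data[key].shape)
# expected keys: panos, depths, rots, trans, rgb_cubes, depth_cubes, trans_cubes, rots_cubes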

3. Panorama projection helpers:

import numpy as np

def np_cartesian_to_spherical(xyz, eps=None, linearize_angle=np.deg2rad(10)):
    x = xyz[:, 0]
    y = xyz[:, 1]
    z = xyz[:, 2]
    theta = np.arctan2(z, x)
    radius = np.linalg.norm(xyz, axis=-1)
    y_over_r = y / radius
    phi = np.arccos(y_over_r)
    return theta, phi, radius

def np_spherical_to_cartesian(theta, phi, r):
    tmp = r * np.sin(phi)
    x = tmp * np.cos(theta)
    y = r * np.cos(phi)
    z = tmp * np.sin(theta)
    return np.stack((x, y, z), axis=-1)

# equirectangular pixel to spherical:
# theta = (x / W) * 2 * np.pi - 0.5 * np.pi
# phi = (y / H) * np.pi

# spherical to equirectangular pixel:
# theta = (theta + 0.5 * np.pi) % (2 * np.pi)
# x = theta / (2 * np.pi) * W
# y = phi / np.pi * H
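
A minimal round-trip sketch of the pixel-to-spherical mapping described in the comments above (W and H are hypothetical panorama dimensions):

import numpy as np

W, H = 512, 256
x = np.arange(W)
y = np.arange(H)

# equirectangular pixel -> spherical
theta = (x / W) * 2 * np.pi - 0.5 * np.pi
phi = (y / H) * np.pi

# spherical -> equirectangular pixel
x_back = ((theta + 0.5 * np.pi) % (2 * np.pi)) / (2 * np.pi) * W
y_back = phi / np.pi * H

print(np.allclose(x, x_back), np.allclose(y, y_back))  # True True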
dli7319 commented 1 year ago

Your code seems to work for me. The only issue is that the cubemaps are upside down, because the y-axis in image pixel space points the opposite way from the up axis in world coordinates.

Just update def GetPerspective(self, FOV, THETA, PHI, height, width):

        x = np.arange(width)
        y = np.arange(height - 1, -1, -1)
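
For context, a sketch of where this change lands inside GetPerspective in my_e2p.py (only the pixel-grid setup changes; the rest of the method stays as posted above):

        # inside Equirectangular.GetPerspective, after building K_inv
        x = np.arange(width)
        y = np.arange(height - 1, -1, -1)  # flip the vertical pixel axis so faces are upright

        x, y = np.meshgrid(x, y)
        z = np.ones_like(x)
        xyz = np.concatenate([x[..., None], y[..., None], z[..., None]], axis=-1)
        xyz = xyz @ K_inv.T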
EchoTHChen commented 1 year ago

When I test the perspective projections with the visualization code, the reprojection results are wrong. I suspect the poses are wrong, but I cannot find the problem. Source view: src (3)

Target views: other_0_pose005 other_2_pose005

The visualization code is the same script as in step 3 of the original post above.
ruofeidu commented 1 year ago

Hi EchoTHChen,

Could we also have your name, either here on GitHub or privately via email?

I think these questions are now more about general help / debugging than about our paper / repo. We may still help, but we should probably discuss it over an email thread rather than GitHub (and on GitHub you are anonymous here).

Thank you so much and I want to make sure David is making the best use of his research time :)

Cheers, Ruofei

EchoTHChen commented 1 year ago

@ruofeidu Yes, you are right. Sorry for taking up so much of your time. My name is Zheng Chen, from Tsinghua University. My email is chenz20@mails.tsinghua.edu.cn.

I used GitHub instead of email because my descriptions contain a lot of code, and Markdown on GitHub is more convenient than email.

I will contact David privately by email later.

ruofeidu commented 1 year ago

Happy to chat with you more over email. Feel free to also cc me at me [at] duruofei [dot] com :)

dli7319 commented 1 year ago

I think I fixed the bug. Try this: https://gist.github.com/dli7319/0198e52a443001a9413878b2a9a2dfd0

other_0_pose004 other_1_pose004 other_2_pose004

EchoTHChen commented 1 year ago

Thanks for your kind reply and the detailed code. It now works fine for the perspective projections.