facebookresearch / projectaria_tools

projectaria_tools is an open-source C++/Python toolkit for interacting with Project Aria data
Apache License 2.0

[ASE] The colors of the edge pixels are darker than the center pixels. How can I rectify this problem? #99

Open thucz opened 1 month ago

thucz commented 1 month ago

Your dataset is really good! I'm trying to use ASE as training data for the novel view synthesis task, but I found a problem: within the fisheye circle, the colors of the edge pixels (I don't mean the black border) are much darker than the center pixels (I have undistorted the images). As a result, the corresponding pixels near the edges are not fully view-consistent between neighboring views. Do you know how to rectify this inconsistent brightness?

(Example frames 0000026 and 0000034 attached.)

thucz commented 1 month ago

This problem makes it difficult to produce a good result for the Novel View Synthesis task on this dataset. In the rightmost column, the existing Gaussian Splatting methods may easily produce strange dark borders on this dataset.

(Comparison figure attached.)

captain-sysadmin commented 1 month ago

Hello!

ASE is designed to produce accurate simulations of Aria output. The RGB camera has a very small fisheye lens, so we also simulate the vignette of the cameras. As you may be aware, most fisheye lenses produce a pronounced variation in brightness; more information can be found here (non-affiliated link!).

The good news is that the variation in brightness is static. It could be reduced by creating a gradient (by inverting an image similar to this) and multiplying it with the ASE image.
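
For illustration only, here is a minimal sketch of that idea in Python. It assumes a vignette-like profile saved as vignette_profile.png (a hypothetical filename) at the same 704x704 resolution as the raw ASE frames; how strongly the gradient is weighted is a free choice, not something the toolkit prescribes:

import cv2
import numpy as np

# Hypothetical vignette-like profile: bright centre, dark edges, same size as the frame.
vignette = cv2.imread("vignette_profile.png", cv2.IMREAD_GRAYSCALE)

# Inverting it gives the gradient: dark centre, bright edges.
anti_vignette = 255 - vignette

# Multiply with an ASE frame in float so the uint8 range cannot saturate, then rescale.
rgb = cv2.imread("rgb/vignette0000000.jpg").astype(np.float32) / 255.0
gain = 1.0 + anti_vignette.astype(np.float32) / 255.0  # 1.0 at the centre, up to 2.0 at the edges
flattened = np.clip(rgb * gain[..., None], 0.0, 1.0)
cv2.imwrite("flattened.jpg", (flattened * 255).astype(np.uint8))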

I hope that helps!

thucz commented 1 month ago

Thanks for your reply! However, I still don't know how to compute this gradient accurately for the ASE data.

thucz commented 1 month ago

Hi! Is there any specific method to compute this relative illumination value for each pixel? I'm not familiar with fisheye cameras.

captain-sysadmin commented 1 month ago

Let me see if I can generate a gradient; standby!

thucz commented 1 month ago

Hi! Do you have any clue about relative illumination computation?

captain-sysadmin commented 1 month ago

We calculate the distortion and then apply a vignette, so the relative illumination is a function of combining a "normal" but distorted image with a vignette image. Attached is an invert-vignette image:

This should re-flatten the lens-based roll-off.

The top right of the image is "up", so depending on how it's applied you might need to rotate it to line it up.
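
For example, if the rectified frames are later rotated with np.rot90, a hedged sketch of keeping the anti-vignette aligned (assuming the attached image was saved locally as anti_vignette.png, a hypothetical filename) would be:

import cv2
import numpy as np

# Hypothetical local copy of the attached invert-vignette image.
anti_vignette = cv2.imread("anti_vignette.png")

# If the frames are rotated 90 degrees clockwise (np.rot90(frame, k=3)),
# rotate the anti-vignette the same way before combining the two; otherwise
# the bright edge of the gradient ends up on the wrong side of the frame.
anti_vignette = np.rot90(anti_vignette, k=3)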

thucz commented 1 month ago

Thanks for your help. I previously wrote the code below to preprocess the ASE fisheye data, including undistortion and rotation. Could you tell me which line of code I should revise to correct the relative illumination?

import matplotlib.colors as colors
import matplotlib.pyplot as plt
import numpy as np
import plotly.graph_objects as go
from pathlib import Path
import os
from PIL import Image
from scipy.spatial.transform import Rotation as R
from projectaria_tools.projects import ase
from projectaria_tools.core import data_provider, calibration
from projectaria_tools.core.image import InterpolationMethod
from readers import read_points_file, read_trajectory_file, read_language_file
import cv2
from tqdm import tqdm
import os, sys, json
from multiprocessing import Pool

def distance_to_depth(K, dist, uv=None):
    if uv is None and len(dist.shape) >= 2:
        # create mesh grid according to d
        uv = np.stack(np.meshgrid(np.arange(dist.shape[1]), np.arange(dist.shape[0])), -1)
        uv = uv.reshape(-1, 2)
        dist = dist.reshape(-1)
        if not isinstance(dist, np.ndarray):
            import torch
            uv = torch.from_numpy(uv).to(dist)
    if isinstance(dist, np.ndarray):
        # z * np.sqrt(x_temp**2+y_temp**2+z_temp**2) = dist
        uvh = np.concatenate([uv, np.ones((len(uv), 1))], -1)
        uvh = uvh.T # 3, N
        temp_point = np.linalg.inv(K) @ uvh # 3, N  
        temp_point = temp_point.T # N, 3
        z = dist / np.linalg.norm(temp_point, axis=1)
    else:
        uvh = torch.cat([uv, torch.ones(len(uv), 1).to(uv)], -1) # N, 3
        temp_point = (torch.inverse(K) @ uvh.T).T # N, 3
        z = dist / torch.linalg.norm(temp_point, dim=1)
    return z

def transform_3d_points(transform, points):
    N = len(points)
    points_h = np.concatenate([points, np.ones((N, 1))], axis=1)
    transformed_points_h = (transform @ points_h.T).T
    transformed_points = transformed_points_h[:, :-1]
    return transformed_points

def aria_export_to_scannet(scene_id):
    src_folder = Path("/group/40033/public_datasets/3d_datasets/aria/ase_data/"+str(scene_id))
    trgt_folder = Path("/group/40033/public_datasets/3d_datasets/aria/ase_preprocessed_data/"+str(scene_id))
    trgt_folder.mkdir(parents=True, exist_ok=True)
    SCENE_ID = src_folder.stem
    print("SCENE_ID:", SCENE_ID)

    scene_max_depth = 0
    scene_min_depth = np.inf
    Path(trgt_folder, "intrinsic").mkdir(exist_ok=True)
    Path(trgt_folder, "pose").mkdir(exist_ok=True)
    Path(trgt_folder, "depth").mkdir(exist_ok=True)
    Path(trgt_folder, "color").mkdir(exist_ok=True)

    rgb_dir = src_folder / "rgb"
    depth_dir = src_folder / "depth"
    # Load camera calibration
    device = ase.get_ase_rgb_calibration()
    # Load the trajectory using read_trajectory_file() 
    trajectory_path = src_folder / "trajectory.csv"
    trajectory = read_trajectory_file(trajectory_path)

    num_frames = len(list(rgb_dir.glob("*.jpg")))
    Path('./debug').mkdir(exist_ok=True)
    for frame_idx in range(num_frames):   
        frame_id = str(frame_idx).zfill(7)
        rgb_path = rgb_dir / f"vignette{frame_id}.jpg"
        depth_path = depth_dir / f"depth{frame_id}.png"
        depth = Image.open(depth_path) # uint16        
        rgb = cv2.imread(str(rgb_path), cv2.IMREAD_UNCHANGED)
        depth = np.array(depth)
        scene_min_depth = min(depth.min(), scene_min_depth)
        inf_value = np.iinfo(np.array(depth).dtype).max
        depth[depth == inf_value] = 0 # consider it as invalid, inplace with 0
        T_world_from_device = trajectory["Ts_world_from_device"][frame_idx] # camera-to-world
        assert device.get_image_size()[0] == 704
        # https://facebookresearch.github.io/projectaria_tools/docs/data_utilities/advanced_code_snippets/image_utilities
        pinhole = calibration.get_linear_camera_calibration(
            # device.get_image_size()[0],
            # device.get_image_size()[1],
            # device.get_focal_lengths()[0],
            512,
            512,
            150,
            "camera-rgb",
            device.get_transform_device_camera() # important to get correct transformation matrix in pinhole_cw90
            )
        # Rectify (undistort) the fisheye RGB image into the pinhole model
        rectified_rgb = calibration.distort_by_calibration(np.array(rgb), pinhole, device, InterpolationMethod.BILINEAR)
        # depth must be float32 for distort_by_calibration (uint16 will not work)
        depth = np.array(depth).astype(np.float32)
        rectified_depth = calibration.distort_by_calibration(depth, pinhole, device)

        rotated_image = np.rot90(rectified_rgb, k=3)
        rotated_depth = np.rot90(rectified_depth, k=3)
        increase_light = True
        if increase_light:
            rotated_image = cv2.cvtColor(rotated_image,cv2.COLOR_BGR2HSV)
            h,s,v = cv2.split(rotated_image)      
            v1 = np.clip(cv2.add(1*v, 30), 0, 255)
            rotated_image = np.uint8(cv2.merge((h,s,v1)))
            rotated_image = cv2.cvtColor(rotated_image,cv2.COLOR_HSV2BGR)

        cv2.imwrite(str(Path(trgt_folder, "color", f"{frame_id}.jpg")), rotated_image)
        # TODO: check this
        plt.imsave(Path(f"./debug/debug_undistort_{frame_id}.png"), np.uint16(rotated_depth), cmap="plasma")
        # Get rotated image calibration
        pinhole_cw90 = calibration.rotate_camera_calib_cw90deg(pinhole)
        principal = pinhole_cw90.get_principal_point()
        cx, cy = principal[0], principal[1]
        focal_lengths = pinhole_cw90.get_focal_lengths()
        fx, fy = focal_lengths 
        K = np.array([ # camera-to-pixel
            [fx, 0, cx],
            [0, fy, cy],
            [0, 0, 1.0]])

        c2w = T_world_from_device 
        c2w_rotation = pinhole_cw90.get_transform_device_camera().to_matrix()
        c2w_final = c2w @ c2w_rotation   # right-matmul!
        cam2world = c2w_final
        # distance-to-depth
        rotated_depth = distance_to_depth(K, rotated_depth).reshape((rotated_depth.shape[0], rotated_depth.shape[1]))
        rotated_depth = np.uint16(rotated_depth)

        cv2.imwrite(str(Path(trgt_folder, "depth", f"{frame_id}.png")), rotated_depth) # cmap="gray", vmin=0, vmax=255
        scene_max_depth = max(scene_max_depth, float(depth.max()))
        Path(trgt_folder, "min_depth.txt").write_text(f"{scene_min_depth * 1.0 / 1000}")                
        Path(trgt_folder, "max_depth.txt").write_text(f"{scene_max_depth * 1.0 / 1000}")
        Path(trgt_folder, "intrinsic", "intrinsic_color.txt").write_text(f"""{K[0][0]} {K[0][1]} {K[0][2]} 0.00\n{K[1][0]} {K[1][1]} {K[1][2]} 0.00\n{K[2][0]} {K[2][1]} {K[2][2]} 0.00\n0.00 0.00 0.00 1.00""")
        Path(trgt_folder, "pose", f"{frame_id}.txt").write_text(f"""{cam2world[0, 0]} {cam2world[0, 1]} {cam2world[0, 2]} {cam2world[0, 3]}\n{cam2world[1, 0]} {cam2world[1, 1]} {cam2world[1, 2]} {cam2world[1, 3]}\n{cam2world[2, 0]} {cam2world[2, 1]} {cam2world[2, 2]} {cam2world[2, 3]}\n0.00 0.00 0.00 1.00""")

if __name__ == "__main__":    
    aria_export_to_scannet(scene_id=0)
captain-sysadmin commented 1 month ago

Multiply it with the RGB image just as it's loaded:

        rgb = cv2.imread(str(rgb_path), cv2.IMREAD_UNCHANGED)
        anti_vignette = cv2.imread('path_to_anti_vignette.jpg')
        rgb = cv2.multiply(rgb,anti_vignette,scale=1.0)

That should flatten it out. (Again, I'm not sure of the rotation, so you might need to rotate the anti-vignette image left by 90 degrees for it to line up properly.)

You might end up with a white border instead of a black border, but that shouldn't be too hard to remove if needed (you can either crop or change the anti-vignette image I provided.)
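
If that border gets in the way, one hedged option (sketched below, not something the toolkit provides) is to mask out everything outside the fisheye circle, assuming the valid circle is roughly centred and inscribed in the raw 704x704 frame:

import cv2
import numpy as np

def mask_outside_circle(img, margin=2):
    # Hypothetical helper: zero out pixels outside the inscribed fisheye circle.
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(xx - (w - 1) / 2.0, yy - (h - 1) / 2.0)
    out = img.copy()
    out[r > min(h, w) / 2.0 - margin] = 0
    return out

# e.g. right after the multiply (or addWeighted) step on the raw frame:
# rgb = mask_outside_circle(rgb)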

thucz commented 1 month ago

Many thanks!

thucz commented 1 month ago


Hi! It seems that this anti-vignette image is normalized (min value 0, max value 255, dtype np.uint8). When I use it to modify the RGB images, the colors overflow. Could you tell me how to recover its true scale?


import matplotlib.colors as colors
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import os, sys, json
from PIL import Image
import cv2
scene_id = 0
vignette_path = Path("/group/40033/public_datasets/3d_datasets/aria/data/anti_vignette.png")
anti_vignette = cv2.imread(str(vignette_path)) # , cv2.IMREAD_UNCHANGED

src_folder = Path("/group/40033/public_datasets/3d_datasets/aria/ase_data/"+str(scene_id))
rgb_dir = src_folder / "rgb"

frame_idx = 0
frame_id = str(frame_idx).zfill(7)
rgb_path = rgb_dir / f"vignette{frame_id}.jpg"
rgb = cv2.imread(str(rgb_path), cv2.IMREAD_UNCHANGED)

rgb = cv2.multiply(rgb, anti_vignette,scale=1.0)
cv2.imwrite("./debug.jpg", rgb)
thucz commented 1 month ago

Hi @captain-sysadmin! It seems that the given anti-vignette is normalized (min value 0, max value 255, dtype np.uint8). If I multiply the RGB image by this anti-vignette, the values overflow (they exceed the [0, 255] range). Do you know how to resolve this?

thucz commented 4 weeks ago

Hi! Sorry to bother you again. Do you have any clue about this problem? It means a lot to me.

captain-sysadmin commented 4 weeks ago

Hello!

As you can see, because we are multiplying white (or very near white) with another colour other than black, we quickly overflow and clip: with uint8 data, for example, 240 × 200 is far above 255, so cv2.multiply saturates to pure white.

You can try using cv2.addWeighted instead of multiply. So:

rgb = cv2.multiply(rgb, anti_vignette,scale=1.0)

becomes

alpha = 1.0
beta = 1.0
gamma = 1.0
rgb = cv2.addWeighted(rgb, alpha, anti_vignette, beta, gamma)

Changing alpha and beta lets you alter the mix between the two images, and gamma should let you control clipping.
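
Put together with the loading code from earlier in the thread, a minimal sketch (same hypothetical anti_vignette.png path; the weights are only a starting point to tune) might look like:

import cv2

rgb = cv2.imread("rgb/vignette0000000.jpg", cv2.IMREAD_UNCHANGED)
anti_vignette = cv2.imread("anti_vignette.png")  # must match rgb in size and channel count

# Blend rather than multiply: alpha scales the frame, beta scales the
# anti-vignette, gamma is a constant offset added to every pixel.
alpha, beta, gamma = 1.0, 1.0, 1.0
corrected = cv2.addWeighted(rgb, alpha, anti_vignette, beta, gamma)
cv2.imwrite("corrected.jpg", corrected)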

thucz commented 4 weeks ago

Thanks for your reply! I can now get a normal-looking image (attached).

But I still have a question: the code above re-weights all three channels (R, G, B) with the anti-vignette image, so the result looks too bright.

I tried re-weighting the image in HSV space, revising only the V channel. The brightness becomes normal, but a checkerboard artifact appears near the edge. Do you know how to alleviate this problem?


alpha, beta, gamma = 1.0, 1.0, 1.0           # weights as suggested above
rgb = rgb.astype(np.float32) / 255           # go to 32-bit float in [0, 1]
anti_vignette = anti_vignette.astype(np.float32) / 255
new_rgb = cv2.cvtColor(rgb,cv2.COLOR_BGR2HSV)
h,s,v = cv2.split(new_rgb)
new_v = cv2.addWeighted(v, alpha, anti_vignette[:, :, 0], beta, gamma)
new_rgb = cv2.merge((h,s,new_v))
new_rgb = cv2.cvtColor(new_rgb, cv2.COLOR_HSV2BGR)
new_rgb = np.uint8(np.clip(new_rgb*255, 0, 255))
rgb = new_rgb
captain-sysadmin commented 3 weeks ago


Off the top of my head, you might be able to lower the V of the vignette before adding it to the RGB image?

thucz commented 3 weeks ago

Each channel of anti_vignette is equal (R=G=B), so its V channel, max(R, G, B), is just anti_vignette[:, :, 0].

anti_vignette =  np.uint8(anti_vignette.astype(np.float32) * 1.0 / 3.0)
rgb = cv2.addWeighted(rgb, alpha, anti_vignette , beta, gamma)

The result is still a little too bright.
