ai4ce / SSCBench

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving

Matching frames between SSCBench and KITTI-360 to obtain stereo pairs #11

Closed by npurson 1 year ago

npurson commented 1 year ago

I am trying to reproduce the results of VoxFormer using the SSCBench dataset. However, stereo image pairs are not provided in SSCBench.

To obtain stereo pairs, I tried matching frames between SSCBench and the original KITTI-360 dataset. However, I found that the frame ID ranges do not align between the two datasets. For example, sequence 0 in SSCBench contains frames 0 - 10,482, while sequence 0 in KITTI-360 contains frames 0 - 11,517.

Furthermore, frames with the same ID across the two datasets do not always visually match. For instance, frame 0 is identical across the mentioned sequences, but frame 10,482 clearly shows a different scene in each.
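For concreteness, the mismatch is easy to reproduce by counting the frames in the two releases (the directory roots below are placeholders for wherever the datasets are stored locally):

from pathlib import Path

# NOTE: placeholder roots; point these at your local copies of the two datasets.
sscbench = Path("SSCBench-KITTI-360/data_2d_raw/2013_05_28_drive_0000_sync/image_00/data_rect")
kitti360 = Path("KITTI-360/data_2d_raw/2013_05_28_drive_0000_sync/image_00/data_rect")

for name, seq in [("SSCBench", sscbench), ("KITTI-360", kitti360)]:
    frames = sorted(seq.glob("*.png"))
    print(f"{name}: {len(frames)} frames ({frames[0].stem} .. {frames[-1].stem})")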

Could someone please advise how I could match the frames to extract consistent stereo pairs for SSCBench sequences? Any guidance would be greatly appreciated.

freemty commented 1 year ago

I faced similar issues while attempting to load the camera pose paired with RGB image 0 on SSCBench KITTI-360. 🥹

ahayler commented 1 year ago

I believe SSCBench uses a subset of the frames of each KITTI-360 sequence. While this is far from a perfect solution, I simply declare two IDs a match when the associated RGB values are nearly identical. As long as we assume that the images in both SSCBench and KITTI-360 are in sequential order, this process is pretty straightforward.

Here is my code:

import pickle
from argparse import ArgumentParser
from pathlib import Path

import cv2
import numpy as np
from tqdm import tqdm


def mse(img1, img2):
    # Mean squared error between two grayscale images. Cast to float first:
    # cv2.subtract saturates negative values at 0 and squaring uint8 wraps
    # around, both of which silently corrupt the error.
    diff = img1.astype(np.float64) - img2.astype(np.float64)
    return float(np.mean(diff ** 2))


def read(path):
    # Load an image and convert it to grayscale for the comparison.
    img = cv2.imread(str(path))
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)


def main():
    parser = ArgumentParser()
    parser.add_argument("--from_path", "-f", type=str)  # SSCBench sequence (the subset)
    parser.add_argument("--to_path", "-t", type=str)    # KITTI-360 sequence (the superset)
    args = parser.parse_args()

    from_path = Path(args.from_path)
    to_path = Path(args.to_path)

    curr_i = 0
    target_to_all = []  # target_to_all[i] = KITTI-360 index of SSCBench frame i

    targets = sorted(to_path.glob("*.png"))
    curr_target = read(targets[0])

    pbar = tqdm(sorted(from_path.glob("*.png")))
    for f in pbar:
        from_img = read(f)
        # Both releases are assumed to be in sequential order, so we only
        # ever advance through the targets until the current frame matches.
        while mse(from_img, curr_target) > 1e-5:
            curr_i += 1
            if curr_i >= len(targets):
                raise RuntimeError(f"No match found for {f.name}")
            curr_target = read(targets[curr_i])
            pbar.set_postfix_str(f"Current target: {targets[curr_i].name}")
        target_to_all.append(curr_i)

    with open("matching.pkl", "wb") as file:
        pickle.dump(target_to_all, file)
    print("Done.")


if __name__ == "__main__":
    main()
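
Assuming the script is saved as match_frames.py (the name is arbitrary), it runs once per sequence, e.g. python match_frames.py -f <sscbench_image_dir> -t <kitti360_image_dir>. The pickled list then maps each SSCBench frame index to the index of its KITTI-360 counterpart.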

Louis-Leee commented 1 year ago

Hi, sorry for the trouble. We made some modifications during voxel generation for the KITTI-360 dataset, which caused the mismatch. The dataset has been re-uploaded: data_2d_raw.sqf now contains the image pairs, and we have also updated the corresponding point cloud data in data_3d_raw.sqf, which supports depth supervision in some SSC methods, e.g., OccFormer. Thanks for your time and support.
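For anyone wiring this into a data loader, here is a minimal sketch of reading one stereo pair from the re-uploaded data (the mount point and directory layout below are my assumptions; check the release for the exact structure):

import cv2
from pathlib import Path

# NOTE: assumed layout after mounting/extracting data_2d_raw.sqf;
# verify the actual structure against the re-uploaded release.
seq = Path("sscbench-kitti360/data_2d_raw/2013_05_28_drive_0000_sync")
frame = "0000000000.png"

left = cv2.imread(str(seq / "image_00" / "data_rect" / frame))   # left camera
right = cv2.imread(str(seq / "image_01" / "data_rect" / frame))  # right camera
assert left is not None and right is not None, "adjust the paths to the actual layout"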

npurson commented 1 year ago

@ahayler Thanks for your insightful feedback! Upon further consideration, I have devised a more robust solution that directly matches identical images from the original KITTI-360 image_00 set and then obtains their counterparts from image_01.

By identifying explicit image pairs, rather than relying on visual similarity, we can avoid potential errors that may have contributed to the performance degradation you observed.
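The idea can be sketched as follows (a simplified illustration rather than the full script; it assumes that shared frames are byte-identical across the two releases, so exact file hashes suffice):

import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    # Hash the raw file bytes; this only works if shared frames are
    # byte-identical across the two releases (an assumption here).
    return hashlib.md5(path.read_bytes()).hexdigest()

def match_sequences(sscbench_dir: Path, kitti360_dir: Path) -> dict:
    # Index every KITTI-360 image_00 frame by its hash, then look up each
    # SSCBench frame to recover its original KITTI-360 file name.
    # A KeyError here means a frame was not byte-identical in both releases.
    by_hash = {file_hash(p): p.name for p in sorted(kitti360_dir.glob("*.png"))}
    return {p.name: by_hash[file_hash(p)] for p in sorted(sscbench_dir.glob("*.png"))}

# A matched name indexes image_00; the stereo counterpart is the file with
# the same name under image_01 in the original KITTI-360 release.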

For reference, I have uploaded my implementation script and matching results to https://github.com/hustvl/Symphonies/blob/main/tools/match_kitti_360_stereo.py and https://github.com/hustvl/Symphonies/blob/main/assets/kitti_360_match.txt, respectively.