SamsungLabs / imvoxelnet

[WACV2022] ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
MIT License

About the train/val splits for SUN RGB-D dataset #49

Closed Harvey-Mei closed 2 years ago

Harvey-Mei commented 2 years ago

Hello, thanks for your excellent work! I noticed that you converted the SUN RGB-D annotations to COCO format. Could you please describe your data processing method and the basis for the splits? I generated visualizations for the val part, but I cannot find the samples shown in the Total3D paper. Is it because you split the dataset differently?

Best, Harvey

filaPro commented 2 years ago

Hi @Harvey-Mei ,

For SUN RGB-D we support 3 benchmarks: sunrgbd, perspective_sunrgbd, total_sunrgbd. As for the total_sunrgbd preprocessing, I think we mainly followed the official Total3D code and just saved the results to .json.

Harvey-Mei commented 2 years ago

Hi @filaPro, thanks for your quick reply. I'll check my data again.

Harvey-Mei commented 2 years ago

Hi @filaPro, sorry to bother you again. I compared ImVoxelNet's split of the SUN RGB-D data with the Total3D split and found some differences.

Specifically, I used the IM3D code, which splits the dataset in the same way as Total3D. I modified their data processing code to save the image paths, and then compared these image paths against the sunrgbd_total3d* files used in ImVoxelNet.

ImVoxelNet train # 4918
ImVoxelNet test # 4781
==============================
Total3D train # 5135
Total3D test # 4934
==============================
Train Not Match # 246
Test Not Match # 175

Below is the Python script I used:

import os 
import pickle
import json        

imvoxel_test_path = '/home/may/nvme/code/imvoxelnet/data/sunrgbd/sunrgbd_total_infos_val.json'
imvoxel_train_path = '/home/may/nvme/code/imvoxelnet/data/sunrgbd/sunrgbd_total_infos_train.json'

total3d_test_splits = '/home/may/nvme/code/Implicit3DUnderstanding/data/sunrgbd/preprocessed/test.json'
total3d_train_splits = '/home/may/nvme/code/Implicit3DUnderstanding/data/sunrgbd/preprocessed/train.json'
total3d_root = '/home/may/nvme/code/Implicit3DUnderstanding'

# collect imvoxelnet info
imvoxelnet_test_imgs = [] 
imvoxelnet_train_imgs = []

with open(imvoxel_test_path, 'r') as f:
    imvoxelnet_test_infos = json.load(f)
with open(imvoxel_train_path, 'r') as f:
    imvoxelnet_train_infos = json.load(f)

for sample in imvoxelnet_test_infos['images']:
    img_path = sample['file_name'].split('SUNRGBD')[-1]
    # Strip the flip marker the same way as in the train loop below.
    if 'flip' in img_path:
        img_path = img_path.replace('_flip', '')
    # De-duplicate against the test list (the train list is still empty here).
    if img_path not in imvoxelnet_test_imgs:
        imvoxelnet_test_imgs.append(img_path)
for sample in imvoxelnet_train_infos['images']:
    img_path = sample['file_name'].split('SUNRGBD')[-1]
    # Map flipped copies back to the original image path.
    if 'flip' in img_path:
        img_path = img_path.replace('_flip', '')
    if img_path not in imvoxelnet_train_imgs:
        imvoxelnet_train_imgs.append(img_path)

print("ImVoxelNet train #", len(imvoxelnet_train_imgs))
print("ImVoxelNet test #", len(imvoxelnet_test_imgs))
print("="*30)

# collect total3d info
with open(total3d_test_splits, 'r') as f:
    total3d_test_list = json.load(f)
with open(total3d_train_splits, 'r') as f:
    total3d_train_list = json.load(f)

total3d_test_imgs = []
total3d_train_imgs = []
for sample in total3d_test_list:
    sample_path = os.path.join(total3d_root, sample[2:])
    with open(sample_path, 'rb') as f:
        sample_info = pickle.load(f)
    img_path = sample_info['rgb_path'].split('SUNRGBD')[-1]
    if img_path not in total3d_test_imgs:
        total3d_test_imgs.append(img_path)

for sample in total3d_train_list:
    sample_path = os.path.join(total3d_root, sample[2:])
    with open(sample_path, 'rb') as f:
        sample_info = pickle.load(f)
    if 'flip' in sample_path:
        continue
    img_path = sample_info['rgb_path'].split('SUNRGBD')[-1]
    if img_path not in total3d_train_imgs:
        total3d_train_imgs.append(img_path)

print("Total3D train #", len(total3d_train_imgs))
print("Total3D test #", len(total3d_test_imgs))
print("="*30)

# compare difference
train_not_match_count = 0
test_not_match_count = 0
train_not_match_imgs = []
test_not_match_imgs = []
for path in total3d_train_imgs:
    if path not in imvoxelnet_train_imgs:
        train_not_match_count += 1 
        train_not_match_imgs.append(path)

for path in total3d_test_imgs:
    if path not in imvoxelnet_test_imgs:
        test_not_match_count += 1 
        test_not_match_imgs.append(path)

print("Train Not Match #", train_not_match_count)
print("Test Not Match #", test_not_match_count)
print(train_not_match_imgs)
print('-'*50)
print(test_not_match_imgs)

I'm not sure whether I'm missing something or whether there is something wrong with my configuration. Can you give me some advice?
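For anyone repeating this check, the comparison can also be sketched with sets, which reports mismatches in both directions (the script above only counts Total3D images missing from ImVoxelNet). This is a minimal sketch, not the script that produced the numbers above: the helper names `normalize` and `compare_splits` and the example paths are hypothetical, and it assumes flipped copies are marked with a `_flip` suffix in the file name.

```python
def normalize(path, marker='SUNRGBD', flip_suffix='_flip'):
    """Keep the part of the path after the dataset root and drop the flip suffix.

    Assumption: flipped copies differ from the original only by `flip_suffix`.
    """
    return path.split(marker)[-1].replace(flip_suffix, '')


def compare_splits(reference, candidate):
    """Return (only_in_reference, only_in_candidate) as sorted lists."""
    ref_set, cand_set = set(reference), set(candidate)
    return sorted(ref_set - cand_set), sorted(cand_set - ref_set)


# Toy example with made-up paths (not real SUN RGB-D entries).
total3d_paths = [normalize(p) for p in [
    '/data/SUNRGBD/kv1/a/image/0001.jpg',
    '/data/SUNRGBD/kv1/b/image/0002.jpg',
]]
imvoxelnet_paths = [normalize(p) for p in [
    '/other/SUNRGBD/kv1/a/image/0001_flip.jpg',
    '/other/SUNRGBD/kv1/c/image/0003.jpg',
]]

missing_in_imvoxelnet, missing_in_total3d = compare_splits(
    total3d_paths, imvoxelnet_paths)
print('Only in Total3D:', missing_in_imvoxelnet)
print('Only in ImVoxelNet:', missing_in_total3d)
```

Using sets also removes duplicates automatically and avoids the quadratic `not in` list lookups.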

filaPro commented 2 years ago

Unfortunately, I cannot reproduce our preprocessing for total SUN RGB-D. If this difference really exists, it may be a bug on our side. I hope it does not affect the metrics much.

Harvey-Mei commented 2 years ago

Yes, I think so. According to my statistics, only a small number of samples differ, so I also think it has little effect on the metrics.

Thanks anyway!

Harvey-Mei commented 2 years ago

Hello @filaPro, I noticed that many images in SUN RGB-D are stored under different paths but share the same file name. Saving the visualizations by file name therefore causes files with the same name to be overwritten. This may explain why the samples shown in the Total3D paper are not found in the test results. https://github.com/saic-vul/imvoxelnet/blob/87e1d5c1e9d291461c9be345836b659d98398e04/mmdet3d/datasets/dataset_wrappers.py#L125

I modified this line to out_file_name = info['img_info'][j]['filename'].split('/')[-3] and generated new visualizations.

Although the number of samples is still lower than in Total3D, the number of generated visualizations now equals the number of test samples.
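To illustrate the overwrite problem, here is a minimal sketch that detects file-name collisions and derives the output name from the scene directory instead. The helper names and the example paths are hypothetical, and it assumes the `<scene>/image/<file>` layout implied by the `split('/')[-3]` change above; it is not the code in `dataset_wrappers.py`.

```python
from collections import Counter
from pathlib import PurePosixPath


def find_name_collisions(paths):
    """Return {file_name: count} for file names shared by more than one path."""
    counts = Counter(PurePosixPath(p).name for p in paths)
    return {name: n for name, n in counts.items() if n > 1}


def scene_dir_name(path):
    """Name of the directory two levels above the file.

    Assumption: paths look like <scene>/image/<file>.
    """
    return path.split('/')[-3]


# Made-up paths: two scenes that contain an image with the same file name.
paths = [
    'SUNRGBD/kv2/scene_a/image/0000103.jpg',
    'SUNRGBD/kv2/scene_b/image/0000103.jpg',
]

print(find_name_collisions(paths))        # {'0000103.jpg': 2}
print([scene_dir_name(p) for p in paths]) # ['scene_a', 'scene_b']
```

If scene directory names could repeat across sensors, joining several path components into the output name would be a safer choice.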