TRAILab / PDV

Point Density-Aware Voxels for LiDAR 3D Object Detection (CVPR 2022)
Apache License 2.0

What should I do if I want to train this model on my own dataset? #4

Closed: KyleYueye closed this issue 2 years ago

KyleYueye commented 2 years ago

I noticed that info files (.pkl) are generated for each dataset. Are there any rules for generating them? I want to train this model on my own dataset. Thanks a lot.

jskhu commented 2 years ago

It depends on the dataset, but typically the pickle files contain labels, calibration, and basic metadata. For example, KITTI's train pickle file contains the following information for each training sample:

>>> import pickle
>>> import pprint
>>> pp = pprint.PrettyPrinter()
>>> with open('kitti_infos_train.pkl', 'rb') as f:
...     data = pickle.load(f)
...
>>> pp.pprint(data[0])
{'annos': {'alpha': array([-0.2]),
           'bbox': array([[712.4 , 143.  , 810.73, 307.92]], dtype=float32),
           'difficulty': array([0], dtype=int32),
           'dimensions': array([[1.2 , 1.89, 0.48]]),
           'gt_boxes_lidar': array([[ 8.73138046, -1.85591757, -0.65469939,  1.2       ,  0.48      ,
         1.89      , -1.58079633]]),
           'index': array([0], dtype=int32),
           'location': array([[1.84, 1.47, 8.41]], dtype=float32),
           'name': array(['Pedestrian'], dtype='<U10'),
           'num_points_in_gt': array([377], dtype=int32),
           'occluded': array([0.]),
           'rotation_y': array([0.01]),
           'score': array([-1.]),
           'truncated': array([0.])},
 'calib': {'P2': array([[ 7.07049316e+02,  0.00000000e+00,  6.04081421e+02,
         4.57583084e+01],
       [ 0.00000000e+00,  7.07049316e+02,  1.80506607e+02,
        -3.45415711e-01],
       [ 0.00000000e+00,  0.00000000e+00,  1.00000000e+00,
         4.98101581e-03],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         1.00000000e+00]]),
           'R0_rect': array([[ 0.9999128 ,  0.01009263, -0.00851193,  0.        ],
       [-0.01012729,  0.9999406 , -0.00403767,  0.        ],
       [ 0.00847067,  0.00412352,  0.9999556 ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  1.        ]],
      dtype=float32),
           'Tr_velo_to_cam': array([[ 0.00692796, -0.99997222, -0.00275783, -0.02457729],
       [-0.00116298,  0.00274984, -0.99999553, -0.06127237],
       [ 0.99997532,  0.00693114, -0.0011439 , -0.33210289],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])},
 'image': {'image_idx': '000000',
           'image_shape': array([ 370, 1224], dtype=int32)},
 'point_cloud': {'lidar_idx': '000000', 'num_features': 4}}
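
The relationship between the camera-frame fields ('location', 'dimensions', 'rotation_y') and 'gt_boxes_lidar' can be checked against the sample above. The sketch below is not OpenPCDet's own code. It assumes 'location' is the bottom center of the box in the rectified camera frame, 'dimensions' is ordered (l, h, w), and that both rotation matrices are orthonormal so their inverse is the transpose. The function name camera_box_to_lidar is hypothetical.

```python
import math

# Calibration values copied from the sample above (rounded as printed).
R0_RECT = [[0.9999128, 0.01009263, -0.00851193],
           [-0.01012729, 0.9999406, -0.00403767],
           [0.00847067, 0.00412352, 0.9999556]]
TR_ROT = [[0.00692796, -0.99997222, -0.00275783],
          [-0.00116298, 0.00274984, -0.99999553],
          [0.99997532, 0.00693114, -0.0011439]]
TR_TRANS = [-0.02457729, -0.06127237, -0.33210289]


def transpose_mul(m, v):
    """Return m^T @ v for a 3x3 matrix m and a 3-vector v."""
    return [sum(m[r][c] * v[r] for r in range(3)) for c in range(3)]


def camera_box_to_lidar(location, dimensions, rotation_y):
    """Convert one camera-frame box to [x, y, z, l, w, h, heading] in the lidar frame."""
    l, h, w = dimensions
    # Undo the rectification, then undo the lidar-to-camera extrinsic.
    # Assumption: both rotations are orthonormal, so inverse == transpose.
    cam = transpose_mul(R0_RECT, location)
    shifted = [cam[i] - TR_TRANS[i] for i in range(3)]
    x, y, z = transpose_mul(TR_ROT, shifted)
    # Move the reference point from the bottom face to the box center.
    z += h / 2.0
    # Lidar heading is measured about the lidar z axis instead of the camera y axis.
    heading = -(rotation_y + math.pi / 2.0)
    return [x, y, z, l, w, h, heading]


box = camera_box_to_lidar([1.84, 1.47, 8.41], [1.2, 1.89, 0.48], 0.01)
print([round(v, 3) for v in box])
```

Within rounding error this reproduces the 'gt_boxes_lidar' row of the sample, so a custom converter that already has boxes in the lidar frame can fill that field directly and only needs the camera-frame fields if it relies on KITTI-style evaluation.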

Unless your data is extremely different, my suggestion would be to convert your data to a KITTI-like format so you can use OpenPCDet's dataset framework easily. An example Waymo to KITTI converter is here: https://github.com/caizhongang/waymo_kitti_converter. You can probably do something similar with your dataset.
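
As a rough illustration of what a converter for a custom dataset might emit, the sketch below builds a list of per-frame dictionaries that mirror a subset of the keys shown above and writes them to a pickle file. The helper names make_info and write_infos and the file name custom_infos_train.pkl are hypothetical. Plain Python lists are used to keep the sketch dependency-free, whereas the real info files store NumPy arrays, and which keys are actually required depends on the dataset class that loads them.

```python
import pickle
import tempfile
from pathlib import Path


def make_info(frame_id, boxes_lidar, names, num_features=4):
    """Build one per-frame info dict with a subset of the KITTI-style keys."""
    if len(boxes_lidar) != len(names):
        raise ValueError("each box needs exactly one class name")
    return {
        'point_cloud': {'lidar_idx': frame_id, 'num_features': num_features},
        'annos': {
            'name': list(names),
            # Each box is [x, y, z, l, w, h, heading] in the lidar frame.
            'gt_boxes_lidar': [[float(v) for v in box] for box in boxes_lidar],
        },
    }


def write_infos(infos, path):
    """Pickle the list of info dicts, one entry per frame."""
    with open(path, 'wb') as f:
        pickle.dump(infos, f)


infos = [
    make_info('000000',
              [[8.73, -1.86, -0.65, 1.2, 0.48, 1.89, -1.58]],
              ['Pedestrian']),
]
out_path = Path(tempfile.mkdtemp()) / 'custom_infos_train.pkl'
write_infos(infos, out_path)

# Read the file back the same way the REPL session above does.
with open(out_path, 'rb') as f:
    loaded = pickle.load(f)
print(loaded[0]['point_cloud'])
```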

KyleYueye commented 2 years ago

Thank you