exiawsh / StreamPETR

[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

Loading Custom Data to perform inference with StreamPETR #16

Closed. praveenkrishna0512 closed this issue 1 year ago.

praveenkrishna0512 commented 1 year ago

Dear StreamPETR team,

We are following this documentation to use our own set of images from a video stream. As a first step, we want to run inference on these custom images. However, we are having trouble creating a new dataset.

Instead, we decided to write our own data loader for our custom dataset. We would therefore like to clarify the proper way to load the image tensor from an image file path.

We will further modify the following:

We will disable the following in config:

We have also compiled further questions in this document. We hope for your reply, thank you!

exiawsh commented 1 year ago

1. Is LiDAR data very important for the model to perform depth perception? Is StreamPETR able to work without LiDAR?
   A: We do not use LiDAR data in StreamPETR.

2. For your stationary-camera case, some modifications need to be made. I will try my best to explain the key data (see the sketch after this list):
   'img_metas': generated by mmdet; 'pad_shape' and 'scene_token' are the fields StreamPETR uses.
   'img': your image input.
   'lidar2img': quite important for StreamPETR; the transformation matrix from the reference coordinate system to the image plane coordinate system.
   'intrinsics': the camera intrinsics matrix K, without considering distortion coefficients.
   'extrinsics': the transformation matrix from the reference coordinate system to the camera coordinate system. Set it to the identity matrix if you only have one camera.
   'timestamp': if you do not need speed estimation, you can set it as an integer index, but make sure the index difference between adjacent frames is constant.
   'ego_pose', 'ego_pose_inv': set them to 4x4 identity matrices. If your camera is stationary, also set with_ego_pos=False in the StreamPETR head.

3. I noticed in the report that the optimal training sequence length is 8. Where can I configure this setting?
   A: Here: https://github.com/exiawsh/StreamPETR/blob/74bf34a720d91e29550391e90a2f24c5f13ab1b0/projects/configs/StreamPETR/stream_petr_r50_flash_704_bs1_8key_2grad_24e.py#L21 Note that this config is for sliding-window training; I recommend you perform streaming training instead.
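
For concreteness, here is a minimal sketch of such a per-frame input dict for a single stationary camera. The helper make_frame_inputs is illustrative, not part of the repo; the field names follow the list above, and lidar2img is built from the intrinsics and extrinsics as discussed later in this thread.

import numpy as np

def make_frame_inputs(frame_idx, img, lidar2img, intrinsics):
    # Per-frame inputs for a single stationary camera,
    # mirroring the fields described above (names illustrative).
    return dict(
        img=img,                  # preprocessed image tensor
        lidar2img=lidar2img,      # reference -> image plane, 4x4
        intrinsics=intrinsics,    # camera matrix K (no distortion)
        extrinsics=np.eye(4),     # identity for a single camera
        timestamp=frame_idx,      # integer index with a constant step
        ego_pose=np.eye(4),       # stationary camera -> identity
        ego_pose_inv=np.eye(4),
    )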

praveenkrishna0512 commented 1 year ago

Thank you!

For img_metas, may I know the specifications for pad_shape and scene_token, so that I can build my custom function to generate these values? I am getting stuck when trying to extend mmdet's data loader classes.

Alternatively, may I know which package and function in mmdet is generating the img_metas? I am trying to trace the build_dataloader() function from mmdet but have yet to find the key img_metas.

May I check that the image tensor is just mmcv.imread(img)? Is there any other preprocessing of the tensor that is being done?

Also, how do we obtain the lidar2img tensors?

exiawsh commented 1 year ago

1. pad_shape: the tensor shape of the image, [(h, w, 3), ]. scene_token: if the images come from the same video segment, set scene_token to the same value.

2. Construct img_metas: https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/mmdet3d/datasets/pipelines/formating.py#L83
3. Normalize the image following: https://github.com/exiawsh/StreamPETR/blob/74bf34a720d91e29550391e90a2f24c5f13ab1b0/projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py#L72
4. lidar2img = intrinsic @ extrinsic. If you only have one camera, set extrinsic to the identity matrix (see the sketch below).
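
A minimal numpy sketch of that composition; padding the 3x3 camera matrix K to 4x4 is an assumption here, but it is the usual way to make the shapes compatible:

import numpy as np

def build_lidar2img(K, extrinsic):
    # Pad the 3x3 camera matrix K to 4x4 so it composes with the
    # 4x4 reference-to-camera extrinsic transform.
    viewpad = np.eye(4)
    viewpad[:3, :3] = K
    return viewpad @ extrinsic    # reference -> image plane

# Example: one camera whose frame is the reference frame (illustrative K).
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
lidar2img = build_lidar2img(K, np.eye(4))
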
praveenkrishna0512 commented 1 year ago
test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='ResizeCropFlipRotImage', data_aug_conf = ida_aug_conf, training=False),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='PETRFormatBundle3D',
                collect_keys=collect_keys,
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img'] + collect_keys,
            meta_keys=('filename', 'ori_shape', 'img_shape','pad_shape', 'scale_factor', 'flip', 'box_mode_3d', 'box_type_3d', 'img_norm_cfg', 'scene_token'))
        ])
]

Thank you! Does that mean for img processing, we only need LoadMultiViewImageFromFiles and NormalizeMultiviewImage? Or do we need to follow the whole pipeline?

exiawsh commented 1 year ago

Follow these:

dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32),
dict(type='PETRFormatBundle3D', class_names=class_names, collect_keys=collect_keys),
dict(type='Collect3D', keys=['img'] + collect_keys,
     meta_keys=('pad_shape', 'img_norm_cfg', 'scene_token'))

praveenkrishna0512 commented 1 year ago

Thanks for the help! We managed to create our own custom data loader and obtained the results.

For the standard NuScenes dataset, I see that the visualizer is implemented by NuScenes themselves, in the form of the render_sample() function.

Right now my results file has a bunch of LiDARInstance3DBoxes. How may I convert these into bounding boxes on my image?

The format I get is the following:

LiDARInstance3DBoxes(
    tensor([[ 51.1990, -51.1989,  -5.4560,   0.9516,   1.0069,   0.9122,   0.4983,
          -0.0580,   0.0966]]))

I hope to confirm which of these values are the box center, width, height, length, yaw angle, etc.

exiawsh commented 1 year ago

Refer here: https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/mmdet3d/datasets/nuscenes_dataset.py#L593
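
If it helps while tracing that code: in the mmdet3d version StreamPETR builds on, a 9-value LiDARInstance3DBoxes row is usually laid out as below. Treat this as an assumption to verify against the linked file for your exact version.

import torch

# The row printed above, decoded under the assumed layout.
row = torch.tensor([51.1990, -51.1989, -5.4560, 0.9516, 1.0069,
                    0.9122, 0.4983, -0.0580, 0.0966])
x, y, z, dx, dy, dz, yaw, vx, vy = row.tolist()
# (x, y, z):    box center in the LiDAR/reference frame
#               (z refers to the bottom face by mmdet3d default)
# (dx, dy, dz): box sizes along the box's own x/y/z axes
# yaw:          heading angle around the z axis
# (vx, vy):     estimated velocity components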

praveenkrishna0512 commented 1 year ago

Is there a way to directly plot your LiDAR boxes onto the image without using the NuScenes package? I need to use the model on non-NuScenes images, so I am looking to write a custom visualizer.

exiawsh commented 1 year ago

> Is there a way to directly plot your LiDAR boxes onto the image without using the NuScenes package? I need to use the model on non-NuScenes images, so I am looking to write a custom visualizer.

Sorry, no…

praveenkrishna0512 commented 1 year ago

As in, is it completely impossible, or can I still write my own custom function to visualize this 3d_bbox?

LiDARInstance3DBoxes(
    tensor([[ 51.1990, -51.1989,  -5.4560,   0.9516,   1.0069,   0.9122,   0.4983,
          -0.0580,   0.0966]]))

I just hope to confirm what these values are. Which ones are (x, y, z), (width, length, height), yaw, etc.?

exiawsh commented 1 year ago

> As in, is it completely impossible, or can I still write my own custom function to visualize this 3d_bbox?
>
> LiDARInstance3DBoxes(
>     tensor([[ 51.1990, -51.1989,  -5.4560,   0.9516,   1.0069,   0.9122,   0.4983,
>           -0.0580,   0.0966]]))
>
> I just hope to confirm what these values are. Which ones are (x, y, z), (width, length, height), yaw, etc.?

https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/mmdet3d/datasets/nuscenes_dataset.py#L542
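
That file shows the box layout. Given that layout and the lidar2img matrix you already built for the inputs, a custom visualizer is feasible. A hedged numpy/OpenCV sketch (box_corners and draw_box are illustrative helpers; the bottom-center origin and the corner ordering are assumptions to check against your mmdet3d version):

import numpy as np
import cv2

def box_corners(x, y, z, dx, dy, dz, yaw):
    # 8 corners of the box; assumes (x, y, z) is the bottom center,
    # the mmdet3d default for LiDAR boxes.
    xs = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * dx / 2
    ys = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * dy / 2
    zs = np.array([ 0,  0,  0,  0,  1,  1,  1,  1]) * dz
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return rot @ np.stack([xs, ys, zs]) + np.array([[x], [y], [z]])

def draw_box(img, corners, lidar2img, color=(0, 255, 0)):
    pts = np.vstack([corners, np.ones((1, 8))])   # homogeneous, (4, 8)
    cam = lidar2img @ pts                         # project to image plane
    if (cam[2] <= 0.1).any():                     # skip boxes behind the camera
        return img
    uv = (cam[:2] / cam[2]).T                     # perspective divide, (8, 2)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0),      # bottom face
             (4, 5), (5, 6), (6, 7), (7, 4),      # top face
             (0, 4), (1, 5), (2, 6), (3, 7)]      # vertical edges
    for i, j in edges:
        p = tuple(int(v) for v in uv[i])
        q = tuple(int(v) for v in uv[j])
        cv2.line(img, p, q, color, 2)
    return img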