SamsungLabs / imvoxelnet

[WACV2022] ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
MIT License

Why do I need to download and use KITTI velodyne data? #18

Closed: chetanmreddy closed this issue 3 years ago

chetanmreddy commented 3 years ago

Hello @filaPro

I was reading your paper and trying to implement your method on an RGB dataset that I have collected.

While testing your code, I noticed that it also requires downloading the KITTI Velodyne data. Does your method use lidar point clouds, or is the point cloud data used for some other purpose?

Thank you for sharing the code and your help.

filaPro commented 3 years ago

Hi @chetanmreddy ,

ImVoxelNet uses only a single RGB image and its pose for the KITTI dataset, as can be seen in our KittiMultiViewDataset.
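Why an image plus a pose is enough: as the paper title says, the core step is projecting 3D voxels into the image, and that only needs a camera projection matrix. Below is a simplified, pure-Python sketch of a pinhole projection of one voxel center. The function names are hypothetical and this is not the repository's implementation; it assumes a 3x3 intrinsics matrix and a 3x4 [R | t] extrinsics matrix.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def make_projection(intrinsics: Matrix, extrinsics: Matrix) -> Matrix:
    """Multiply a 3x3 intrinsics matrix by a 3x4 [R | t] extrinsics matrix."""
    return [
        [sum(intrinsics[i][k] * extrinsics[k][j] for k in range(3)) for j in range(4)]
        for i in range(3)
    ]


def project_point(
    proj: Matrix, point: Tuple[float, float, float]
) -> Optional[Tuple[float, float]]:
    """Project a 3D point (e.g. a voxel center) to pixel coordinates (u, v).

    Returns None for points at or behind the camera (depth <= 0), which
    cannot receive image features.
    """
    homo = (point[0], point[1], point[2], 1.0)
    # Each row of the 3x4 matrix dotted with the homogeneous point.
    u_h, v_h, depth = (sum(r * p for r, p in zip(row, homo)) for row in proj)
    if depth <= 0:
        return None
    return u_h / depth, v_h / depth
```

With focal length 100 and principal point (50, 40), the point (1, 2, 10) lands at pixel (60.0, 60.0). In a voxel-based detector, each voxel center that projects inside the image would then take the 2D feature sampled at that pixel.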

The tools/create_data.py script for KITTI is simply copied from mmdetection3d, so it also processes Velodyne data. In our case, this data is not used after this step. You can probably comment out all point cloud preprocessing in tools/data_converter/kitti_converter.py if you don't want to download this data.
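The idea of skipping point cloud preprocessing can be illustrated by gating every lidar-related entry behind a flag. This is a hypothetical helper, not the code in tools/data_converter/kitti_converter.py; it only assumes the standard KITTI folder layout (image_2, calib, label_2, velodyne).

```python
import os
from typing import Dict


def build_sample_info(root: str, sample_idx: int, use_lidar: bool = False) -> Dict[str, object]:
    """Collect per-sample file paths for a KITTI-style split directory.

    Hypothetical helper: point cloud paths are only added when use_lidar is
    True, so an image-only pipeline never touches the velodyne folder.
    """
    name = f"{sample_idx:06d}"
    info: Dict[str, object] = {
        "image_idx": sample_idx,
        "img_path": os.path.join(root, "image_2", f"{name}.png"),
        "calib_path": os.path.join(root, "calib", f"{name}.txt"),
        "label_path": os.path.join(root, "label_2", f"{name}.txt"),
    }
    if use_lidar:
        # Only lidar-based steps (reduced point clouds, ground-truth
        # database) would need this file.
        info["velodyne_path"] = os.path.join(root, "velodyne", f"{name}.bin")
    return info
```

Keeping the lidar steps optional, rather than deleting them, makes it easy to switch back if a lidar-based model is trained from the same data later.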

chetanmreddy commented 3 years ago

Thank you @filaPro for your kind help. That answers my question.

I am facing an issue after following the steps you suggested.

[Screenshot]

It looks like create_groundtruth_database is using kitti_dataset.py instead of kitti_monocular_dataset.py.

Do you know why this could be happening?

Thank you again for your help.

chetanmreddy commented 3 years ago

Never mind, I figured it out. Thank you.

chetanmreddy commented 3 years ago

Hi @filaPro

I am getting the following error while training:

[Screenshot of the error]

My image size is 1024 x 1024. I haven't changed anything in the config file. Can you please help me? Thank you.

filaPro commented 3 years ago

Can you please provide more info? Are you training on KITTI? Which config file are you using? Note that the default size for KITTI is 1280 x 384, not 1024 x 1024. The full log file might also be helpful.

chetanmreddy commented 3 years ago

My bad, I forgot to mention that I am training on a custom dataset that I prepared in the KITTI format.

The images in my dataset are 1024 x 1024.

I am using configs/imvoxelnet/imvoxelnet_kitti.py as the config file.

I am attaching my full log file from work_dirs; please find it here: 20210713_184140.log

filaPro commented 3 years ago

Looks like a bug in the RandomFlip transform. I can have a look tomorrow. For now, you can probably remove dict(type='RandomFlip') from the config.

Btw, were you able to train a model on KITTI?

chetanmreddy commented 3 years ago

Yes, I was able to train on KITTI first.

Do you think it has something to do with the image size? I see that a few values (e.g. img_scale) are set specifically for KITTI in the config file.

Thanks for your suggestion. I will try that.
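On the image size question, here is the arithmetic of a keep-ratio resize, sketched under the assumption that the pipeline resizes with keep_ratio=True toward an img_scale of about (1280, 384); check the actual values in your config. The rule bounds both the long and the short edge, which mirrors how mmcv rescales to a tuple scale as far as I know, but this helper is an illustration, not the library function.

```python
from typing import Tuple


def keep_ratio_size(w: int, h: int, scale: Tuple[int, int]) -> Tuple[int, int]:
    """Resize (w, h) to fit a (long_edge, short_edge) target without distortion.

    The single scale factor is the largest one for which the long side stays
    within long_edge and the short side stays within short_edge.
    """
    long_edge, short_edge = max(scale), min(scale)
    factor = min(long_edge / max(w, h), short_edge / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)
```

Under this assumption, a 1242 x 375 KITTI frame becomes 1272 x 384, while a 1024 x 1024 image shrinks to 384 x 384, because the short-edge limit of 384 dominates. So a square dataset may need its own img_scale values rather than the KITTI ones.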

filaPro commented 3 years ago

You can check mmdet/datasets/pipelines/transforms.py, line 440, which is mentioned in your traceback. The problem is flip_ratio=None in RandomFlip.__init__; flip_ratio should probably be 0.5. Somehow, though, it worked on KITTI...

filaPro commented 3 years ago

I don't quite understand how you were able to train on KITTI, as I'm getting the same bug there. Fixed flip_ratio from None to 0.5 in 5fb7d07.
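To show what flip_ratio controls, here is a simplified, hypothetical random horizontal flip for a 2D image and 2D boxes. It is not mmdet's RandomFlip; it only illustrates that the flip probability must be a real number in [0, 1], and that validating it in __init__ surfaces a None value immediately. In an mmdet-style config, passing the ratio explicitly, as in dict(type='RandomFlip', flip_ratio=0.5), avoids relying on the None default.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


class SimpleRandomFlip:
    """Hypothetical horizontal flip applied with probability flip_ratio."""

    def __init__(self, flip_ratio: float = 0.5) -> None:
        # Reject None or out-of-range values up front instead of failing later.
        if not isinstance(flip_ratio, (int, float)) or not 0.0 <= flip_ratio <= 1.0:
            raise ValueError(f"flip_ratio must be a number in [0, 1], got {flip_ratio!r}")
        self.flip_ratio = float(flip_ratio)

    def __call__(
        self, image: List[list], boxes: Sequence[Box], rng: random.Random
    ) -> Tuple[List[list], List[Box], bool]:
        # rng.random() is in [0, 1), so ratio 1.0 always flips and 0.0 never does.
        if rng.random() >= self.flip_ratio:
            return image, list(boxes), False
        width = len(image[0])
        flipped_image = [row[::-1] for row in image]
        # Mirror x coordinates; x1 and x2 swap so that x1 <= x2 still holds.
        flipped_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
        return flipped_image, flipped_boxes, True
```

This only covers 2D labels; a 3D detection pipeline would also have to mirror its 3D boxes consistently with the image.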

chetanmreddy commented 3 years ago

I think that a few days ago I was training your method from the mmdet3d repository instead of yours. LINK

Maybe that's why I didn't encounter this issue back then.

chetanmreddy commented 3 years ago

Hi @filaPro, I am trying to understand the visualization part of your code used during testing. It works completely fine for me.

However, looking at the code, I see that the show_results function is not implemented in mmdet3d/models/detectors/imvoxelnet.py.

When running the command

    python tools/test.py configs/imvoxelnet/imvoxelnet_kitti.py \
        work_dirs/imvoxelnet_kitti/latest.pth --show \
        --show-dir work_dirs/imvoxelnet_kitti

which show_results function is being called?

Thank you for your help

filaPro commented 3 years ago

We override the show method of all monocular and multi-view datasets here.
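A toy sketch of why a dataset-level override is the one that runs, assuming the test script calls show on the dataset object: Python resolves the method on the runtime class, so a subclass override wins even when the caller only knows the base type. The class and function names below are hypothetical stand-ins, not the repository's classes or its test.py flow.

```python
from typing import List


class BaseDataset3D:
    """Hypothetical stand-in for a generic 3D dataset base class."""

    def show(self, results: List[dict], out_dir: str) -> str:
        return f"base show -> {out_dir}"


class MonocularDataset3D(BaseDataset3D):
    """Hypothetical stand-in for a monocular/multi-view dataset."""

    def show(self, results: List[dict], out_dir: str) -> str:
        # A real override would render boxes on the images; this toy version
        # only reports which method ran and how many results it received.
        return f"image-based show of {len(results)} results -> {out_dir}"


def run_test(dataset: BaseDataset3D, results: List[dict], out_dir: str) -> str:
    # The caller is typed against the base class, but the call dispatches to
    # the show() defined on the object's actual class.
    return dataset.show(results, out_dir)
```

Calling run_test(MonocularDataset3D(), results, out_dir) therefore reaches the overridden show, which is why the detector itself does not need its own show_results for this path.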