SamsungLabs / imvoxelnet

[WACV2022] ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
MIT License

General Questions - ImVoxelNet with custom dataset #52

Closed Steven-m2ai closed 2 years ago

Steven-m2ai commented 2 years ago

Hello,

Thank you for your work. I have a few questions I wish to have clarified. Context: I am creating a dataset in SUN-RGBD format, and so I would like to understand the format structure.

  1. It looks like the "calib" file (produced by running the MATLAB files in the SUN-RGBD folder) contains two rows. The first is the camera extrinsic. However, it is named "Rt", which in my mind should be a 3x4 matrix, yet it is stored as a column-major 3x3 matrix. Which coordinate system does this extrinsic parameter transform? From what I understand, it rotates from the depth coordinate system to the camera coordinate system. Then, in the ground-truth labeling, the translation and yaw angle take care of the bounding-box position and orientation. Is this understanding correct?

  2. MMDetection3D has a "browse_dataset" script that lets you view your dataset's ground truths to confirm they are correct before training. Is there an equivalent for SUN-RGBD in ImVoxelNet? It would be helpful for checking whether my custom labels in SUN-RGBD format are correct.

  3. I am trying to use the provided Dockerfile, but my machine runs CUDA 11.1 (an RTX 3090, so from my understanding I cannot downgrade to 10.1), which requires pytorch>=1.8.0. I changed mmcv-full and mmdet to compatible, more recent versions, but I run into the runtime error "... is not compiled with GPU support". Any suggestions for making the Dockerfile compatible with CUDA 11.1? (Running with the provided Dockerfile gives "CUDA error: no kernel image is available for execution on the device".)

Again, thank you for your time!

filaPro commented 2 years ago

Hi @Steven-m2ai ,

  1. The only place where this matrix is used is here. You can check that the 3d points are mapped to the corresponding image pixels here.

  2. I have probably never tried browse_dataset. Since our visualization script is correct, you can use it to visualize your ground truth boxes instead of our predicted ones.

  3. Can you share your Dockerfile? I think something like this should be fine.

    FROM pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel
    ...
    RUN pip install mmcv-full==1.2.7 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
    ...
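
For point 1, the projection check can be sketched in a few lines. This is a hedged sketch, not the repo's code: the `Rt` and `K` values and the row-vector point layout below are illustrative assumptions.

```python
import numpy as np

# Hypothetical calibration values: a 3x3 rotation Rt (depth -> camera
# frame) and a 3x3 pinhole intrinsic matrix K. Real values come from the
# SUN RGB-D calib file.
Rt = np.eye(3)                      # identity rotation for illustration
K = np.array([[529.5, 0.0, 365.0],  # fx, 0, cx
              [0.0, 529.5, 265.0],  # 0, fy, cy
              [0.0, 0.0, 1.0]])

# A few 3D points in the depth coordinate system (N x 3 row vectors).
points_depth = np.array([[0.5, 0.2, 2.0],
                         [-0.3, 0.1, 1.5]])

# Rotate into the camera frame, then project with the pinhole model.
points_cam = points_depth @ Rt.T
uv = points_cam @ K.T
uv = uv[:, :2] / uv[:, 2:3]         # divide by depth to get pixel coords
```

If the projected `uv` coordinates land on the matching image pixels, the calibration and coordinate-system assumptions are consistent.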
Steven-m2ai commented 2 years ago

Hello @filaPro, thank you for your timely response; I appreciate it.

  1. I will look into the code here and let you know if I have further questions once I have spent some time with it.
  2. Do you mean python tools/test.py {CONFIG} {CHECKPOINT} --eval 'mAP' --options 'show=True' 'out_dir={OUT_DIR}'? I ran this, but the results in my output folder do not have ground-truth boxes in the images. Are you referring to a different visualization script? I am referring to here, which notes that GT visualizations can be produced with this command.
  3. The Dockerfile suggestion you gave works great, thank you. I thought the RTX 3090 only supported CUDA versions starting at 11.1, so I didn't think to try 11.0. Let me know if you would like to see the full Dockerfile.

Thanks again!

filaPro commented 2 years ago
  1. Have you tried the command from visualization section of our readme.md?
Steven-m2ai commented 2 years ago

Hello @filaPro

  1. Yes, I initially ran this command: python tools/test.py ./configs/imvoxelnet/imvoxelnet_sunrgbd_fast.py ./checkpoint/20211007_105255.pth --show --show-dir ./vis_results, but to no avail. The results inside the "vis_results" directory I created contain predictions only (screenshot attached).

  2. I ran training using the checkpoint file provided under "SUN-RGBD 10 from VoteNet classes V3". The config file I used is sunrgbd_fast. The GitHub page reports an mAP of 40.7, but my result after 12 epochs of training is 0.2662 (screenshot attached). I wonder where the discrepancy comes from.
filaPro commented 2 years ago
  1. Yes, there are only predictions for now, but you can probably visualize the ground truth boxes, which live in the same coordinate system, with the same function.

  2. Are you training on 8 GPUs?
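
Re point 1, a hedged sketch of turning ground-truth labels into box corners that can be fed to the same drawing routine used for predictions. The (center, size, yaw) parameterization is an assumption about the label format, not the repo's exact code:

```python
import numpy as np

def box_corners(center, size, yaw):
    """Return the 8 corners (8 x 3) of a yaw-rotated 3D box."""
    dx, dy, dz = size
    # Local corner offsets in the box frame, 8 x 3.
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * dx / 2
    y = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * dy / 2
    z = np.array([-1, -1, -1, -1, 1, 1, 1, 1]) * dz / 2
    corners = np.stack([x, y, z], axis=1)
    # Rotate around the vertical axis by the yaw angle, then translate.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return corners @ rot.T + np.asarray(center, dtype=float)

# Example ground-truth box: these corners can then be projected and drawn
# exactly like the predicted boxes.
corners = box_corners(center=[1.0, 2.0, 0.5], size=[2.0, 1.0, 1.0], yaw=0.0)
```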

Steven-m2ai commented 2 years ago
  1. Yes, I looked into the code and modified the test script with a show_gt flag. It looks like the same function renders the ground truth boxes well (screenshot attached).

  2. I am training on 2 x RTX 3090s. Do you mean that the number of GPUs affects the mAP? I read in your paper that you use 8 Nvidia Tesla P40s.

filaPro commented 2 years ago
  1. I think yes. Otherwise you need to fine-tune the batch size and learning rate schedule when changing from 8 GPUs to 2.
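
A hedged sketch of the usual linear-scaling adjustment when moving from 8 GPUs to 2. Field names follow mmdetection-style Python configs; the base values here are illustrative assumptions, not the repo's exact config:

```python
# Going from 8 GPUs to 2 shrinks the total batch size 4x unless compensated.
gpus_original, gpus_now = 8, 2
base_lr = 0.0002                 # assumed learning rate from an 8-GPU config
base_samples_per_gpu = 4         # assumed per-GPU batch size

# Option 1: keep samples_per_gpu and scale the learning rate with the
# total batch size (linear scaling rule).
optimizer = dict(type='AdamW', lr=base_lr * gpus_now / gpus_original)

# Option 2: keep the total batch size by loading more samples per GPU
# (memory permitting), leaving the learning rate schedule unchanged.
data = dict(samples_per_gpu=base_samples_per_gpu * gpus_original // gpus_now)
```

Either way, the goal is to keep the effective (total) batch size and learning rate in the same ratio as the original 8-GPU recipe.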
Steven-m2ai commented 2 years ago
  1. Okay, sounds good. I will investigate this. Thanks for your time; it has been very helpful. I will close this issue.