ZhanYang-nwpu / Mono3DVG

[AAAI 2024] Mono3DVG: 3D Visual Grounding in Monocular Images, AAAI, 2024

About retest record #4

Closed by jjlinghu 6 months ago

jjlinghu commented 6 months ago

Thanks to the authors for releasing the code so quickly! I have retested the code and fully reproduced the performance reported in the paper.

There are a few things to note:

  1. When I load the pre-trained model checkpoint_best.pth (https://github.com/ZhanYang-nwpu/Mono3DVG/blob/da207e5bf216b707370a5175133e11e6614c4d30/lib/helpers/save_helper.py#L80), I get the error RuntimeError: Unexpected key(s) in state_dict: "text_encoder.embeddings.position_ids". Comparing the model and the checkpoint weights carefully, I did not see any difference, which confuses me. In any case, I worked around it by setting strict=False, and this does not degrade performance.
  2. Some uses of np.float and np.bool need to be changed to np.float64 and np.bool_ on recent NumPy versions (these aliases were deprecated in NumPy 1.20 and removed in 1.24).
  3. It seems that the Mono3DRefer dataset alone is not sufficient: the raw KITTI/training/calib files are also needed, and they can be provided via a soft link. https://github.com/ZhanYang-nwpu/Mono3DVG/blob/da207e5bf216b707370a5175133e11e6614c4d30/lib/datasets/mono3drefer/mono3drefer_dataset.py#L68
  4. File paths may need to be adjusted to match the local setup.
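
For item 1, a narrower alternative to strict=False is to drop only the reported key before loading, so that any other mismatch still raises an error. The following is a minimal sketch and not the repository's code: the helper name drop_stale_keys, the STALE_KEYS tuple, and the "model_state" checkpoint field in the usage comment are assumptions made for illustration. One possible cause of the extra key, not confirmed in this thread, is that the checkpoint was saved with a different version of the text-encoder library than the one used for loading.

```python
from collections import OrderedDict
from typing import Any, Iterable, List, Mapping, Tuple

# Key reported in this thread; other environments may report different keys.
STALE_KEYS = ("text_encoder.embeddings.position_ids",)


def drop_stale_keys(
    state_dict: Mapping[str, Any],
    stale_keys: Iterable[str] = STALE_KEYS,
) -> Tuple["OrderedDict[str, Any]", List[str]]:
    """Return a copy of state_dict without stale_keys, plus the keys removed."""
    stale = set(stale_keys)
    cleaned: "OrderedDict[str, Any]" = OrderedDict()
    removed: List[str] = []
    for key, value in state_dict.items():
        if key in stale:
            removed.append(key)  # remember what was dropped, for logging
        else:
            cleaned[key] = value  # keep every other entry, in order
    return cleaned, removed


# Hypothetical usage with PyTorch (not executed here; the "model_state"
# field name is an assumption about the checkpoint layout):
#   checkpoint = torch.load("checkpoint_best.pth", map_location="cpu")
#   cleaned, removed = drop_stale_keys(checkpoint["model_state"])
#   model.load_state_dict(cleaned)  # default strict=True still catches other mismatches
#   print("dropped keys:", removed)
```

With strict=False, load_state_dict ignores missing keys as well as unexpected ones, so it is worth inspecting the missing_keys and unexpected_keys fields of its return value to confirm that only the expected key was skipped.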

Thanks to the authors for their contributions to the community!

ZhanYang-nwpu commented 6 months ago

I have had a lot going on recently, so I organized the code in a hurry. Thank you very much for the reminder. I will check the code again.

ZhanYang-nwpu commented 6 months ago

  1. Yes, you are right. We need to set strict=False; it won't affect anything.
  2. My NumPy version is 1.23.4, so I don't have this problem.
  3. Our mono3drefer_dataset.py does load calib data, because we need the camera calibration parameter calib.P2. I have now fixed this and uploaded our calib.zip to the Mono3DRefer folder.
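
For readers unfamiliar with calib.P2, here is a minimal sketch of reading it from a calibration file. It assumes the standard KITTI object calibration text layout, in which each line is a name followed by a colon and space-separated floats, and the P2 line holds the 12 entries of the 3x4 projection matrix of the left color camera. The function name read_p2 is hypothetical and this is not the loader used in mono3drefer_dataset.py.

```python
from pathlib import Path
from typing import List, Union


def read_p2(calib_path: Union[str, Path]) -> List[List[float]]:
    """Parse the 'P2:' line of a KITTI-style calib file into a 3x4 nested list."""
    for line in Path(calib_path).read_text().splitlines():
        if line.startswith("P2:"):
            # Everything after the 'P2:' token is a space-separated float.
            values = [float(token) for token in line.split()[1:]]
            if len(values) != 12:
                raise ValueError(f"expected 12 values for P2, got {len(values)}")
            # Reshape the flat row-major list into 3 rows of 4 columns.
            return [values[row * 4:(row + 1) * 4] for row in range(3)]
    raise ValueError(f"no 'P2:' line found in {calib_path}")
```

In this layout, the focal lengths are at P2[0][0] and P2[1][1], and the principal point is at P2[0][2] and P2[1][2].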

jjlinghu commented 6 months ago

It seems that the pre-trained model (MonoDETR) file was corrupted during upload.

ZhanYang-nwpu commented 6 months ago

Yes, it could be because of the network. I'll fix this right now.

ZhanYang-nwpu commented 6 months ago

This issue has been resolved.