NVlabs / RVT

Official Code for RVT-2 and RVT
https://robotic-view-transformer-2.github.io/

Question about some implementation details #49

Closed zichunxx closed 2 weeks ago

zichunxx commented 1 month ago

Hi! Thanks for sharing this great work!

I'm trying to understand how to project heatmaps across different views to predict scores for a discretized set of 3D points and choose the 3D point with the highest score. However, I have not found the corresponding code snippet.

Could you point me to the corresponding implementation to help me understand the whole pipeline? Currently, I cannot understand the meaning of some abbreviations and functions, such as "wpt_img".

Thanks in advance!

imankgoyal commented 1 month ago

Hi,

Here are the relevant code sections:

Multi-stage RVT calls the single-stage RVT code, which calls the function inside the renderer.

Hope it helps.

Best, Ankit

zichunxx commented 1 month ago

Thank you so much @imankgoyal! It truly helps a lot.

Two more questions:

(1) What does the abbreviation wpt mean? My guess is that it refers to the target translation for the end-effector.

(2) According to the get_max_3d_frm_hm_cube function, the target point with the highest score is chosen simply from all sets of heatmaps from different views. Is it necessary to filter out some duplicate points with the same coordinates before selection?

Thanks.

imankgoyal commented 1 month ago

Hi,

(1) wpt stands for waypoint. Yes, it is the target translation for the end-effector.

(2) I am not sure I understand the question. What do you mean by sets of heatmaps? We have a set of 3D points from which we choose the one with the highest score. The score of a 3D point is the average, over all views, of the heatmap value at the 2D location where the point projects in each view.
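To make the scoring in (2) concrete, here is a minimal NumPy sketch. It is not the RVT code: the function names (`score_points_from_heatmaps`, `make_ortho_projection`, `best_point`) are hypothetical, and the axis-aligned orthographic projections are an assumption standing in for whatever camera model the renderer actually uses. It only illustrates the idea of projecting each candidate 3D point into every view, reading the heatmap there, averaging, and taking the argmax.

```python
import numpy as np


def make_ortho_projection(drop_axis, res, lo=-1.0, hi=1.0):
    """Hypothetical orthographic camera: drop one axis and map the
    remaining two coordinates from [lo, hi] to pixel indices [0, res-1]."""
    keep = [a for a in range(3) if a != drop_axis]

    def project(points):
        uv = (points[:, keep] - lo) / (hi - lo) * (res - 1)
        return np.rint(uv).astype(int)  # (N, 2) integer (row, col)

    return project


def score_points_from_heatmaps(points, heatmaps, project_fns):
    """Score each 3D point as the mean heatmap value at its projection.

    points:      (N, 3) candidate 3D points
    heatmaps:    list of V arrays, each of shape (H, W)
    project_fns: list of V callables, (N, 3) -> (N, 2) pixel indices
    """
    scores = np.zeros(len(points))
    for hm, project in zip(heatmaps, project_fns):
        rc = project(points)
        # Clamp so points that fall outside the image read the border pixel.
        rows = np.clip(rc[:, 0], 0, hm.shape[0] - 1)
        cols = np.clip(rc[:, 1], 0, hm.shape[1] - 1)
        scores += hm[rows, cols]
    return scores / len(heatmaps)


def best_point(points, heatmaps, project_fns):
    """Return the highest-scoring 3D point and all scores."""
    scores = score_points_from_heatmaps(points, heatmaps, project_fns)
    return points[np.argmax(scores)], scores


# Toy example: a 5x5x5 grid of candidates in [-1, 1]^3 and three views.
res = 5
axis = np.linspace(-1.0, 1.0, res)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), -1).reshape(-1, 3)

project_fns = [make_ortho_projection(a, res) for a in range(3)]

# Build one heatmap per view with a single peak where the target projects.
target = np.array([[0.5, -0.5, 0.0]])
heatmaps = []
for project in project_fns:
    hm = np.zeros((res, res))
    r, c = project(target)[0]
    hm[r, c] = 1.0
    heatmaps.append(hm)

best, scores = best_point(grid, heatmaps, project_fns)
print(best, scores.max())
```

In this sketch each candidate appears exactly once in `points`, so the selection is a single argmax over per-point scores rather than a comparison across separate per-view results.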

Hope this helps!

imankgoyal commented 2 weeks ago

Closing because of inactivity. Please feel free to reopen if the issue persists.