facebookresearch / home-robot

Mobile manipulation research tools for roboticists

Mobile manipulation #379

Closed sriharshakoribille closed 1 year ago

sriharshakoribille commented 1 year ago

Hello,

Thank you for sharing your amazing research! Great work!

In regard to your paper on SLAP, can you please elaborate on how PerAct and SLAP were deployed with mobile manipulation? Specifically for PerAct, was the voxel grid size increased to cover the entire environment? And what are the camera views used to collect the data?

cpaxton commented 1 year ago

@zephirefaith can comment on this in more detail. We kept most parameters the same. Are you referring to the Stretch or the Franka experiments? For Stretch we used the head camera. For Franka we used a camera on the end of the arm, and just had it move to a couple of predefined "look" poses on one side of a table.
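Schematically, the Franka data collection looked something like this (a rough sketch only; the joint values and the `robot`/`camera` interfaces are placeholders, not our actual code):

```python
import numpy as np

# Hypothetical fixed "look" configurations on one side of the table; the real
# experiments used a small predefined set of views like this, not a full scan.
LOOK_POSES = [
    np.array([0.0, -0.6, 0.0, -2.2, 0.0, 1.6, 0.8]),  # left view
    np.array([0.3, -0.6, 0.0, -2.2, 0.0, 1.6, 0.8]),  # center view
    np.array([0.6, -0.6, 0.0, -2.2, 0.0, 1.6, 0.8]),  # right view
]

def collect_scene_cloud(robot, camera):
    """Visit each look pose, grab a depth frame, and fuse everything into a
    single base-frame point cloud. `robot` and `camera` stand in for whatever
    hardware interface you have; this is not the home-robot API."""
    clouds = []
    for q in LOOK_POSES:
        robot.move_to_joint_positions(q)       # assumed blocking motion call
        points_cam = camera.get_point_cloud()  # Nx3 points in camera frame
        T_base_cam = robot.get_camera_pose()   # 4x4 camera pose in base frame
        clouds.append(points_cam @ T_base_cam[:3, :3].T + T_base_cam[:3, 3])
    return np.concatenate(clouds, axis=0)
```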

sriharshakoribille commented 1 year ago

I was curious about the experiments on Stretch (Table 3 in the paper), since they involved mobile manipulation. Since the Stretch robot would be moving, would PerAct's input voxel grid be static w.r.t. the world, or would it be dynamic and move with the robot? And were the voxel grid dimensions increased to cover the mobile manipulation scene?

cpaxton commented 1 year ago

It's static w.r.t. the robot base, so it will always be a fixed volume near the robot. Unfortunately we didn't end up doing whole scenes, although that's part of the plan. If you're interested in that direction, I think 3D-LLM has a similar architecture with a Perceiver backbone: https://vis-www.cs.umass.edu/3dllm/

So this seems to indicate the approach will scale.

If you wanted to cover whole scenes, you would apply this to a voxel grid in world coordinates instead.
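Roughly, the difference between the two setups looks like this (a minimal sketch; the `voxelize` helper, poses, and origins are illustrative, not the home-robot API):

```python
import numpy as np

def voxelize(points, origin, volume_size=1.5, resolution=100):
    """Bin XYZ points into an occupancy cube of side `volume_size` (meters)
    anchored at `origin`; points outside the cube are dropped."""
    voxel_size = volume_size / resolution
    idx = np.floor((points - origin) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < resolution), axis=1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[keep, 0], idx[keep, 1], idx[keep, 2]] = True
    return grid

# Toy point cloud in the robot's base frame (stand-in for real depth data).
points_base = np.random.uniform(-0.5, 1.0, size=(10_000, 3))

# Base-frame grid: the volume is defined relative to the robot, so it moves
# with the base. This is the "static w.r.t. the robot base" setup above.
grid_base = voxelize(points_base, origin=np.array([-0.25, -0.75, 0.0]))

# World-frame grid: transform points into the map frame using the base pose,
# then voxelize a volume anchored to the world instead of the robot.
T_world_base = np.eye(4)               # hypothetical base pose from localization
T_world_base[:3, 3] = [2.0, 1.0, 0.0]  # e.g. base at (2, 1) in the map
points_world = points_base @ T_world_base[:3, :3].T + T_world_base[:3, 3]
grid_world = voxelize(points_world, origin=np.array([1.25, 0.25, 0.0]))
```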

sriharshakoribille commented 1 year ago

Ah okay. Thank you for the information and additional directions! @cpaxton

zephirefaith commented 1 year ago

@sriharshakoribille Glad to know you found our research useful! Thanks for responding @cpaxton; to add some more detail: on Stretch we changed the voxel volume of PerAct's input to 1.5 m × 1.5 m × 1.5 m, which is coarser than the one used for Franka. This is due both to the different embodiment and to the unconstrained scene geometry w.r.t. the robot's base.
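For a sense of scale, assuming PerAct's default 100³ voxel grid (the exact grid resolution we used isn't stated here, so treat this as illustrative):

```python
# Back-of-the-envelope voxel resolution, assuming a fixed 100^3 grid.
grid_dim = 100
for side_m in (1.0, 1.5):  # tabletop-scale vs. Stretch volume
    voxel_cm = side_m / grid_dim * 100
    print(f"{side_m:.1f} m cube -> {voxel_cm:.1f} cm voxels")
# 1.0 m cube -> 1.0 cm voxels
# 1.5 m cube -> 1.5 cm voxels
```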

Our experiments use PerAct in an open-loop manner, where every prediction is made with respect to the first observed scene and the first end-effector pose in the base frame. Hope that clarifies things. Please don't hesitate to reach out again and tag me for details.
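Schematically, the open-loop deployment is something like this (a sketch only; `predict_keyframes` and the robot calls are stand-ins, not the actual SLAP/PerAct code):

```python
def run_task_open_loop(robot, model, instruction):
    """Open-loop execution: one observation, one batch of predictions, then
    blind execution. All predicted poses are expressed relative to the first
    scene and the initial end-effector pose in the base frame."""
    # Capture the scene once, before any motion.
    obs0 = robot.get_observation()  # assumed RGB-D + proprio snapshot
    ee0 = robot.get_ee_pose()       # initial end-effector pose, base frame

    # Every predicted keyframe is w.r.t. obs0 / ee0; there is no
    # re-observation between steps, which is what "open loop" means here.
    keyframes = model.predict_keyframes(obs0, ee0, instruction)

    for pose in keyframes:
        robot.move_ee_to(pose)      # execute without re-planning
```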