
VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation (CoRL 2024)
https://voxact-b.github.io/
MIT License

What is the training platform and how long did the training take? #3

Closed: yanrihong closed this issue 2 weeks ago

yanrihong commented 2 weeks ago

Dear authors, I have read your paper "VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation" with great interest. The work presented is very impressive. I have a few questions regarding the training process: In your paper, it is mentioned that "We train the policy with a batch size of 1 on a single Nvidia 3000 series GPU for two days."

  1. Does this mean that only the policy part needs to be trained?
  2. What platform was used for training the models, and how long did the training process take?
  3. Can a single 4090 GPU handle the entire workflow of the training process?
  4. It is noted that a related work, PerAct, used a batch size of 16 on 8 NVIDIA V100 GPUs for 16 days. I am wondering if there are any similarities or differences in the training setup for VoxAct-B.

Thank you for your time and for sharing your research. Looking forward to your response. Best regards.

arthur801031 commented 2 weeks ago

Thank you for your kind words!

  1. Correct. For example, if you want to train VoxAct-B on Open Jar, you can use these scripts, documented in the README:

    ./train_open_jar_ours_vlm_10_demos_v2_11_acting.sh
    ./train_open_jar_ours_vlm_10_demos_v2_11_stabilizing.sh
  2. We used Ubuntu 22.04. Training took two days, with one acting policy and one stabilizing policy trained in parallel on two separate GPUs (see the launch sketch at the end of this comment).

  3. Yes, but it would be slower, since you would be training both the acting and stabilizing policies on a single GPU (also covered in the sketch below).

  4. The acting and stabilizing policies are based on the PerAct architecture, but we only use 50^3 voxels instead of the 100^3 voxels used in PerAct, so training VoxAct-B is much faster than training PerAct (see the quick check below). Please refer to the Appendix for more details.
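
For questions 2 and 3, here is a minimal sketch of what the two launch options could look like. It assumes the training scripts respect CUDA_VISIBLE_DEVICES; the GPU pinning and the sequential fallback are illustrative, not something the repo prescribes:

    # Two GPUs: pin each policy to its own device and train in parallel.
    CUDA_VISIBLE_DEVICES=0 ./train_open_jar_ours_vlm_10_demos_v2_11_acting.sh &
    CUDA_VISIBLE_DEVICES=1 ./train_open_jar_ours_vlm_10_demos_v2_11_stabilizing.sh &
    wait  # block until both training runs finish

    # One GPU (e.g., a single 4090): train the two policies back to back,
    # at the cost of longer wall-clock time.
    ./train_open_jar_ours_vlm_10_demos_v2_11_acting.sh
    ./train_open_jar_ours_vlm_10_demos_v2_11_stabilizing.sh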
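
For question 4, a quick back-of-the-envelope check of the grid sizes (numbers from the comment above, evaluated with shell arithmetic just for illustration):

    # 100^3 voxels (PerAct) vs. 50^3 voxels (VoxAct-B)
    echo $(( 100 ** 3 ))            # 1000000 voxels per grid in PerAct
    echo $(( 50 ** 3 ))             # 125000 voxels per grid in VoxAct-B
    echo $(( 100 ** 3 / 50 ** 3 ))  # 8, i.e., an 8x smaller voxel grid

That 8x reduction in input size is why training VoxAct-B is so much lighter than the 8-V100, 16-day PerAct setup mentioned above.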