ManifoldRG / MultiNet

MIT License

Scope out a path for profiling OpenVLA #149

Open pranavguru opened 2 weeks ago

pranavguru commented 2 weeks ago

Overview and action items for the OpenVLA profiling effort. What is already present in the codebase? What needs to be modified? What needs to be built in order to profile it on control and VL tasks?

Locke0 commented 1 week ago

Comparison of the OpenX datasets used by OpenVLA and MultiNet v0:

OpenVLA: [image]

Notes:

  1. They removed the DROID dataset (10% of the mixture) for the final third of training of the final model because of its low action token accuracy
  2. The training data contains only manipulation datasets that have at least one third-person camera and use single-arm end-effector control
  3. The mixture follows Octo: it up-weights datasets with larger task and scene diversity and down-weights less diverse datasets
  4. Input: a 224 x 224 px image and a text instruction. Output: a 7D action
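To make the criteria in notes 2 and 4 concrete, here is a minimal sketch of how candidate datasets could be screened. The `DatasetInfo` fields, the `fits_openvla_setup` helper, and the placeholder dataset names are all hypothetical and are not taken from the OpenVLA or MultiNet codebases; the real metadata would have to be filled in per dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical per-dataset metadata; field names are assumptions."""
    name: str
    third_person_cameras: int   # number of third-person camera views
    single_arm: bool            # True if the robot is a single-arm manipulator
    end_effector_control: bool  # True if actions are end-effector commands
    action_dim: int             # dimensionality of the action vector


def fits_openvla_setup(info: DatasetInfo, expected_action_dim: int = 7) -> bool:
    """Check the criteria from the notes: at least one third-person camera,
    single-arm end-effector control, and a 7D action."""
    return (
        info.third_person_cameras >= 1
        and info.single_arm
        and info.end_effector_control
        and info.action_dim == expected_action_dim
    )


# Placeholder entries for illustration only, not real dataset metadata.
candidates = [
    DatasetInfo("placeholder_tabletop", 1, True, True, 7),
    DatasetInfo("placeholder_wrist_only", 0, True, True, 7),
    DatasetInfo("placeholder_bimanual", 2, False, True, 14),
]

selected = [d.name for d in candidates if fits_openvla_setup(d)]
print(selected)
```

The strict `action_dim` check is one possible design choice; datasets with a different action layout could instead be kept and mapped to 7D in a separate step.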

MultiNet v0: https://github.com/ManifoldRG/MultiNet/issues/45

I think the next step is to find appropriate OpenX datasets in MultiNet v0 for a simple eval of OpenVLA.
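One possible way to start that search, sketched under assumptions: split the MultiNet v0 OpenX datasets by whether they also appear in the OpenVLA training mixture, so that evals on previously seen and unseen data can be reported separately. The function name and the placeholder dataset names are hypothetical; the real lists would come from the OpenVLA mixture and the MultiNet v0 issue linked above.

```python
def split_by_training_exposure(multinet_datasets, openvla_training_datasets):
    """Partition MultiNet dataset names into those OpenVLA was trained on
    and those it was not. Both inputs are iterables of dataset names."""
    multinet = set(multinet_datasets)
    trained_on = set(openvla_training_datasets)
    seen = sorted(multinet & trained_on)
    unseen = sorted(multinet - trained_on)
    return seen, unseen


# Placeholder names for illustration only.
multinet_v0 = ["dataset_a", "dataset_b", "dataset_c"]
openvla_mix = ["dataset_b", "dataset_d"]

seen, unseen = split_by_training_exposure(multinet_v0, openvla_mix)
print("seen during OpenVLA training:", seen)
print("not in OpenVLA training mix:", unseen)
```

Exact name matching is an assumption here; if the two projects name the same dataset differently, a name-normalization step would be needed first.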