ManifoldRG / MultiNet

Apache License 2.0

Scope out eval specifications #173

Closed pranavguru closed 1 month ago

pranavguru commented 1 month ago

Answer the following questions:

Locke0 commented 1 month ago

How is OpenVLA evaluated in the original paper?

  1. The full-size 7B OpenVLA model was evaluated on the WidowX robot and the Google Robot.

     WidowX: 17 tasks across 5 categories:

     1. Visual generalization (5 tasks)
     2. Motion generalization (2 tasks)
     3. Physical generalization (3 tasks)
     4. Semantic generalization (4 tasks)
     5. Language grounding (3 tasks)

     Google Robot: 12 tasks with 5 rollouts each (the last 7 tasks are out of distribution).

  2. Fine-tuning experiments

     "We fine-tune OpenVLA on 10-100 demonstrations across 7 Franka Emika Panda tasks, ranging from single-instruction tasks to diverse multi-instruction tasks"

  3. The fine-tuned OpenVLA 7B LIBERO model was evaluated specifically on the LIBERO benchmark.

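As a minimal sketch of how the rollout protocol above aggregates into reported numbers (12 tasks × 5 rollouts, with the last 7 tasks out of distribution, per the counts above; the success/failure outcomes here are random placeholders, not real results):

```python
import numpy as np

# Hypothetical rollout outcomes: 12 Google Robot tasks x 5 rollouts each,
# 1 = success, 0 = failure (placeholder data for illustration only).
rng = np.random.default_rng(0)
outcomes = rng.integers(0, 2, size=(12, 5))

per_task = outcomes.mean(axis=1)   # success rate per task
in_dist = per_task[:5].mean()      # first 5 in-distribution tasks
out_dist = per_task[5:].mean()     # last 7 out-of-distribution tasks
overall = outcomes.mean()          # overall rate across all 60 rollouts
```

Reporting in-distribution and out-of-distribution rates separately makes the generalization gap visible rather than buried in a single average.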
Locke0 commented 1 month ago

How can we adapt it for zero-shot?

We just need to modify `transform.py` and `dataset_statistics.json` to add normalization and action-space mapping configs for the new datasets.
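A minimal sketch of what building such a normalization entry could look like. OpenVLA normalizes actions using per-dimension statistics (including 1st/99th-percentile bounds) stored per dataset; the exact key names and the dataset name `my_new_dataset` below are assumptions for illustration, not the verified schema:

```python
import json
import numpy as np

# Hypothetical sample of raw 7-DoF actions from the new dataset.
actions = np.random.default_rng(0).uniform(-0.5, 0.5, size=(1000, 7))

# Per-dimension statistics in the style of a dataset_statistics.json entry.
stats = {
    "action": {
        "mean": actions.mean(axis=0).tolist(),
        "std": actions.std(axis=0).tolist(),
        "q01": np.quantile(actions, 0.01, axis=0).tolist(),
        "q99": np.quantile(actions, 0.99, axis=0).tolist(),
    }
}

def normalize(action, stats):
    """Map a raw action to [-1, 1] using the 1st/99th percentile bounds,
    mirroring bounds-based action normalization (clipped at the edges)."""
    low = np.asarray(stats["action"]["q01"])
    high = np.asarray(stats["action"]["q99"])
    return np.clip(2.0 * (action - low) / (high - low) - 1.0, -1.0, 1.0)

# Entry that would be merged into dataset_statistics.json under the new key.
entry = {"my_new_dataset": stats}
serialized = json.dumps(entry)

norm = normalize(actions[0], stats)
```

The corresponding `transform.py` change would then map the model's normalized outputs back through the same bounds into the new robot's action space.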

What are the datasets the original work covers in its evals?

All evaluations were run on real robots, except that the fine-tuned 7B OpenVLA LIBERO model was evaluated on the LIBERO benchmark in MuJoCo simulation.

What are the model specs during evaluation? Full-scale or quantized?

Full-scale