ManifoldRG / MultiNet

MIT License
Research VLM to Control mapping #84

Closed by pranavguru 3 months ago

pranavguru commented 3 months ago

Research (both literature review and architectural scoping) on using VLMs for control, covering both fine-tuning and inference.

devjwsong commented 3 months ago

Possible ways:

  1. JSON output with corresponding key-value pairs.
  2. Simple textual expression. (e.g. "put a ball into a bowl")
  3. Python code. (e.g. grab(object))
  4. Reserving special action tokens. (e.g. RT-2)
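
For illustration, here is a minimal Python sketch of how one action might be rendered under options 1, 3, and 4 (option 2 is free text and needs no helper). The function names, the [-1, 1] value range, and the 256-bin default are assumptions made for this example, not something decided in this thread. RT-2 discretizes each action dimension into bins and emits them as tokens; the binning below only mimics that idea and does not touch any real vocabulary.

```python
import json


def as_json(action: dict) -> str:
    """Option 1: serialize the action as a JSON object."""
    return json.dumps(action)


def as_python_call(name: str, kwargs: dict) -> str:
    """Option 3: render the action as a Python call expression."""
    args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    return f"{name}({args})"


def as_bins(values, low=-1.0, high=1.0, n_bins=256):
    """Option 4 (sketch): map each continuous value to an integer bin index.

    Values outside [low, high] are clipped first. The range and bin count
    are assumptions for this example.
    """
    bins = []
    for value in values:
        clipped = min(max(value, low), high)
        # Scale to [0, n_bins - 1] and round to the nearest bin.
        bins.append(int((clipped - low) / (high - low) * (n_bins - 1) + 0.5))
    return bins
```

Options 1 and 3 work with any model that can follow a prompt, while option 4 only pays off when the bin indices can be tied to reserved tokens, which requires access to the model's vocabulary.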

devjwsong commented 3 months ago

Some thoughts:

  1. First, we should choose an approach that requires no architectural changes, which means starting with a prompt-based approach.
    • We should be able to evaluate the model's performance without fine-tuning.
    • In fact, for some VLMs we cannot modify the model or its vocabulary at all, because they are closed-source.
  2. The output should be easy to parse and interpret.
    • JSON: simple, easy to parse, interpretable.
    • Python: useful when driving an actual robot arm.
    • Text: less formatted, hard to control.
  3. If we choose the JSON format, actions could be represented as follows:
    • Discrete action spaces:
      • DM Lab: {"distance": "far", "reward": "positive"}
      • Atari: {"action": "leftfire"}, or just the index of the action in a provided action list.
    • Continuous action spaces:
      • MuJoCo: {"thigh_joint": 0.3579, "leg_joint": -0.424, "foot_joint": 0.987}
      • DM Control Suite: {"arm_root": 0.465, "arm_wrist": -0.985, ...}
  4. The steps of inference/fine-tuning:
    • For each dataset, we design a prompt that specifies, as precisely as possible, what the response should look like and which keys and values it should include.
    • We then include this dataset-specific instruction in the prompt when running inference with a model.
    • We then parse the JSON response into actual action values to update the environment.
    • When fine-tuning, we add this instruction to the fine-tuning data so that the model learns to follow the specified output format.
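
The inference steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an agreed implementation: `ACTION_SPEC`, `ATARI_ACTIONS`, the helper names, the [-1, 1] bounds, and the instruction wording are all hypothetical. The joint names follow the MuJoCo example above. The VLM call itself is left out, so `response` stands for whatever text the model returns.

```python
import json
import re

# Hypothetical per-dataset spec: key -> (low, high) bounds (assumed values).
ACTION_SPEC = {
    "thigh_joint": (-1.0, 1.0),
    "leg_joint": (-1.0, 1.0),
    "foot_joint": (-1.0, 1.0),
}

# Hypothetical discrete action list for an Atari-style environment.
ATARI_ACTIONS = ["noop", "fire", "left", "leftfire"]


def build_instruction(spec: dict) -> str:
    """Build the dataset-specific instruction that is added to the prompt."""
    keys = ", ".join(
        f'"{key}" (float in [{low}, {high}])' for key, (low, high) in spec.items()
    )
    return (
        "Respond with a single JSON object and nothing else. "
        f"It must contain exactly these keys: {keys}."
    )


def extract_json(response: str) -> dict:
    """Pull the JSON object out of a model response, tolerating extra text."""
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))


def parse_continuous(response: str, spec: dict) -> list:
    """Turn a JSON response into an ordered, clipped action vector."""
    data = extract_json(response)
    missing = set(spec) - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return [
        min(max(float(data[key]), low), high) for key, (low, high) in spec.items()
    ]


def parse_discrete(response: str, key: str, choices: list) -> int:
    """Turn a JSON response into the index of the chosen discrete action."""
    # list.index raises ValueError if the model names an unknown action.
    return choices.index(extract_json(response)[key])
```

For fine-tuning, the string returned by `build_instruction` would be stored alongside each training example, so the model sees the same format specification it will get at inference time.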

devjwsong commented 3 months ago

Reference literature:

devjwsong commented 3 months ago

Moving on to testing.

#85 #86 #87