Completed the `OpenXModule` class for zero-shot evaluation.
Each batch contains multiple episodes; the `_process_batch` function is a generator that produces batch inputs consisting of one timestep from each episode.
Since the number of timesteps per episode can vary, successive batches from `_process_batch` can have different sizes once some episodes have finished.
It keeps generating batches until every episode reaches its end.
The processed inputs are passed to `VLMModule`.
The reported metrics are success rate and MSE.
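The batching behavior described above can be sketched as a standalone generator. This is a minimal illustration, not the actual `_process_batch` implementation; the episode/timestep representation is assumed:

```python
from typing import Dict, Iterator, List


def process_batch(episodes: List[List[Dict]]) -> Iterator[List[Dict]]:
    """Yield one batch per timestep index: the i-th step of every
    episode that still has steps remaining. Batches shrink as shorter
    episodes finish, so their sizes can differ."""
    step = 0
    while True:
        batch = [ep[step] for ep in episodes if step < len(ep)]
        if not batch:  # every episode has reached its end
            return
        yield batch
        step += 1
```

For example, with episodes of length 3 and 1, the generator yields batches of size 2, 1, 1.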
Refactored the OpenX dataloader to remove the JAT-specific parts.
TODO: This should be integrated into a single utility class in the future.
_TODO: `text_observation` and multiple vectors should be handled later._
GPT-4 sometimes generates malformed output due to the limitations of VLMs. When this happens, the action is set to a random vector.
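The random-vector fallback could look like the following sketch. The function name, the expected text format of the action, and the use of a uniform range are all assumptions for illustration:

```python
import numpy as np


def parse_action(vlm_output: str, action_dim: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Parse a comma-separated action vector from the VLM's text output.
    Falls back to a random vector when the output is malformed."""
    try:
        values = [float(x) for x in vlm_output.strip().strip("[]").split(",")]
        if len(values) != action_dim:
            raise ValueError("wrong action dimensionality")
        return np.asarray(values, dtype=np.float32)
    except ValueError:
        # Malformed generation: substitute a random action so the
        # evaluation loop can continue instead of crashing.
        return rng.uniform(-1.0, 1.0, size=action_dim).astype(np.float32)
```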
@pranavguru to fix some parts of `src/data_utils/openx_dataloader.py` to reflect the latest version of the OpenX dataloader.
The dataloader currently concatenates all float-tensor continuous observations in the observation dict of each timestep of a given OpenX dataset. Either this concatenation should be removed, or the prompt-engineering framework needs to be modified to include information about it.
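The concatenation in question behaves roughly like this sketch (the helper name and the sorted-key ordering are assumptions; the real dataloader may differ in detail):

```python
import numpy as np


def concat_continuous_obs(obs: dict) -> np.ndarray:
    """Flatten and concatenate every floating-point array in a
    timestep's observation dict, skipping non-float entries such as
    images stored as uint8. Keys are sorted so the layout is
    deterministic, but that layout is lost downstream: the prompt
    currently has no way to describe which slice came from which key."""
    parts = []
    for key in sorted(obs):
        value = np.asarray(obs[key])
        if np.issubdtype(value.dtype, np.floating):
            parts.append(value.ravel())
    return np.concatenate(parts) if parts else np.empty(0, dtype=np.float32)
```

The loss of per-key structure is exactly why the concatenation either needs to go or needs to be described to the model in the prompt.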
Text observation data needs to be integrated into the prompt passed to the model.
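One possible shape for that integration is sketched below. This is hypothetical (the TODO is not yet implemented); the function name, prompt wording, and fields are illustrative only:

```python
from typing import Optional


def build_prompt(task_instruction: str,
                 text_observation: Optional[str] = None) -> str:
    """Assemble the VLM prompt, prepending the episode's text
    observation when one is present."""
    lines = [f"Task: {task_instruction}"]
    if text_observation:
        lines.append(f"Scene description: {text_observation}")
    lines.append("Predict the next action as a vector of floats.")
    return "\n".join(lines)
```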