We're releasing CortexBench and our first Visual Cortex model, VC-1. CortexBench is a collection of 17 different embodied AI (EAI) tasks spanning locomotion, navigation, dexterous manipulation, and mobile manipulation. Using CortexBench, we performed the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) for EAI to date, and found that no existing PVR performs well across all tasks. We then trained VC-1 on a combination of over 4,000 hours of egocentric videos from 7 different sources and ImageNet, totaling over 5.6 million images. We show that when VC-1 is adapted (through task-specific losses or a small amount of in-domain data), it is competitive with or outperforms the state of the art on all benchmark tasks.
We're open-sourcing two visual cortex models (model cards):

- VC-1
- VC-1-base

To install our visual cortex models and CortexBench, please follow the instructions in INSTALLATION.md.

The repository contains:

- `vc_models`: config files for the visual cortex models, the model loading code, as well as some project utilities.
- `cortexbench`: embodied AI downstream tasks to evaluate pre-trained representations.
- `third_party`: third-party submodules which aren't expected to change often.
- `data`: gitignored directory that needs to be created by the user; it is used by some downstream tasks to find (symlinks to) datasets, models, etc.

To use the VC-1 model, install the `vc_models` module with pip. Then, you can load the model with code such as the following, or follow our tutorial:
```python
import vc_models
from vc_models.models.vit import model_utils

# To use the smaller VC-1-base model, use model_utils.VC1_BASE_NAME.
model, embd_size, model_transforms, model_info = model_utils.load_model(model_utils.VC1_LARGE_NAME)

# The loaded image batch should be Bx3x250x250
img = your_function_here(...)
# Output will be of size Bx3x224x224
transformed_img = model_transforms(img)
# Embedding will be Bx1024 for VC-1 (Bx768 for VC-1-base)
embedding = model(transformed_img)
```
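As a shape-level sanity check, the pipeline above can be mimicked with stand-ins. Note that `fake_transforms` and `fake_model` below are hypothetical illustrations of the expected tensor shapes, not part of the `vc_models` API:

```python
import numpy as np

# Hypothetical stand-ins (NOT the vc_models API): they only mirror the tensor
# shapes described above -- Bx3x250x250 in, Bx3x224x224 after the transform,
# and one embedding vector per image out.
def fake_transforms(img):
    # Center-crop 250x250 -> 224x224; the real transforms also resize/normalize.
    off = (250 - 224) // 2
    return img[:, :, off:off + 224, off:off + 224]

def fake_model(x, embd_size=1024):
    # Stand-in for the frozen encoder: returns a Bx<embd_size> embedding.
    return np.zeros((x.shape[0], embd_size), dtype=np.float32)

batch = np.random.rand(4, 3, 250, 250).astype(np.float32)
transformed = fake_transforms(batch)
embedding = fake_model(transformed)
print(transformed.shape)  # (4, 3, 224, 224)
print(embedding.shape)    # (4, 1024)
```

Whatever loader you use in place of `your_function_here`, it should produce a batched `Bx3x250x250` float tensor before the transforms are applied.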
To reproduce the results with the VC-1 model, please follow the README instructions for each of the benchmarks in `cortexbench`.
To load your own encoder model and run it across all benchmarks, follow these steps:

1. Create a config file `<your_model>.yaml` in the model configs folder of the `vc_models` module.
2. In that config, specify a function (the `_target_` field) for loading your encoder model.
3. Load the model as follows:

   ```python
   import vc_models
   from vc_models.models.vit import model_utils

   model, embd_size, model_transforms, model_info = model_utils.load_model("<your_model>")
   ```

4. Launch the evaluation (passing `embedding=<your_model>`) for each of the benchmarks in `cortexbench`.

If you would like to contribute to Visual Cortex and CortexBench, please see CONTRIBUTING.md.
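For orientation, a `<your_model>.yaml` might look roughly like the sketch below. Only the `_target_` field is prescribed by the steps above; the function path and arguments are illustrative placeholders, not the actual `vc_models` config schema:

```yaml
# Hypothetical config sketch -- only `_target_` is prescribed; the function
# path and its arguments are illustrative placeholders.
model:
  _target_: my_package.encoders.load_my_encoder  # your loader function
  checkpoint_path: data/my_encoder.ckpt          # example argument to the loader
```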
If you use Visual Cortex in your research, please cite the following paper:
@inproceedings{vc2023,
title={Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?},
author={Arjun Majumdar and Karmesh Yadav and Sergio Arnaud and Yecheng Jason Ma and Claire Chen and Sneha Silwal and Aryan Jain and Vincent-Pierre Berges and Pieter Abbeel and Jitendra Malik and Dhruv Batra and Yixin Lin and Oleksandr Maksymets and Aravind Rajeswaran and Franziska Meier},
year={2023},
eprint={2303.18240},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
The majority of Visual Cortex and CortexBench code is licensed under CC-BY-NC (see the LICENSE file for details); however, portions of the project are available under separate license terms: trifinger_simulation is licensed under the BSD 3-Clause license; mj_envs and mjrl are licensed under the Apache 2.0 license; and Habitat Lab, dmc2gym, and mujoco-py are licensed under the MIT license.
The trained policy models and the task datasets are considered data derived from the corresponding scene datasets.