camel-ai / crab

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

🦀 CRAB: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents

Overview

CRAB is a framework for building LLM agent benchmark environments in a Python-centric way.

Key Features

🌐 Cross-platform and Multi-environment

⚙️ Easy-to-use Configuration

📐 Novel Benchmarking Suite

Installation

Prerequisites

pip install "crab-framework[client]"
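
To confirm that the install succeeded, one option is to look up the installed distribution's metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is illustrative and is not part of CRAB.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "crab-framework") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None
```

Calling `installed_version()` in the environment where pip ran returns a version string when the package is present and `None` otherwise.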

Experiment on CRAB-Benchmark-v0

All datasets and experiment code are in the crab-benchmark-v0 directory. Please read the benchmark tutorial carefully before using the benchmark.

Examples

Run the template environment with an OpenAI agent

export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py
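
A small wrapper can check that the API key is set before launching the examples. This is a sketch under stated assumptions: the helper names `missing_env_vars` and `run_examples` are hypothetical, and the script is assumed to run from the repository root so that the relative paths from the commands above resolve.

```python
import os
import subprocess
import sys

# Script paths taken from the commands above, relative to the repository root.
EXAMPLES = ["examples/single_env.py", "examples/multi_env.py"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def run_examples(scripts=EXAMPLES):
    """Run each example with the current interpreter, stopping at the first failure."""
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        raise SystemExit(f"Set {', '.join(missing)} before running the examples")
    for script in scripts:
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        subprocess.run([sys.executable, script], check=True)
```

Calling `run_examples()` then behaves like the two `python` commands above, except that it exits early with a readable message when the key is absent.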

Cite

Please cite our paper if you use CRAB or any related material in your work:

@misc{xu2024crab,
      title={CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents}, 
      author={Tianqi Xu and Linyao Chen and Dai-Jie Wu and Yanjun Chen and Zecheng Zhang and Xiang Yao and Zhiqiang Xie and Yongchao Chen and Shilong Liu and Bochen Qian and Philip Torr and Bernard Ghanem and Guohao Li},
      year={2024},
      eprint={2407.01511},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2407.01511}, 
}