Our framework runs and has been tested on a TIAGo robot. The higher levels of the pipeline (up to plan generation) should be executable on any ROS-based system with RGB-D cameras (camera topic names may differ).
However, the low-level actions should be designed according to the specific robot, since different robots have different capabilities (mobile vs. fixed base, arm vs. no arm, etc.).
Therefore, the GPT prompts in the src/agents.py file and the low-level grounders in src/low_level_execution.py and src/primitive_actions.py should be adapted to the desired robotic platform.
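As an illustration only, a navigation primitive for a mobile base could wrap a standard move_base action client; the function name, signature, and goal layout below are assumptions for this sketch and do not reflect the actual contents of src/primitive_actions.py.

```python
# Illustrative sketch of grounding a "go to" primitive on a mobile base.
# Call this from an initialized ROS node; names and signature are assumptions.
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal


def go_to(x, y, orientation=(0.0, 0.0, 0.0, 1.0), frame="map"):
    """Send the base to (x, y) in the given frame and wait for the result."""
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    (goal.target_pose.pose.orientation.x,
     goal.target_pose.pose.orientation.y,
     goal.target_pose.pose.orientation.z,
     goal.target_pose.pose.orientation.w) = orientation

    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()
```

A fixed-base platform would instead omit or stub out navigation primitives of this kind and keep only the manipulation ones.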
To download the YOLO-World model, visit the official repository, download the .pth weights, and put them in the config/yolow/ directory.
To download the pre-trained weights and the encoder and decoder ONNX models, follow the instructions on the official repository. After the download, put the models and weights in the config/efficientvitsam/ directory.
For our tests, we used the l2 model.
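As a quick sanity check, you can verify that the downloaded files ended up in the expected directories. The filenames in the sketch below are placeholders, not the actual names of the released weights; replace them with the files you downloaded.

```python
# Quick sanity check that the downloaded weights are where the pipeline expects them.
# The filenames below are placeholders: replace them with the files you actually downloaded.
import os

expected_files = [
    "config/yolow/yolo_world_weights.pth",      # YOLO-World .pth weights (placeholder name)
    "config/efficientvitsam/l2.pt",             # EfficientViT-SAM l2 checkpoint (placeholder name)
    "config/efficientvitsam/l2_encoder.onnx",   # encoder ONNX model (placeholder name)
    "config/efficientvitsam/l2_decoder.onnx",   # decoder ONNX model (placeholder name)
]

for path in expected_files:
    status = "OK" if os.path.isfile(path) else "MISSING"
    print(f"{status:8s} {path}")
```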
Download the spaCy English model:
python -m spacy download en_core_web_sm
Create the conda environment:
conda create -n empower python=3.8.18
then activate it with
conda activate empower
then install the dependencies
pip install -r requirements.txt
mim install mmcv==2.0.0
mim install mmyolo mmdet
Also, set your OpenAI API key:
conda env config vars set OPENAI_API_KEY=<YOUR API KEY>
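Note that conda environment variables only take effect after the environment is reactivated. A minimal check (sketch) that the key is then visible to the Python scripts:

```python
# Verify the API key is exposed in the activated conda environment.
import os

assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set in this environment"
```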
Clone this repo inside your catkin workspace:
git clone https://github.com/Lab-RoCoCo-Sapienza/empower.git
Build the package with catkin
catkin build
Before starting, set the following ROS parameters:
rosparam set /use_case <name>
to name the task you want to solve. If your robotic platform has speakers, you may also want to enable speech output:
rosparam set /speech True
otherwise set it to False.
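These parameters are read at runtime by the pipeline's nodes. Below is a minimal sketch of how a script might consume them; the node name and default values are assumptions.

```python
# Sketch of reading the parameters set above from a ROS node.
# The node name and defaults are illustrative assumptions.
import rospy

rospy.init_node("empower_params_example", anonymous=True)
use_case = rospy.get_param("/use_case", "my_task")
speech = rospy.get_param("/speech", False)
rospy.loginfo("Use case: %s (speech enabled: %s)", use_case, speech)
```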
Run:
rosrun vlm_grasping create_pcl.py
to create the point cloud and the image of the desired scene.
Modify the file src/detection.py to add the task you want to perform to the task dictionary. The key should be the task name you set in the /use_case ROS parameter, while the value is a natural-language description of the task.
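For example, assuming the task dictionary is a plain name-to-description mapping (the exact variable name in src/detection.py may differ), an entry could look like this:

```python
# Hypothetical entry; the dictionary's actual name in src/detection.py may differ.
tasks = {
    # key: the name set with `rosparam set /use_case <name>`
    # value: a natural-language description of the task
    "clear_table": "Pick up all the objects on the table and place them inside the box.",
}
```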
In two different terminals run:
python3 models_cacher.py <name>
and
python3 execute_task.py
The models cacher keeps the models loaded in memory, so that the execute_task script can be run multiple times without reloading them, saving execution time.
Once it finishes, execute_task will produce the plan and dump it in the corresponding folder.
Now, run:
rosrun vlm_grasping color_pcl.py
to obtain the grounded point cloud with the segmentation masks projected onto it.
Then, run:
rosrun vlm_grasping spawn_marker_centroids.py
to ground the detections in the RViz scene and to keep it populated.
Finally, in another terminal run:
rosrun vlm_grasping low_level_execution.py
to execute the actions needed to achieve the task.