GeWu-Lab / LLM_articulated_object_manipulation


Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs

This is the PyTorch code of our paper: Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs.

Authors: Wenke Xia*, Dong Wang*, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu‡, Xuelong Li

Accepted By: 2024 IEEE International Conference on Robotics and Automation (ICRA)

Resources: [Project Page], [arXiv]


Introduction

This repository provides the PyTorch code accompanying our paper: Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs.

In this work, we delve into the problem of harnessing LLMs for generalizable articulated object manipulation, recognizing that the rich world knowledge inherent in LLMs is adept at providing a reasonable manipulation understanding of various articulated objects.

Framework

We aim to solve generalizable articulated object manipulation problems that require kinematic and geometric reasoning about objects to generate a precise manipulation policy.

As shown in (a), we first propose the Unified Kinematic Knowledge Parser component, which captures the object's kinematic structure as a unified kinematic knowledge representation for LLMs. As demonstrated in (b), based on this unified representation, we construct a kinematic-aware hierarchical prompt, which is used in the Kinematic-aware Manipulation Planner component to guide LLMs to generate an abstract textual manipulation sequence and a sequence of 3D manipulation waypoints for generalizable articulated object manipulation.
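
To make the idea concrete, the sketch below shows one way a unified kinematic knowledge representation could be encoded as a structured dictionary and serialized into an LLM prompt. The schema and field names here are illustrative assumptions, not the exact format used in this repository.

```python
# Hypothetical sketch of a unified kinematic knowledge representation.
# Field names (parts, joint, axis, limit, handle_position) are illustrative
# assumptions, not the exact schema used in this repository.
import json

cabinet_knowledge = {
    "object": "cabinet",
    "parts": [
        {
            "name": "drawer",
            "joint": {
                "type": "prismatic",          # sliding joint
                "axis": [1.0, 0.0, 0.0],      # direction of motion in the object frame
                "limit": [0.0, 0.35],         # travel range in meters
            },
            "handle_position": [0.42, 0.0, 0.31],  # 3D handle location
        },
        {
            "name": "door",
            "joint": {
                "type": "revolute",           # hinged joint
                "axis": [0.0, 0.0, 1.0],      # rotation axis
                "limit": [0.0, 1.57],         # joint range in radians
            },
            "handle_position": [0.40, -0.25, 0.55],
        },
    ],
}

# Serialize the representation so it can be embedded in a kinematic-aware prompt.
prompt_context = json.dumps(cabinet_knowledge, indent=2)
print(prompt_context)
```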

Dataset

We conduct experiments in the Isaac Gym simulator, with distinct object instances across 16 types of articulated objects from the PartNet-Mobility dataset. The dataset can be downloaded here.


Install

In this work, we use Isaac Gym as the simulation environment and cuRobo as the motion planner. This code is tested on Ubuntu 20.04 with PyTorch 1.13.1+cu117 and Isaac Gym 2020.2.

First install the requirements:

pip install -r requirements.txt

Then install Isaac Gym and cuRobo according to their official documentation.
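
As a quick check that the Isaac Gym install can see the articulated assets, the minimal loading sketch below may help. The asset paths and the object ID are placeholders; point them at wherever the downloaded PartNet-Mobility URDFs live.

```python
# Minimal smoke test: load a PartNet-Mobility URDF into Isaac Gym.
# The asset root and object ID below are placeholders for the downloaded dataset.
from isaacgym import gymapi

gym = gymapi.acquire_gym()

# Create a PhysX simulation with default parameters.
sim_params = gymapi.SimParams()
sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)

# Load an articulated object (e.g., a cabinet with a drawer) with its base fixed.
asset_options = gymapi.AssetOptions()
asset_options.fix_base_link = True
asset = gym.load_asset(sim, "assets/partnet_mobility/19179", "mobility.urdf", asset_options)

# Inspect the kinematic structure exposed by the URDF.
print("DOF count:", gym.get_asset_dof_count(asset))
print("DOF names:", gym.get_asset_dof_names(asset))
```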

Demonstration Collection

Collect a human demonstration by running:

python human_manipulation.py --task open_door --index 0

The keyboard is used to determine the next waypoint, following the key bindings below, which are defined in the subscribe_viewer_keyboard_event function (a sketch of this wiring follows the list):

    W, "move forward"
    S, "move backward"
    A, "move left"
    D, "move right"
    Q, "move up"
    E, "move down"
    G, "grasp"
    V, "release"
    I, "exec"
    R, "reset"
    H, "move to handle pos"
    N, "record data"
    Z, "rotate_right"
    X, "rotate_left"

We provide a visualization of the target waypoint. Once the target waypoint is determined, users can press N to record the data and move the Franka arm to the target waypoint. When the task is finished, users can press L to save the trajectory.

To replay a human demonstration, use the command below:

python replay_human_manipulation.py --demo_path open_drawer_19179_0

Evaluation

To prompt GPT to generate a reasonable trajectory, first set your OpenAI API key in prompt_tool/agent.py, then run:

python gpt_manipulation.py --task open_drawer
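
As a rough illustration of the key setup, the sketch below shows how an OpenAI key is typically configured and queried with the legacy openai<1.0 Python client; the actual prompt construction in prompt_tool/agent.py likely differs in detail.

```python
# Illustrative sketch of setting the OpenAI key and requesting a manipulation plan.
# Uses the legacy openai<1.0 client interface; prompt_tool/agent.py may differ.
import openai

openai.api_key = "sk-..."  # replace with your own key

def query_manipulation_plan(kinematic_context, task):
    # Ask the LLM for a textual manipulation sequence given the kinematic knowledge.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You plan manipulation waypoints for articulated objects."},
            {"role": "user", "content": f"Object kinematics:\n{kinematic_context}\nTask: {task}"},
        ],
        temperature=0.0,
    )
    return response["choices"][0]["message"]["content"]
```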

We have provided part of the manipulation demonstrations in prompt_config and rotate_records. Users can also follow this format to prompt GPT with their own demonstration datasets.

Results

The results on seen object categories are:

The results on unseen object categories are:

Acknowledgement

This research was supported by the National Natural Science Foundation of China (No. 62106272), the Young Elite Scientists Sponsorship Program by CAST (2021QNRC001), the Shanghai AI Laboratory, the National Key R&D Program of China (2022ZD0160100), the National Natural Science Foundation of China (62376222), the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001), and the Public Computing Cloud, Renmin University of China.

Citation


@misc{xia2023kinematicaware,
      title={Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs}, 
      author={Wenke Xia and Dong Wang and Xincheng Pang and Zhigang Wang and Bin Zhao and Di Hu},
      year={2023},
      eprint={2311.02847},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}