This repository contains the code for the system described in the paper: Empowering LLM to use Smartphone for Intelligent Task Automation.
To access the DroidTask dataset, download it from Google Cloud and refer to the About Dataset section.
AutoDroid is built on top of the DroidBot framework.
Make sure you have:
1. Python
2. Java
3. Android SDK
4. The platform_tools directory of the Android SDK added to your PATH
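As a quick sanity check before installing, the Python sketch below reports which required executables cannot be found on PATH. It only checks java and adb (adb ships in the Android SDK's platform-tools directory); Python itself is implicitly present if the script runs. The names REQUIRED_TOOLS and missing_tools are hypothetical and not part of AutoDroid or DroidBot.

```python
import shutil

# java comes from the Java install; adb lives in the Android SDK's platform-tools directory.
REQUIRED_TOOLS = ("java", "adb")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be resolved on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("java and adb were found on PATH.")
```

The `which` parameter defaults to `shutil.which` and is only exposed so the lookup can be swapped out when testing.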
Then clone this repo and install it with pip:
git clone git@github.com:MobileLLM/AutoDroid.git
cd AutoDroid/
pip install -e .
Prepare:
1. Download the apk.zip folder from Google Cloud, unzip it, and prepare a device or an emulator connected to your host machine via adb.
2. Alternatively, to use your own app, copy its .apk file to your host machine and prepare a device or an emulator connected to your host machine via adb.
3. In tools.py, replace os.environ['GPT_URL'] with your own API key.
Start:
droidbot -a <path/to/.apk> -o <output/of/app> -task <your task> -keep_env -keep_app
You can also try the scripts in the ./scripts folder; the tasks in DroidTask are listed in the form.
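For scripting runs from Python, here is a minimal sketch that assumes droidbot and adb are both on PATH. connected_devices parses the output of `adb devices` and keeps only serials in the `device` state (skipping `offline` or `unauthorized` ones); droidbot_command rebuilds the invocation above as an argument list, so a task description containing spaces needs no shell quoting. The helper names connected_devices, droidbot_command, and run_task are hypothetical and not part of the repository.

```python
import subprocess


def connected_devices(adb_output):
    """Return serials whose state is "device" in the output of `adb devices`."""
    serials = []
    for line in adb_output.splitlines():
        parts = line.split()
        # Device lines are "<serial>\t<state>"; the header and daemon notices
        # never have "device" as their second field.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def droidbot_command(apk_path, output_dir, task):
    """Build the droidbot invocation from this README as an argument list."""
    return [
        "droidbot",
        "-a", apk_path,
        "-o", output_dir,
        "-task", task,
        "-keep_env",
        "-keep_app",
    ]


def run_task(apk_path, output_dir, task):
    """Check that adb sees a ready device, then run droidbot for one task."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    if not connected_devices(result.stdout):
        raise RuntimeError("No device or emulator is connected via adb.")
    subprocess.run(droidbot_command(apk_path, output_dir, task), check=True)
```

For example, `run_task("path/to/app.apk", "output/app", "your task")` mirrors the command above; all three arguments are placeholders.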
Organization of the Dataset:
DroidTask
├── applauncher
│ ├── states
│ │ ├── Screenshot 1.png
│ │ ├── Screenshot 2.png
│ │ ├── ...
│ │ ├── View hierarchy 1.json
│ │ ├── View hierarchy 2.json
│ │ └── ...
│ ├── task1.yaml
│ ├── task2.yaml
│ ├── ...
│ └── utg.yaml
├── calendar
│ ├── states
│ │ ├── Screenshot 1.png
│ │ ├── Screenshot 2.png
│ │ ├── ...
│ │ ├── View hierarchy 1.json
│ │ ├── View hierarchy 2.json
│ │ └── ...
│ ├── task1.yaml
│ ├── task2.yaml
│ ├── ...
│ └── utg.yaml
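The layout above can be walked with a small standard-library sketch. It assumes, based only on the file naming, that Screenshot k.png and View hierarchy k.json describe the same state k. It also only lists files and does not parse the YAML, which would need a YAML library such as PyYAML. The helpers pair_states, task_files, and summarize are hypothetical names, not part of the dataset tooling.

```python
import re
from pathlib import Path

# Assumption: "Screenshot k.png" and "View hierarchy k.json" belong to the same state k.
_STATE_FILE = re.compile(r"^(?:Screenshot (\d+)\.png|View hierarchy (\d+)\.json)$")
_TASK_FILE = re.compile(r"^task(\d+)\.yaml$")


def pair_states(filenames):
    """Group state files by index k: {k: {"screenshot": ..., "view_hierarchy": ...}}."""
    states = {}
    for name in filenames:
        match = _STATE_FILE.match(name)
        if match is None:
            continue  # skip anything that is not a screenshot or view hierarchy
        shot_idx, view_idx = match.groups()
        if shot_idx is not None:
            states.setdefault(int(shot_idx), {})["screenshot"] = name
        else:
            states.setdefault(int(view_idx), {})["view_hierarchy"] = name
    return dict(sorted(states.items()))


def task_files(filenames):
    """Return taskN.yaml names sorted numerically (task2 before task10)."""
    tasks = [name for name in filenames if _TASK_FILE.match(name)]
    return sorted(tasks, key=lambda name: int(_TASK_FILE.match(name).group(1)))


def summarize(root="DroidTask"):
    """Yield (app name, task count, state count, has utg.yaml) for each app folder."""
    for app_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        state_dir = app_dir / "states"
        state_names = [p.name for p in state_dir.iterdir()] if state_dir.is_dir() else []
        tasks = task_files(p.name for p in app_dir.iterdir())
        has_utg = (app_dir / "utg.yaml").is_file()
        yield app_dir.name, len(tasks), len(pair_states(state_names)), has_utg
```

For example, `for row in summarize("path/to/DroidTask"): print(row)` prints one summary tuple per application folder. Sorting task files numerically keeps task10.yaml after task2.yaml, which plain string sorting would not.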
DroidTask: The top level of the dataset, containing one folder for each application included in DroidTask, such as applauncher and calendar.
Application Folders: Each application folder records all the screenshots and raw view hierarchies parsed by DroidBot:
States Folder: This folder holds all the captured states of the application during usage. A state includes both visual representations (screenshots) and structural data (view hierarchies).
Screenshots: Images captured from the application's interface, named sequentially (e.g., Screenshot 1.png, Screenshot 2.png, etc.).
View Hierarchies: JSON files detailing the structure of the application's UI for each captured state (e.g., View hierarchy 1.json, View hierarchy 2.json, etc.).
Task Files: YAML files named task1.yaml, task2.yaml, etc., containing the ground truth data for specific tasks within the application.
UTG File: A utg.yaml file that records the UI transition graph (UTG) built from the user's random exploration of the application.
Mapping Between Tasks and States: If you want to use the screenshots in your method, use the view hierarchy k.json file to associate tasks with their corresponding application states.
Contributions are welcome!
Enjoy!