MobileLLM / AutoDroid

Source code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation"
https://arxiv.org/abs/2308.15272
MIT License

Cannot input text for the apps #14

Closed: alexsiu398 closed this issue 5 months ago

alexsiu398 commented 5 months ago

I am hosting a local LLM with Ollama on Linux, and testing with either an Android 11 (API 30) emulator or a physical Android 11 device. It can successfully launch the APK and tap some buttons. But when it tries to input text, no text is entered and the keyboard does not appear. Any ideas?

INFO:DroidBot:Starting DroidBot
INFO:Device:waiting for device 56479
[CONNECTION] ADB is enabled and connected.
[CONNECTION] TelnetConsole is not enabled.
[CONNECTION] DroidBotAppConn is enabled and connected.
[CONNECTION] Minicap is not enabled.
[CONNECTION] Logcat is enabled and connected.
[CONNECTION] UserInputMonitor is enabled and connected.
[CONNECTION] ProcessMonitor is enabled and connected.
[CONNECTION] DroidBotIme is enabled and connected.
Please wait while installing the app...
INFO:Device:App installed: com.simplemobiletools.calendar
INFO:Device:Main activity: com.simplemobiletools.calendar.activities.SplashActivity
INFO:AppEnvManager:Start deploying environment, policy is none
INFO:InputEventManager:start sending events, policy is task
Action: KillAppEvent()
INFO:TaskPolicy:Current state: d2aba9dcf2ce57a988070bcbcda6cc1931adff28e54aee1e491f45399d6e01e1
INFO:TaskPolicy:Trying to start the app...
Action: IntentEvent(intent='am start com.simplemobiletools.calendar/com.simplemobiletools.calendar.activities.SplashActivity')
INFO:TaskPolicy:Current state: 494a7caaf2b0162a25f10630a11b3f4c569841b363eb9ce31ceb5793c1448dd6
** prompt: **
You are a smartphone assistant to help users complete tasks by interacting with mobile apps.Given a task, the previous UI actions, and the content of current UI state, your job is to decide whether the task is already finished by the previous actions, and if not, decide which UI element in current UI state should be interacted.
Task: create a event of tapping title to input laundry and save
Previous UI actions:

Your answer should always use the following format: { "Steps": "...", "Analyses": "...<Analyses of the relations between the task, and relations between the previous UI actions and current UI state>", "Finished": "Yes/No", "Next step": "None or a ", "id": "an integer or -1 (if the task has been completed by previous UI actions)", "action": "tap or input", "input_text": "N/A or ..." }

Note that the id is the id number of the UI element to interact with. If you think the task has been completed by previous UI actions, the id should be -1. If 'Finished' is 'Yes', then the 'description' of 'Next step' is 'None', otherwise it is a high level description of the next step. If the 'action' is 'tap', the 'input_text' is N/A, otherwise it is the ''. Please do not output any content other than the JSON format.
** end of prompt **
INFO:openai._base_client:Retrying request to /chat/completions in 0.812839 seconds
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
response: { "Steps": "1. Tap on the 'New Event' button to create a new event 2. Tap on the 'March' button twice to select the month of March 3. Tap on the 'More options' button to access more options for the event, such as date and time, and add them if desired", "Analyses": "The current UI state does not allow for direct input of laundry information. However, it is possible to create a new event using the 'New Event' button and then edit the event's details, including adding laundry information. After creating the event, one can tap on the selected date or time to edit its details further, which may include entering laundry information.", "Finished": "No", "Next step": "1. Tap on the 'New Event' button", "id": 5, "action": "tap", "input_text": "-1" }
Action: TouchEvent(state=494a7caaf2b0162a25f10630a11b3f4c569841b363eb9ce31ceb5793c1448dd6, view=158b14f3e368960dc2c5e14eb0cb8da5(MainActivity/ImageButton-))
INFO:TaskPolicy:Current state: b5a253bf5df48b6e43eab83047c21e3bcdae40b08949a983d412153ba080fbd5
** prompt: **
You are a smartphone assistant to help users complete tasks by interacting with mobile apps.Given a task, the previous UI actions, and the content of current UI state, your job is to decide whether the task is already finished by the previous actions, and if not, decide which UI element in current UI state should be interacted.
Task: create a event of tapping title to input laundry and save
Previous UI actions:

Your answer should always use the following format: { "Steps": "...", "Analyses": "...<Analyses of the relations between the task, and relations between the previous UI actions and current UI state>", "Finished": "Yes/No", "Next step": "None or a ", "id": "an integer or -1 (if the task has been completed by previous UI actions)", "action": "tap or input", "input_text": "N/A or ..." }

Note that the id is the id number of the UI element to interact with. If you think the task has been completed by previous UI actions, the id should be -1. If 'Finished' is 'Yes', then the 'description' of 'Next step' is 'None', otherwise it is a high level description of the next step. If the 'action' is 'tap', the 'input_text' is N/A, otherwise it is the ''. Please do not output any content other than the JSON format.
** end of prompt **
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
response: { "Steps": "...", "Analyses": "...<Analyses of the relations between the task, and relations between the previous UI actions and current UI state>", "Finished": "No", "Next step": "Input title, location, description and set 'All-day OFF' checkbox to 'ON', then tap 'Save' button.", "id": "-1", "action": "tap", "input_text": "Title" }
INFO:InputEventManager:Finish sending events
[CONNECTION] ADB is disconnected
[CONNECTION] UserInputMonitor is disconnected
[CONNECTION] Logcat is disconnected
WARNING:DroidBotIme:Failed to disconnect DroidBotIME!
INFO:DroidBot:DroidBot Stopped
[CONNECTION] ProcessMonitor is disconnected

alexsiu398 commented 5 months ago

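Resolved: inputting text requires a set-text event. In the log above, the model only ever answered with "action": "tap", so no text was entered.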
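For anyone hitting the same thing, here is a minimal sketch of how a reply in the prompt's JSON format could be mapped to either a tap or a set-text action. It is not AutoDroid's or DroidBot's actual code: `TapAction`, `SetTextAction`, and `action_from_response` are hypothetical names made up for illustration, and the only assumptions taken from the log are the field names `id`, `action`, and `input_text`.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class TapAction:
    """Hypothetical stand-in for a touch event on a UI element."""
    view_id: int


@dataclass
class SetTextAction:
    """Hypothetical stand-in for a set-text event on a UI element."""
    view_id: int
    text: str


Action = Union[TapAction, SetTextAction]


def action_from_response(raw: str) -> Optional[Action]:
    """Map one JSON reply (in the prompt's format) to an action, or None."""
    reply = json.loads(raw)
    # The log shows the id as both 5 and "-1", so coerce it to an int.
    view_id = int(reply["id"])
    if view_id == -1:
        # The prompt reserves -1 for "task already completed": nothing to do.
        return None
    if reply.get("action") == "input":
        # Only an explicit "input" action yields a set-text action.
        return SetTextAction(view_id, reply.get("input_text", ""))
    return TapAction(view_id)


# Example replies, shortened to the three fields the sketch reads.
print(action_from_response('{"id": 5, "action": "tap", "input_text": "N/A"}'))
print(action_from_response('{"id": 3, "action": "input", "input_text": "laundry"}'))
print(action_from_response('{"id": "-1", "action": "tap", "input_text": "Title"}'))
```

Under this sketch, the second reply in the log ("id": "-1", "action": "tap", "input_text": "Title") would produce no action at all, which is at least consistent with the run finishing right after it. A smaller local model may need a stricter prompt or a retry before it returns "action": "input" with a valid id.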