Mobile-Agent-v2 can't type even when ADB Keyboard is activated

X-PLUG / MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

https://arxiv.org/abs/2406.01014

MIT License

Mobile-Agent-v2 can't type even when ADB Keyboard is activated #27

Closed. jingxuanchen916 closed this issue 2 weeks ago.

jingxuanchen916 commented 2 weeks ago

Hi, thanks for the kind open-sourcing. I found an issue when running my experiments, and I am wondering whether it is something wrong on my side or a potential corner case for the code, so I would like to discuss it here.

I found that even when my ADB Keyboard is activated, the keyboard variable still shows as False, which affects the get_action_prompt() function. This causes the agent to perceive that the keyboard is not activated, preventing the agent from choosing the Type action. Below is an example of the issue:

Unable to Type. You cannot use the action "Type" because the keyboard has not been activated. If you want to type, please first activate the keyboard by tapping on the input box on the screen.
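To make the gating described above concrete, here is a minimal Python sketch of how a boolean keyboard flag could switch the prompt between offering the Type action and showing the "Unable to Type" message. The helper name build_action_prompt_section and the wording of the Type line are hypothetical; this is not the actual body of get_action_prompt() in run.py.

```python
def build_action_prompt_section(keyboard: bool) -> str:
    """Return the typing-related part of an action prompt.

    Hypothetical helper that mirrors the behaviour described in this issue,
    not the real get_action_prompt() implementation.
    """
    if keyboard:
        # Keyboard detected: the agent is allowed to choose the Type action.
        return 'Type (text): Type the "text" into the focused input box.\n'
    # Keyboard not detected: the Type action is withheld from the agent.
    return (
        'Unable to Type. You cannot use the action "Type" because the keyboard '
        "has not been activated. If you want to type, please first activate the "
        "keyboard by tapping on the input box on the screen.\n"
    )
```

If the keyboard flag is stuck at False, the agent only ever sees the second branch, which is exactly the symptom reported here.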

I then tried to debug and found the related code:

https://github.com/X-PLUG/MobileAgent/blob/35a2264f53aaa769b2c2b24fbb1805b837d45aa8/Mobile-Agent-v2/run.py#L284

https://github.com/X-PLUG/MobileAgent/blob/35a2264f53aaa769b2c2b24fbb1805b837d45aa8/Mobile-Agent-v2/run.py#L290-L296

Based on this code, there are two reasons why the agent cannot type on my side:

Line 284: The keyboard variable can only be switched to True in the first iteration. However, in my case (which might differ across Android phones), my agent can only observe ADB Keyboard {ON} when the phone is ready to accept input (e.g., the search box is already focused), which is almost never the case in the first iteration. Therefore, the keyboard variable is always False for the agent.
Line 292: The switch might be skipped if the condition is not satisfied. In my case (on a Google Pixel 8 Pro), ADB Keyboard {ON} appears too high on the screen to satisfy the condition. When I make the threshold smaller (e.g., 0.8), the issue is fixed.
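A minimal sketch of the height check behind both points, written so it can run on every new screenshot rather than only in the first iteration. It assumes each perception entry is a dict with a "text" string and "coordinates" as an (x, y) pixel pair measured from the top of the screen; the function name, the data shape, and the 0.9 default are assumptions, not the code in run.py.

```python
from typing import Dict, List


def detect_adb_keyboard(
    perception_infos: List[Dict],
    screen_height: int,
    height_ratio: float = 0.9,
) -> bool:
    """Return True if 'ADB Keyboard' text is found low enough on the screen.

    Entries whose y coordinate is above height_ratio * screen_height are
    ignored, so the ratio decides how high the indicator may appear.
    """
    limit = height_ratio * screen_height
    for info in perception_infos:
        _, y = info["coordinates"]
        if y < limit:
            # Too high on the screen to be treated as the keyboard indicator.
            continue
        if "ADB Keyboard" in info["text"]:
            return True
    return False
```

With a made-up screen height of 3000 px and the indicator at y = 2600, a 0.9 ratio skips it (limit 2700) while 0.8 accepts it (limit 2400), which is the kind of per-device tuning described above.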

Though this issue might be a rare case, I would greatly appreciate it if you could share some comments about it.

Many thanks :D

junyangwang0410 commented 2 weeks ago

Thank you very much for finding this bug. I think the bug comes from the second point you raised. In fact, I didn't notice that on some phones the ADB keyboard indicator appears higher on the screen. We would appreciate it if you could commit your code with a suitable height parameter for your device. We will merge your changes into the current branch.

Kunhao18 commented 2 weeks ago

I just found that the mInputShown flag may be useful for checking whether the keyboard is actually shown. You can get it with: adb shell dumpsys input_method | grep mInputShown
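A hedged Python sketch of this suggestion: run the same dumpsys command and parse the flag instead of relying on OCR of the screenshot. The function names are hypothetical, and because dumpsys output is not a stable API and may vary across Android versions, the parser returns None when the flag is missing rather than guessing.

```python
import re
import subprocess
from typing import Optional

# Matches e.g. "mInputShown=true" anywhere in the dumpsys text.
_INPUT_SHOWN = re.compile(r"mInputShown=(true|false)")


def parse_input_shown(dumpsys_output: str) -> Optional[bool]:
    """Extract mInputShown from `dumpsys input_method` output, or None if absent."""
    match = _INPUT_SHOWN.search(dumpsys_output)
    if match is None:
        return None
    return match.group(1) == "true"


def keyboard_shown(adb_path: str = "adb") -> Optional[bool]:
    """Query the device over adb and report whether the soft keyboard is shown."""
    result = subprocess.run(
        [adb_path, "shell", "dumpsys", "input_method"],
        capture_output=True,
        text=True,
        check=True,  # raise if adb itself fails (e.g., no device attached)
    )
    return parse_input_shown(result.stdout)
```

Keeping the parsing separate from the adb call makes it easy to check the parser against saved dumpsys output from different phones.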

jingxuanchen916 commented 2 weeks ago

Thanks for your kind reply - I am happy to commit my code for sure.

By the way, I feel the first change (i.e., Line 284) might also be essential, at least for me. If I don't move Lines 290-296 outside the if statement, the issue persists as shown in the screenshots. Additionally, based on the examples in your paper (e.g., Figure 6), it seems you may also be unable to check whether ADB Keyboard {ON} appears in the first iteration. Please feel free to correct me if I have misunderstood anything. Thanks again!

junyangwang0410 commented 2 weeks ago

I still think the problem is the second one. Your point about observing the ADB keyboard only in the first iteration is not accurate, because Line 375 observes the screen again after each operation. It seems to me that a suitable height parameter should be enough to solve your problem, right?

junyangwang0410 commented 2 weeks ago

Yeah, I get it. You need to modify the height parameter along with Line 375. Perhaps you can set this parameter to a global variable and use it at both Lines 284 and 375.
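One way to follow this suggestion, sketched under assumptions: a single module-level ratio used by one shared helper, which is called both for the initial observation (around Line 284) and after each action (around Line 375). KEYBOARD_HEIGHT_RATIO, keyboard_active, and agent_loop are hypothetical names; perception entries are assumed to be dicts with "text" and (x, y) "coordinates"; and 0.8 is simply the value reported to work on the Pixel 8 Pro in this thread.

```python
from typing import Dict, List

# Hypothetical global: one place to tune the threshold per device.
# 0.8 is the value reported to work on a Google Pixel 8 Pro in this thread.
KEYBOARD_HEIGHT_RATIO = 0.8


def keyboard_active(perception_infos: List[Dict], screen_height: int) -> bool:
    """Return True if 'ADB Keyboard' text appears at or below the threshold."""
    limit = KEYBOARD_HEIGHT_RATIO * screen_height
    return any(
        "ADB Keyboard" in info["text"] and info["coordinates"][1] >= limit
        for info in perception_infos
    )


def agent_loop(
    initial_obs: List[Dict], later_obs: List[List[Dict]], screen_height: int
) -> List[bool]:
    """Recompute the keyboard flag at both call sites with the same setting."""
    # Call site 1: perception before the first action (around Line 284).
    keyboard = keyboard_active(initial_obs, screen_height)
    history = [keyboard]
    for obs in later_obs:
        # ... the chosen action would be executed here ...
        # Call site 2: perception after each action (around Line 375).
        keyboard = keyboard_active(obs, screen_height)
        history.append(keyboard)
    return history
```

Because both call sites read the same constant, changing the threshold for a new phone cannot leave one of them out of sync.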

jingxuanchen916 commented 2 weeks ago

Yeah, true, it makes sense now! Sorry for overlooking Line 375, thanks :D

jingxuanchen916 commented 2 weeks ago

Thanks for sharing this info!

For anyone interested, you can determine the input method using: adb shell dumpsys input_method | grep mCurMethodId :D
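A small sketch that parses mCurMethodId from the same dumpsys output and compares it with the ADB Keyboard component name. The constant com.android.adbkeyboard/.AdbIME is the IME id used by the ADBKeyBoard app as far as I know; treat it as an assumption and confirm it with adb shell ime list -s on your device. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumed IME id of the ADBKeyBoard app; verify with `adb shell ime list -s`.
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"

# Captures the token after "mCurMethodId=" up to the next whitespace.
_CUR_METHOD = re.compile(r"mCurMethodId=(\S+)")


def parse_current_ime(dumpsys_output: str) -> Optional[str]:
    """Return the current IME id from `dumpsys input_method` text, or None."""
    match = _CUR_METHOD.search(dumpsys_output)
    return match.group(1) if match else None


def is_adb_keyboard_selected(dumpsys_output: str) -> bool:
    """True if the selected input method is the ADB Keyboard."""
    return parse_current_ime(dumpsys_output) == ADB_KEYBOARD_IME
```

Note that mCurMethodId only says which IME is selected, not whether it is currently on screen, so it complements rather than replaces the mInputShown check.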