X-PLUG / MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
https://arxiv.org/abs/2406.01014
MIT License

Some risky code #11

Closed zhiyuan8 closed 4 months ago

zhiyuan8 commented 4 months ago

Hi team,

This is excellent work, but I found some risky code that can break execution:

  1. Here the image can be in RGBA format, so a conversion to RGB needs to be added: https://github.com/X-PLUG/MobileAgent/blob/main/MobileAgent/crop.py#L83-L84

    if cropped_image.mode == 'RGBA':
        cropped_image = cropped_image.convert('RGB')
  2. res may not be defined when the exception handler runs: https://github.com/X-PLUG/MobileAgent/blob/main/MobileAgent/api.py#L30-L31. Change it to:

        try:
            res = requests.post(api_url, headers=headers, json=data)
            res = res.json()['choices'][0]['message']['content']
        except Exception as e:
            print(f"Network Error: {e}")

Am I able to contribute to this repo, too? I am a software engineer at Google: https://www.linkedin.com/in/zack-z-li

junyangwang0410 commented 4 months ago

Thank you for your contribution.

  1. We have already performed a convert operation on the stored screenshots in https://github.com/X-PLUG/MobileAgent/blob/61fa71ff83ae5d4c7653ba0fde54ee6cc1dc5cd7/MobileAgent/controller.py#L30.
  2. Although res is not defined in the exception, it is possible to print out the corresponding res to observe the error in the try block, whether it is a network error or a format error.
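
One way to reconcile both points above is to define res before the try block and to handle the request and the parsing in separate steps, so the printed message says which of the two failed. The following is only a minimal sketch, not the repository's code: `query_model` and `send_request` are hypothetical names, and `send_request` stands in for the real HTTP call (for example, a lambda wrapping `requests.post`).

```python
from typing import Any, Callable, Optional


def query_model(send_request: Callable[[], Any]) -> Optional[str]:
    """Return the reply text, or None if the request or the parsing fails.

    `send_request` is a hypothetical stand-in for the real HTTP call; it
    must return an object with a .json() method.
    """
    res = None  # defined up front so it is always safe to print later
    try:
        res = send_request()
    except Exception as e:
        # The call itself failed, so there is no response to inspect.
        print(f"Network Error: {e}")
        return None

    try:
        return res.json()["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError, ValueError) as e:
        # The request succeeded, but the body is not in the expected shape.
        print(f"Format Error: {e!r}; raw response: {getattr(res, 'text', res)}")
        return None
```

Returning None in both failure cases leaves the decision to retry or abort with the caller.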

zhiyuan8 commented 4 months ago

Thanks. When I run your code, I find that these lines can raise an index out of range error, since split is risky and GPT-4 can hallucinate:

https://github.com/X-PLUG/MobileAgent/blob/main/run.py#L69-L71 https://github.com/X-PLUG/MobileAgent/blob/main/run.py#L85 ...

Maybe you could consider using regex? https://github.com/mnotgod96/AppAgent/blob/main/scripts/model.py#L101-L138
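
As a rough illustration of the regex idea, the sketch below parses a reply with one pattern and returns None when the reply does not match, instead of raising an index error. The "Thought: ... Action: ..." layout and the name `parse_reply` are assumptions made for this example; the actual prompt format in run.py may differ.

```python
import re
from typing import Optional, Tuple

# Assumed reply layout (hypothetical labels): "Thought: ... Action: ..."
_REPLY_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<action>.*)",
    re.DOTALL,  # let "." span newlines, since replies are often multi-line
)


def parse_reply(reply: str) -> Optional[Tuple[str, str]]:
    """Return (thought, action), or None if the reply is malformed."""
    match = _REPLY_RE.search(reply)
    if match is None:
        return None
    return match.group("thought").strip(), match.group("action").strip()
```

With chained split calls, a missing label raises IndexError; here the caller checks for None and can decide whether to ask the model again.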

junyangwang0410 commented 4 months ago

An "index out of range" error indicates that GPT-4V did not generate a reply in the required format. In such cases, wrapping the call in a try block and retrying usually resolves the issue. We agree that using split is risky, and we will refer to the code you provided when making modifications. Thank you.
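
The retry approach described above could look roughly like the sketch below. It is not the repository's implementation: `call_with_retry`, `generate`, and `parse` are hypothetical names, where `generate` stands in for the model call and `parse` for whatever split- or regex-based parser is in use.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    generate: Callable[[], str],
    parse: Callable[[str], T],
    max_attempts: int = 3,
) -> Optional[T]:
    """Request a new reply until it parses, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        reply = generate()
        try:
            return parse(reply)
        except (IndexError, KeyError, ValueError) as e:
            # The reply was not in the required format; ask again.
            print(f"Attempt {attempt}/{max_attempts}: malformed reply ({e!r})")
    return None  # every attempt produced an unparseable reply
```

Capping the number of attempts keeps a persistently malformed reply from looping forever and from consuming unbounded API calls.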