cooelf / Auto-GUI

Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
https://arxiv.org/abs/2309.11436
Apache License 2.0

Link for Dataset and trained models is not working. #1

Open Asaad-Pak opened 11 months ago

Asaad-Pak commented 11 months ago

Hello, I find this an interesting project, but the link provided for accessing the preprocessed data and the trained models is not working. The link you provided is https://huggingface.co/cooelf/Auto-UI/tree/main. Can you provide the right link so we can look into the dataset structure and format?

cooelf commented 11 months ago

Sorry that I forgot to make it public. It should work now :)

Asaad-Pak commented 11 months ago

Okay, the link has three folders: Auto-UI-base, Auto-UI-large, and blip. Are these the weights of the trained models? I downloaded Auto-UI-base and it contains some JSON files. I want to look at the data the model was trained on, i.e., the screenshots and actions, and to understand the structure of that data.

cooelf commented 11 months ago

Auto-UI-base and Auto-UI-large are the base- and large-size models. blip is the preprocessed data (the screenshots have been converted into vision features). If you want to see the original data, i.e., the screenshots, please refer to https://github.com/google-research/google-research/blob/master/android_in_the_wild/demo.ipynb.
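As an aside, a quick way to confirm that a downloaded folder is a model checkpoint rather than a dataset is to inspect its config.json, which Hugging Face model folders carry alongside the weights. The snippet below writes a minimal, made-up config purely for illustration; the real file's keys and values will differ:

```python
import json
import os
import tempfile

# Create an illustrative model folder with a config.json, as a Hugging Face
# checkpoint would have (the actual keys in Auto-UI-base may differ).
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "config.json"), "w") as f:
    json.dump({"model_type": "t5", "d_model": 768}, f)

# A folder containing config.json plus weight files is a model checkpoint,
# not training data.
with open(os.path.join(model_dir, "config.json")) as f:
    config = json.load(f)

print(config["model_type"])  # t5
```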

Yangyi-Chen commented 11 months ago

Hi Zhuosheng,

Nice work! I'd like to follow this work, and for a fair comparison could you please provide some information about the train/dev/test split? I need it to locate the original data.

cooelf commented 10 months ago

Hi @Yangyi-Chen,

Thanks for your interest. You can refer to https://github.com/cooelf/Auto-UI/tree/main/dataset for the data split. The files contain the episode IDs that you can use to fetch the original dataset.
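For example, assuming each split file boils down to a list of episode IDs (the exact on-disk format may differ), matching them against the original episodes could look like the sketch below; the IDs and records are made up for illustration, and real AitW episodes are stored as TFRecords rather than dicts:

```python
# Hypothetical split: a set of episode IDs (the real file format is an assumption).
split_ids = {"ep_0001", "ep_0042"}

# Hypothetical original-dataset records; real AitW episodes come from TFRecords.
episodes = [
    {"episode_id": "ep_0001", "goal": "open settings"},
    {"episode_id": "ep_0007", "goal": "search weather"},
    {"episode_id": "ep_0042", "goal": "install an app"},
]

# Keep only the episodes whose ID appears in the split.
train_split = [ep for ep in episodes if ep["episode_id"] in split_ids]

print([ep["episode_id"] for ep in train_split])  # ['ep_0001', 'ep_0042']
```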