google-research-datasets / widget-caption

The dataset includes widget captions that describe UI elements' functionalities. It is used for training and evaluating the widget captioning model (see the EMNLP'20 paper: https://arxiv.org/abs/2010.04295).

widget-caption dataset #1

Open · paulpaul91 opened 1 year ago

paulpaul91 commented 1 year ago

Where is the widget-caption dataset?

hunterheiden commented 6 months ago

For anyone looking for this dataset in a format with bounding boxes plainly exposed, the researchers at NJU preprocessed the dataset and shared it alongside their model, SeeClick:

hunterheiden commented 6 months ago

Here's a HuggingFace Dataset version, with all associated metadata you'd want: https://huggingface.co/datasets/hheiden-roots/RICO-WidgetCaptioning
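If it helps, here's a minimal sketch of pulling that down with the `datasets` library. The dataset id comes from the link above; the `train` split name and the field names are assumptions, so check the dataset card for the actual schema:

```python
# Minimal sketch: load the preprocessed widget-caption data from HuggingFace.
# The dataset id is taken from the link above; the split name is an assumption,
# so inspect the printed DatasetDict / dataset card before relying on it.
from datasets import load_dataset

ds = load_dataset("hheiden-roots/RICO-WidgetCaptioning")
print(ds)                  # shows the available splits and features
example = ds["train"][0]   # "train" split name is an assumption
print(example.keys())      # inspect the actual fields (captions, bboxes, ...)
```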

weiyi1991 commented 6 months ago

> For anyone looking for this dataset in a format with bounding boxes plainly exposed, the researchers at NJU preprocessed the dataset and shared it alongside their model, SeeClick:

Thanks for sharing! Do you know how to get the bounding box information?

hunterheiden commented 6 months ago

> > For anyone looking for this dataset in a format with bounding boxes plainly exposed, the researchers at NJU preprocessed the dataset and shared it alongside their model, SeeClick:
>
> Thanks for sharing! Do you know how to get the bounding box information?

If you're asking where to get them, either download the HuggingFace dataset (which includes all the extra metadata for each screen) or the NJU file linked above (which is much smaller but still has the bounding boxes).

If you're asking how to reconstruct the bounding boxes yourself, download the original RICO view hierarchies from here, then run code similar to what's described in the main README to join the CSV annotations with the view hierarchy information.
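For concreteness, here's a rough sketch of that join. Everything file- and schema-specific is an assumption (the CSV filename, the `screenId`/`nodeId` column names, the view-hierarchy directory layout, and `nodeId` being a dot-separated path of child indices), so adapt it to the files you actually have:

```python
# Rough sketch of joining widget-caption CSV annotations with RICO view
# hierarchies to recover bounding boxes. Assumptions (verify locally):
#   - the CSV has columns named screenId and nodeId,
#   - view hierarchies live in view_hierarchies/<screenId>.json,
#   - nodeId is a dot-separated path of child indices, e.g. "0.1.2",
#   - each node stores pixel bounds as [left, top, right, bottom].
import csv
import json

def find_node(root, node_id):
    """Walk the view hierarchy following a dot-separated index path."""
    node = root
    for idx in node_id.split("."):
        node = node["children"][int(idx)]
    return node

with open("widget_captions.csv", newline="") as f:  # hypothetical filename
    for row in csv.DictReader(f):
        with open(f"view_hierarchies/{row['screenId']}.json") as vh:
            hierarchy = json.load(vh)
        node = find_node(hierarchy["activity"]["root"], row["nodeId"])
        print(row["screenId"], row["nodeId"], node["bounds"])
```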

weiyi1991 commented 6 months ago

> > > For anyone looking for this dataset in a format with bounding boxes plainly exposed, the researchers at NJU preprocessed the dataset and shared it alongside their model, SeeClick:
> >
> > Thanks for sharing! Do you know how to get the bounding box information?
>
> If you're asking where to get them, either download the HuggingFace dataset (which includes all the extra metadata for each screen) or the NJU file linked above (which is much smaller but still has the bounding boxes).
>
> If you're asking how to reconstruct the bounding boxes yourself, download the original RICO view hierarchies from here, then run code similar to what's described in the main README to join the CSV annotations with the view hierarchy information.

Thank you for the reply! I found the code to regenerate the bounding box annotations. Below is code that scales the integer bbox coordinates to the 0~1 range: https://github.com/google-research/pix2struct/blob/1921ce107c93334c57c89b9bdb070741c4f93774/pix2struct/preprocessing/convert_widget_captioning.py#L70
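For anyone skimming, the scaling itself is just a division by the screen dimensions. A minimal sketch, where the 1440x2560 RICO virtual screen size is an assumption (the linked pix2struct code should be treated as the reference):

```python
# Minimal sketch of scaling integer [left, top, right, bottom] pixel bounds
# into the 0~1 range, in the spirit of the pix2struct preprocessing linked
# above. The default 1440x2560 screen size is an assumption; real code
# should take width/height from the screenshot or view hierarchy.
def normalize_bbox(bounds, width=1440, height=2560):
    left, top, right, bottom = bounds
    return [left / width, top / height, right / width, bottom / height]

print(normalize_bbox([72, 84, 1368, 212]))  # made-up example bounds
```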