Open paulpaul91 opened 1 year ago
For anyone looking for this dataset in a format with bounding boxes plainly exposed, the researchers at NJU preprocessed the dataset and shared it with their model, SeeClick:
Here's a HuggingFace Dataset version, with all associated metadata you'd want: https://huggingface.co/datasets/hheiden-roots/RICO-WidgetCaptioning
Thanks for sharing. Do you know how to get the bounding box information?
Are you asking where to get the bounding boxes so you can use them? If so, either download the HuggingFace dataset (which has all the extra metadata for each screen) or download the NJU file linked above (which is much smaller, but still has the bounding boxes).
If you are asking how to reconstruct the bounding boxes yourself, download the original RICO view hierarchies from here, then run code similar to what's mentioned in the main README to join the CSV annotations with the view hierarchy information.
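For reference, here is a rough sketch of what that join could look like. It is not the README's code. It assumes the CSV has `screenId`, `nodeId`, and `captions` columns, that `nodeId` is a dotted index path such as `root.0.1`, and that each view hierarchy JSON keeps its tree under `activity.root` with `bounds` and `children` keys. Check all of these against your own copy of the files; the helper names `find_node` and `join_captions_with_bounds` are made up for this example.

```python
import csv
import io


def find_node(root, node_id):
    """Follow a dotted index path (assumed format, e.g. 'root.0.1') through 'children'."""
    node = root
    for index in node_id.split(".")[1:]:  # skip the leading 'root' component
        node = node["children"][int(index)]
    return node


def join_captions_with_bounds(csv_text, hierarchies):
    """Attach bounds to each caption row.

    csv_text: contents of the annotation CSV (column names are assumptions).
    hierarchies: dict mapping screen id -> parsed view hierarchy JSON.
    """
    joined = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        root = hierarchies[row["screenId"]]["activity"]["root"]
        node = find_node(root, row["nodeId"])
        joined.append({
            "screenId": row["screenId"],
            "nodeId": row["nodeId"],
            "captions": row["captions"],
            "bounds": node["bounds"],  # assumed order: [left, top, right, bottom]
        })
    return joined


# Toy data standing in for one annotation row and one view hierarchy.
toy_csv = "screenId,nodeId,captions\n42,root.0.1,go back\n"
toy_hierarchies = {
    "42": {"activity": {"root": {
        "bounds": [0, 0, 1440, 2560],
        "children": [{
            "bounds": [0, 0, 1440, 200],
            "children": [
                {"bounds": [0, 0, 100, 200], "children": []},
                {"bounds": [100, 0, 300, 200], "children": []},
            ],
        }],
    }}},
}
toy_rows = join_captions_with_bounds(toy_csv, toy_hierarchies)
```

With real data you would load each hierarchy with `json.load` and pass the CSV file contents instead of the toy string. Real hierarchies may also contain empty child slots, which this sketch does not handle.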
Thank you for the reply! I found the code that regenerates the bounding box annotations. This link points to the part that scales the integer bbox values to the range 0 to 1: https://github.com/google-research/pix2struct/blob/1921ce107c93334c57c89b9bdb070741c4f93774/pix2struct/preprocessing/convert_widget_captioning.py#L70
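In case it helps anyone else, the scaling itself is just a division by the screen size. The snippet below is a minimal sketch of that idea, not a copy of the pix2struct code. It assumes bounds are given as `[left, top, right, bottom]` in pixels and that the view hierarchy coordinates live in a 1440x2560 space; both are assumptions to verify on your data, and `normalize_bbox` is a name I chose for this example.

```python
def normalize_bbox(bounds, width=1440, height=2560):
    """Scale integer pixel bounds to floats in [0, 1].

    bounds: [left, top, right, bottom] in pixels (assumed order).
    width, height: coordinate space of the view hierarchy (assumed defaults).
    """
    left, top, right, bottom = bounds

    def clamp(value):
        # Some elements extend past the screen edge, so clip to [0, 1].
        return min(max(value, 0.0), 1.0)

    return (
        clamp(left / width),
        clamp(top / height),
        clamp(right / width),
        clamp(bottom / height),
    )


full_screen = normalize_bbox([0, 0, 1440, 2560])
lower_right = normalize_bbox([720, 1280, 1440, 2560])
off_screen = normalize_bbox([-100, 0, 1600, 2560])
```

Clamping is a design choice here: drop it if you would rather detect and filter out-of-screen elements than silently clip them.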
Where is the widget-caption dataset?