The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format, and describe the UI elements present on the screen: their type, location, OCR text and a short description. It has been introduced in the paper `ScreenAI: A Vision-Language Model for UI and Infographics Understanding`.
Hi, thank you for sharing the valuable datasets. I wonder how to calculate the F1 score @ IoU=0.1, since the annotations contain language descriptions of the UI entities and these cannot be matched exactly.
For the object detection tasks (like Screen Annotation), only the UI class and the bounding box are used; the text description is discarded.
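To make that concrete, here is a minimal sketch of how F1 @ IoU=0.1 could be computed under that convention. This is an illustration, not the official evaluation script: it assumes each prediction and ground truth is a `(ui_class, box)` pair with boxes as `(x_min, y_min, x_max, y_max)`, and uses simple greedy one-to-one matching where a prediction is a true positive if it shares the UI class of an unmatched ground-truth box and their IoU is at least 0.1.

```python
def iou(box_a, box_b):
    # Boxes are (x_min, y_min, x_max, y_max).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds, gts, iou_thresh=0.1):
    """preds/gts: lists of (ui_class, box). Text descriptions are ignored.

    Greedy matching: each prediction claims the best-overlapping,
    same-class, still-unmatched ground-truth box with IoU >= threshold.
    """
    matched = set()
    tp = 0
    for cls_p, box_p in preds:
        best_j, best_iou = None, iou_thresh
        for j, (cls_g, box_g) in enumerate(gts):
            if j in matched or cls_g != cls_p:
                continue
            v = iou(box_p, box_g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with one well-localized BUTTON prediction and one stray TEXT prediction against two ground-truth boxes, this yields precision = recall = 0.5, so F1 = 0.5.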