google-research-datasets / screen_annotation

The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format and describe the UI elements present on the screen: their type, location, OCR text, and a short description. It was introduced in the paper `ScreenAI: A Vision-Language Model for UI and Infographics Understanding`.

coordinates meaning #2

Open SivanDoveh opened 3 months ago

SivanDoveh commented 3 months ago

Hi, can you please explain what the coordinates mean? I thought they were the top-left and bottom-right corners, but the numbers don't seem to match.

gbaechler commented 3 months ago

The order is left, right, top, bottom, and the coordinates are normalized and quantized between 0 and 999.
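
For readers who want pixel boxes, the ordering described above can be turned into a conversion along these lines. This is only a sketch, not code from the dataset repository: the function name `to_pixel_box` is hypothetical, and it assumes that 0 maps to the left/top edge and 999 maps to the right/bottom edge of the screenshot. The reply does not spell out the exact de-quantization convention, so dividing by 1000 instead may be what the authors intended.

```python
from typing import Tuple

# Assumption: 999 is the largest quantized value and corresponds to the full
# image extent. The thread only states "between 0 and 999".
MAX_QUANTIZED = 999


def to_pixel_box(
    coords: Tuple[int, int, int, int], width: int, height: int
) -> Tuple[float, float, float, float]:
    """Convert (left, right, top, bottom) in 0..999 to pixel coordinates.

    Returns (x_min, y_min, x_max, y_max), i.e. the top-left corner followed
    by the bottom-right corner.
    """
    left, right, top, bottom = coords
    for value in coords:
        if not 0 <= value <= MAX_QUANTIZED:
            raise ValueError(f"coordinate {value} is outside 0..{MAX_QUANTIZED}")
    # Horizontal values scale with the image width, vertical ones with the height.
    x_min = left * width / MAX_QUANTIZED
    x_max = right * width / MAX_QUANTIZED
    y_min = top * height / MAX_QUANTIZED
    y_max = bottom * height / MAX_QUANTIZED
    return (x_min, y_min, x_max, y_max)


if __name__ == "__main__":
    # A box spanning the whole screen of a 1080x1920 screenshot.
    print(to_pixel_box((0, 999, 0, 999), 1080, 1920))
```

Note that the second and third values are swapped relative to the usual corner-pair layout, which is likely why reading them as top-left and bottom-right gives boxes that look wrong.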