@deepseek We provide an OCR code for each character, along with its bounding box.
Each row in the character GT file (for example) is organized as follows: page number, left of the character's bounding box, top of the bounding box, right of the bounding box, bottom of the bounding box, type of character (Text or Math), OCR code.
For example: 0,1088,950,1131,1000,ORDINARY_TEXT,0141
You might be able to write a script that uses this information to convert the provided ground truths to ICDAR txt labels.
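As a rough starting point, here is a minimal sketch of such a script in Python. It is not part of the dataset tooling. It assumes the target is the ICDAR 2015 style of label line (`x1,y1,x2,y2,x3,y3,x4,y4,transcription`, corners clockwise from top-left), and it copies the OCR code through unchanged as the transcription, since how that code maps to an actual character is not described here. The function names `char_row_to_icdar` and `convert` are hypothetical.

```python
import csv
from collections import defaultdict


def char_row_to_icdar(row):
    """Turn one character-GT row into (page, ICDAR-style label line).

    Expected row layout (from the description above):
    page, left, top, right, bottom, type, OCR code
    """
    page, left, top, right, bottom, _char_type, ocr_code = [f.strip() for f in row[:7]]
    left, top, right, bottom = (int(float(v)) for v in (left, top, right, bottom))
    # Assumption: ICDAR 2015-style quadrilateral, clockwise from the top-left corner.
    corners = [left, top, right, top, right, bottom, left, bottom]
    # Assumption: the raw OCR code is used as the transcription field.
    label = ",".join(str(c) for c in corners) + "," + ocr_code
    return int(float(page)), label


def convert(lines):
    """Group ICDAR-style label lines by page number."""
    pages = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 7:
            continue  # skip blank or malformed rows
        page, label = char_row_to_icdar(row)
        pages[page].append(label)
    return dict(pages)


if __name__ == "__main__":
    sample = ["0,1088,950,1131,1000,ORDINARY_TEXT,0141"]
    for page, labels in convert(sample).items():
        print(page, labels)
```

To get line-level labels instead, you would need to group characters into lines first (for example by overlapping vertical extent) and take the union of their boxes; that grouping is not shown here.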
@MaliParag Hi, is there a way to convert PDF files to ICDAR txt labels, perhaps at the line or character level?