MaliParag / TFD-ICDAR2019

TFD-ICDAR 2019 Dataset for Typeset Math Formula Detection
http://crohme2019.cs.rit.edu/

pdf to icdar txt label #1

Closed: ghost closed this issue 4 years ago

ghost commented 5 years ago

@MaliParag Hi, is there a way to convert PDF files to ICDAR txt labels, perhaps at the line or character level?

MaliParag commented 5 years ago

@deepseek We provide the OCR code for each character along with its bounding box.

Each row in the character gt (ground truth) file is organized as follows: page number, left, top, right, and bottom of the character's bounding box, character type (text or math), and OCR code.

For example: 0,1088,950,1131,1000,ORDINARY_TEXT,0141
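
A minimal Python sketch of reading one such row, assuming the seven fields are comma-separated in the order listed above and that the coordinates are integers, as in the example. The names `CharRow` and `parse_char_row` are hypothetical and not part of the dataset's code; the OCR code is kept as a raw string and is not decoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharRow:
    """One row of the character gt file (hypothetical helper type)."""
    page: int
    left: int
    top: int
    right: int
    bottom: int
    char_type: str  # e.g. "ORDINARY_TEXT" in the example row
    ocr_code: str   # kept as a raw string; not decoded here


def parse_char_row(row: str) -> CharRow:
    """Split a comma-separated row into its seven named fields."""
    fields = row.strip().split(",")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {row!r}")
    # ASSUMPTION: coordinates are integers, as in the example row.
    page, left, top, right, bottom = map(int, fields[:5])
    return CharRow(page, left, top, right, bottom, fields[5], fields[6])


row = parse_char_row("0,1088,950,1131,1000,ORDINARY_TEXT,0141")
print(row.right - row.left, row.bottom - row.top)  # prints "43 50"
```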

You might be able to write a script that uses this information to convert the provided ground truths to ICDAR txt labels.
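
As a rough starting point for such a script, the sketch below merges character boxes into line-level boxes and prints each one as a quadrilateral. Two things here are assumptions rather than anything the dataset defines. First, the output layout `x1,y1,x2,y2,x3,y3,x4,y4` (clockwise from the top-left corner) is only a guess at what "ICDAR txt" means in your pipeline. Second, the line grouping is a crude heuristic: characters on the same page whose vertical extents overlap are treated as one line, which will merge across columns on multi-column pages. The function names `merge_into_lines` and `to_quad` are hypothetical.

```python
from collections import defaultdict


def merge_into_lines(boxes):
    """Group (page, left, top, right, bottom) character boxes into line boxes.

    Heuristic: within a page, boxes are visited top to bottom, and a box joins
    the current line if its top edge lies above the line's bottom edge.
    """
    by_page = defaultdict(list)
    for page, left, top, right, bottom in boxes:
        by_page[page].append((left, top, right, bottom))

    lines = []
    for page in sorted(by_page):
        current = None  # [left, top, right, bottom] of the line being built
        for left, top, right, bottom in sorted(by_page[page], key=lambda b: (b[1], b[0])):
            if current is not None and top < current[3]:
                # Vertical overlap: grow the current line to cover this box.
                current = [min(current[0], left), min(current[1], top),
                           max(current[2], right), max(current[3], bottom)]
            else:
                if current is not None:
                    lines.append((page, *current))
                current = [left, top, right, bottom]
        if current is not None:
            lines.append((page, *current))
    return lines


def to_quad(left, top, right, bottom):
    """Format a box as x1,y1,...,x4,y4 (ASSUMED target layout, see above)."""
    corners = [left, top, right, top, right, bottom, left, bottom]
    return ",".join(str(v) for v in corners)


chars = [
    (0, 1088, 950, 1131, 1000),   # box from the example row
    (0, 1140, 955, 1180, 1002),   # made-up neighbour on the same line
    (0, 1088, 1050, 1120, 1100),  # made-up character on the next line
]
for page, left, top, right, bottom in merge_into_lines(chars):
    print(page, to_quad(left, top, right, bottom))
```

For character-level labels, skip the merge step and call `to_quad` on each character box directly.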