DS4SD / docling-ibm-models

MIT License

TF response to HTML #29

Open mllife opened 2 days ago

mllife commented 2 days ago

Is there any helper code in the repo to convert the TableFormer (TF) response to HTML?

I see some code (possibly related to dataset conversion?), but I am not sure it is the right place to look: https://github.com/DS4SD/docling-ibm-models/blob/620ce428c66928e670d47004bbb563e1779070e4/docling_ibm_models/tableformer/data_management/tf_predictor.py#L1086

Any insight will be helpful.

maxmnemonic commented 2 days ago

TableFormer generates structure predictions in OTSL+ format (OTSL with header support). To convert an OTSL structure, represented as a list of OTSL tags, into an HTML structure (a list of HTML tags), you can use this function: otsl_to_html

The OTSL format is described in our paper, Optimized Table Tokenization for Table Structure Recognition; using it brings significant benefits in quality and performance. It has a limited vocabulary:

- "ecel": empty cell
- "fcel": full cell
- "lcel": left-looking span cell
- "ucel": up-looking span cell
- "xcel": cross cell (or 2D span cell)
- "nl": new line

The paper describes the semantics and the logic behind it in more detail.
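To make the vocabulary concrete, here is a minimal sketch (not code from the repo) that splits a flat list of OTSL tags into a row-by-row grid. The function name `otsl_to_grid` and the example tag list are hypothetical, and the check that all rows have the same length is my assumption about well-formed OTSL input.

```python
# Hypothetical helper for illustration; not part of docling-ibm-models.
OTSL_TAGS = {"ecel", "fcel", "lcel", "ucel", "xcel", "nl"}


def otsl_to_grid(tags):
    """Split a flat OTSL tag list into rows at every "nl" tag."""
    rows, current = [], []
    for tag in tags:
        if tag not in OTSL_TAGS:
            raise ValueError(f"unknown OTSL tag: {tag!r}")
        if tag == "nl":
            rows.append(current)
            current = []
        else:
            current.append(tag)
    if current:  # tolerate a missing trailing "nl"
        rows.append(current)
    # Assumption: a well-formed sequence forms a rectangular grid.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have different lengths")
    return rows


# Made-up 2x3 table: the first cell of row 0 spans two columns.
tags = ["fcel", "lcel", "fcel", "nl", "fcel", "ecel", "fcel", "nl"]
grid = otsl_to_grid(tags)
for row in grid:
    print(row)
```

Each "lcel" in the grid belongs to the cell on its left, which is what lets the span information be recovered later.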

OTSL+ is an extension of OTSL with extra tags that mark special kinds of cells:

- "ched": column headers
- "rhed": row headers
- "srow": section rows

The model predicts these tags sequentially in the tag decoder, simultaneously with the bounding boxes from the bbox decoder. The prediction can then be converted to any other format, e.g. Markdown, HTML, etc.
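For illustration only, the conversion idea can be sketched as follows. This is not the repo's otsl_to_html and may differ from it in details; the name `otsl_to_html_sketch` is hypothetical. It assumes that "lcel" extends the cell to its left, "ucel" extends the cell above, "xcel" is the interior of a 2D span, and that the OTSL+ tags ("ched", "rhed", "srow") start a cell just like "fcel". Every cell is emitted as a plain td with no content.

```python
# Simplified sketch under the assumptions stated above.
CELL_STARTS = {"fcel", "ecel", "ched", "rhed", "srow"}


def otsl_to_html_sketch(tags):
    """Convert a flat list of OTSL(+) tags into a flat list of HTML tags."""
    # Split the sequence into rows at every "nl".
    rows, current = [], []
    for tag in tags:
        if tag == "nl":
            rows.append(current)
            current = []
        else:
            current.append(tag)
    if current:
        rows.append(current)

    html = []
    for r, row in enumerate(rows):
        html.append("<tr>")
        for c, tag in enumerate(row):
            if tag not in CELL_STARTS:
                continue  # lcel/ucel/xcel belong to a cell that started earlier
            # Count "lcel" tags to the right to get the column span.
            colspan = 1
            while c + colspan < len(row) and row[c + colspan] == "lcel":
                colspan += 1
            # Count "ucel" tags below to get the row span.
            rowspan = 1
            while (
                r + rowspan < len(rows)
                and c < len(rows[r + rowspan])
                and rows[r + rowspan][c] == "ucel"
            ):
                rowspan += 1
            attrs = ""
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            html.append(f"<td{attrs}>")
            html.append("</td>")
        html.append("</tr>")
    return html


# Made-up example: a header spanning two columns and a cell spanning two rows.
example = ["ched", "lcel", "ched", "nl", "fcel", "fcel", "ucel", "nl"]
print("".join(otsl_to_html_sketch(example)))
```

To get a complete table you would still wrap the result in table tags and fill each cell with the text matched to its predicted bounding box; that step is left out here.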

By the way, higher-level usage of docling-ibm-models can be seen in docling itself: https://github.com/DS4SD/docling