Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

feat/table element coordinates #3175

Open naunidh-tetrix opened 3 weeks ago

naunidh-tetrix commented 3 weeks ago

Hey team. I’m extracting coordinates for chunks generated by Unstructured, but I realise that for a table it only gives coordinates for the full table. For example, if I want to highlight the coordinates of a single cell value in the table, the whole table gets highlighted. It seems that the table is a single element with no awareness of the coordinates of its elements. Is there any way I can get the coordinates of the sub-chunks in a table? Ideally, the coordinates returned by default would map to the specific sub-chunks, not the parent table. Highlighting the whole table isn't very useful if someone wants to find the true origin of a text block. Thanks
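Until per-cell coordinates are exposed, one rough stopgap is to approximate cell boxes from the table's own bounding box and its HTML structure. The sketch below makes three assumptions. First, the `Table` element was produced with `infer_table_structure=True`, so `metadata.text_as_html` is populated. Second, `metadata.coordinates.points` is in a pixel-style system where y grows downward. Third, you accept a crude approximation. `approximate_cell_boxes` is a hypothetical helper, not part of the Unstructured API. It divides the box into a uniform grid, so it ignores real column widths, row heights, and `colspan`/`rowspan`. It will only line up with the page for tables whose cells are roughly evenly sized.

```python
from html.parser import HTMLParser


class _TableGridParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False
        self._cell_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell_text = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            if not self.rows:  # tolerate cells with no enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell_text).strip())

    def handle_data(self, data):
        if self._in_cell:
            self._cell_text.append(data)


def approximate_cell_boxes(points, text_as_html):
    """Hypothetical helper: split a table bounding box into a uniform grid.

    `points` is a sequence of (x, y) corners; only their min/max are used.
    Assumes y grows downward, so row 0 is the top row.
    Returns a list of (cell_text, (x0, y0, x1, y1)) tuples.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)

    parser = _TableGridParser()
    parser.feed(text_as_html)
    rows = [r for r in parser.rows if r]  # drop empty <tr> rows
    if not rows:
        return []

    # Uniform grid: every row and column gets the same share of the box.
    n_rows = len(rows)
    n_cols = max(len(r) for r in rows)
    row_h = (y1 - y0) / n_rows
    col_w = (x1 - x0) / n_cols

    boxes = []
    for i, row in enumerate(rows):
        for j, text in enumerate(row):
            box = (x0 + j * col_w, y0 + i * row_h,
                   x0 + (j + 1) * col_w, y0 + (i + 1) * row_h)
            boxes.append((text, box))
    return boxes
```

With a real element you would call it as `approximate_cell_boxes(el.metadata.coordinates.points, el.metadata.text_as_html)`. For accurate highlights, the cell geometry would have to come from the table-structure model itself rather than from this kind of post-hoc split.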