Problems faced in gmft

conjuncts / gmft

Lightweight, performant, deep table extraction

MIT License

347 stars, 23 forks

#9 might be relevant.
If you only need tabular data, then the usual workflow should work - refer to the quickstart notebooks.
If you need both tabular and nontabular data formatted together, that is a longstanding enhancement request; see #12.
I will take a look at it, but unfortunately complex merged cells aren't supported at this moment.
The tables will be provided as separate dataframes, so you'll need to write a way to merge several of them. Since tables may vary a lot in their header contents, I don't anticipate writing a default function; a customized approach will be needed.
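
As a rough starting point for such a customized merge, here is a minimal pandas-only sketch. It assumes each extracted table is already a `pandas.DataFrame` and that the headers differ only in case, whitespace, or a few known synonyms. The helper names (`normalize_header`, `merge_tables`) and the `aliases` mapping are hypothetical and not part of gmft.

```python
import pandas as pd


def normalize_header(name):
    """Lowercase a column name and collapse whitespace so near-identical headers match."""
    return " ".join(str(name).lower().split())


def merge_tables(frames, aliases=None):
    """Stack several DataFrames after normalizing their headers.

    `aliases` is a hypothetical mapping from a normalized header to the
    canonical name it should be renamed to (e.g. {"quantity": "qty"}).
    """
    aliases = aliases or {}

    def canonical(col):
        key = normalize_header(col)
        return aliases.get(key, key)

    cleaned = [frame.rename(columns=canonical) for frame in frames]
    # Columns missing from a given table are filled with NaN by concat.
    return pd.concat(cleaned, ignore_index=True, sort=False)


# Two toy tables standing in for dataframes extracted from different pages.
page1 = pd.DataFrame({"Item ": ["A", "B"], "Qty": [1, 2]})
page2 = pd.DataFrame({"item": ["C"], "Quantity": [3], "Notes": ["late"]})

merged = merge_tables([page1, page2], aliases={"quantity": "qty"})
print(merged)
```

Tables whose headers differ structurally (multi-row headers, merged cells) would need more than renaming, so treat this only as a template for the simple case.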

Since the tables appear to have explicit (solid black) cell boundaries, camelot/img2table might be worth a shot.
