1) Does gmft contain a function similar to set_cropbox in PyMuPDF?
2) Does gmft have functions that can read a PDF and separate non-tabular data from tabular data, as PyMuPDF does?
3) How can we get a table's surrounding context while detecting the table and converting it to CSV?
4) How can I fix extraction problems when converting complex tables from PDF to CSV? (Example attached below.)
5) How can we merge a table that continues onto a second page into a single CSV, and create a separate CSV when a new table is found?
If you only need tabular data, the usual workflow should work; refer to the quickstart notebooks.
If you need both tabular and non-tabular data formatted together, that is a longstanding enhancement request; see #12.
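Until that lands, one rough way to separate the two (and to grab a table's context, as in question 3) is to work purely with bounding boxes: take the words on a page, for example from PyMuPDF's page.get_text("words"), whose first five fields are x0, y0, x1, y1 and the word, and compare them against the table bounding boxes found during detection. This is a minimal sketch, not a gmft feature. It assumes both sets of boxes share one coordinate system with the origin at the top-left and y growing downward; the helper names, the 0.5 overlap threshold and the 30-unit context gap are hypothetical choices.

```python
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_ratio(inner: Box, outer: Box) -> float:
    """Fraction of `inner`'s area that lies inside `outer`."""
    w = min(inner.x1, outer.x1) - max(inner.x0, outer.x0)
    h = min(inner.y1, outer.y1) - max(inner.y0, outer.y0)
    area = (inner.x1 - inner.x0) * (inner.y1 - inner.y0)
    if w <= 0 or h <= 0 or area <= 0:
        return 0.0
    return (w * h) / area


def split_words(
    words: Iterable[Sequence], tables: List[Box], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Split (x0, y0, x1, y1, text, ...) words into (tabular, non_tabular) texts.

    Hypothetical helper: a word counts as tabular when at least `threshold`
    of its area falls inside any table box. Extra tuple fields are ignored.
    """
    tabular, non_tabular = [], []
    for w in words:
        box = Box(*w[:4])
        inside = any(overlap_ratio(box, t) >= threshold for t in tables)
        (tabular if inside else non_tabular).append(w[4])
    return tabular, non_tabular


def context_above(words: Iterable[Sequence], table: Box, max_gap: float = 30.0) -> str:
    """Join words ending at most `max_gap` units above the table's top edge
    that also overlap the table horizontally (e.g. a title or caption)."""
    hits = []
    for w in words:
        x0, y0, x1, y1, text = w[:5]
        if table.y0 - max_gap <= y1 <= table.y0 and x1 > table.x0 and x0 < table.x1:
            hits.append((y0, x0, text))
    # Sort top-to-bottom, then left-to-right, to approximate reading order.
    return " ".join(text for _, _, text in sorted(hits))
```

Words that straddle a table border are assigned by the threshold; lowering it pulls more partially overlapping words into the tabular side.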
I will take a look at it, but unfortunately complex merged cells aren't supported at this moment.
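As a partial workaround, and not something gmft does for you: if a vertically merged cell comes out with its value only in the first row it spans and blanks in the rows below, a forward fill over the affected columns can restore the repeated value. This sketch assumes the table is already a pandas DataFrame; fill_merged_cells is a hypothetical helper name. It does nothing for horizontally merged cells or for wrongly detected cell boundaries.

```python
import pandas as pd


def fill_merged_cells(df: pd.DataFrame, columns=None) -> pd.DataFrame:
    """Return a copy of df with blanks in `columns` forward-filled from above.

    Hypothetical helper: assumes a vertically merged cell was extracted as a
    value in its first row followed by empty cells. Pass only the columns
    known to contain merged cells, or genuinely empty cells get filled too.
    """
    out = df.copy()
    cols = list(out.columns) if columns is None else list(columns)
    # Treat empty or whitespace-only strings as missing, then carry values down.
    out[cols] = out[cols].replace(r"^\s*$", float("nan"), regex=True).ffill()
    return out


df = pd.DataFrame({"Region": ["North", "", "South", " "], "Q1": [1, 2, 3, 4]})
print(fill_merged_cells(df, columns=["Region"])["Region"].tolist())
# ['North', 'North', 'South', 'South']
```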
The tables will be provided as separate dataframes, so you'll need to write a way to merge several of them. Since tables can vary a lot in their header contents, I don't anticipate writing a default function; a customized approach will be needed.
Since the tables appear to have explicit (solid black) cell boundaries, camelot/img2table might be worth a shot.