Open HiromuHota opened 4 years ago
In general, simplifying dependencies sounds like a big win to me, especially if performance is comparable or better.
@lukehsiao thanks for your thoughts.
I just confirmed that Camelot allows specifying table areas (and pages): https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-areas
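For illustration, here is a minimal sketch of how an area detected by pdftotree could be handed to Camelot. It assumes the detected area is in Tabula's convention (top, left, bottom, right, measured in points from the top-left of the page) and that Camelot's `table_areas` expects "x1,y1,x2,y2" strings (top-left and bottom-right corners) in PDF coordinate space with the origin at the bottom-left. The helper names `tabula_area_to_camelot` and `extract_tables`, and the `page_height` parameter, are hypothetical and not part of pdftotree.

```python
from typing import List, Sequence


def tabula_area_to_camelot(area: Sequence[float], page_height: float) -> str:
    """Convert a (top, left, bottom, right) area with a top-left origin
    into a Camelot-style "x1,y1,x2,y2" string with a bottom-left origin.

    Assumption: both coordinate systems use PDF points.
    """
    top, left, bottom, right = area
    x1, y1 = left, page_height - top      # top-left corner in PDF space
    x2, y2 = right, page_height - bottom  # bottom-right corner in PDF space
    return f"{x1:g},{y1:g},{x2:g},{y2:g}"


def extract_tables(pdf_path: str, area: Sequence[float],
                   page_height: float, page: int = 1) -> List:
    """Hypothetical wrapper: run Camelot only inside a pre-detected area."""
    import camelot  # imported lazily so the helper above works without Camelot

    tables = camelot.read_pdf(
        pdf_path,
        pages=str(page),
        flavor="stream",
        table_areas=[tabula_area_to_camelot(area, page_height)],
    )
    return [table.df for table in tables]  # one pandas DataFrame per table
```

With a US Letter page (792 points high), `tabula_area_to_camelot((100, 50, 300, 400), 792)` would yield the string Camelot needs for the same region. Whether `stream` or `lattice` is the better flavor likely depends on the documents being parsed.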
Is your feature request related to a problem? Please describe.
Switching from Tabula to Camelot has two advantages:
Describe the solution you'd like
I'd like to switch from Tabula to Camelot if it makes more sense. Currently, pdftotree detects the table "area" (via ML, vision, or heuristics) and uses Tabula for table recognition. I'd have to figure out whether Camelot takes an `area` argument like Tabula does.
Describe alternatives you've considered
It should be fine even if Camelot does not take `area` but detects tables well on its own.
Additional context
According to https://arxiv.org/pdf/1911.10683.pdf,