atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

[Improvement] Mix Stream and Lattice? #336

Closed CartierPierre closed 5 years ago

CartierPierre commented 5 years ago

Hi, it's me again 😃

I need to take table extraction to the next level, and I'm looking for a table extractor that combines Lattice AND Stream. Let me explain. I have 2 files: one separates its columns with whitespace gaps, and the other separates them with a ruling line. File 1:

This is the first column                 The second one is separated by a gap

File 2: This is the first column|The second one is too close to use a gap, but there is a line between them

Obviously, I don't want to pick one method per file; I want a single generic method. Any ideas?
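To make the request concrete, here is a minimal sketch of one possible workaround, not a built-in Camelot feature: run both flavors and keep whichever result scores higher. It assumes that `camelot.read_pdf` accepts `flavor="lattice"` and `flavor="stream"` and that each returned table exposes a `parsing_report` dict with an `"accuracy"` key. The helper names `score`, `read_with_camelot`, and `extract_best` are hypothetical, and comparing accuracy values across the two flavors is only a rough heuristic.

```python
from typing import Any, Callable, Sequence, Tuple

# Lattice is listed first so it wins ties (max() keeps the first best item).
FLAVORS = ("lattice", "stream")


def score(tables: Sequence[Any]) -> float:
    """Mean parsing-report accuracy, or -1.0 when no table was detected."""
    if len(tables) == 0:
        return -1.0
    return sum(t.parsing_report["accuracy"] for t in tables) / len(tables)


def read_with_camelot(path: str, flavor: str) -> Sequence[Any]:
    """Default reader: parse the first page with the given Camelot flavor."""
    import camelot  # lazy import, so the selection logic works without Camelot

    return camelot.read_pdf(path, pages="1", flavor=flavor)


def extract_best(
    path: str,
    reader: Callable[[str, str], Sequence[Any]] = read_with_camelot,
    flavors: Sequence[str] = FLAVORS,
) -> Tuple[str, Sequence[Any]]:
    """Try every flavor on the file and return (flavor, tables) with the best score."""
    results = {flavor: reader(path, flavor) for flavor in flavors}
    best = max(flavors, key=lambda f: score(results[f]))
    return best, results[best]
```

Calling `extract_best("file1.pdf")` (a placeholder file name) would try both parsers on that file. This only chooses between the two methods per file; it does not merge them within a single table, which is what a truly mixed approach would need.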

CartierPierre commented 5 years ago

None of the similar projects (Tabula, PDFPlumber, etc.) has a mixed solution. I'm sure it's possible to do something better.

vinayak-mehta commented 5 years ago

Closing in favor of https://github.com/camelot-dev/camelot/issues/10.