Poor table auto-detection?

atlanhq / camelot

Camelot: PDF Table Extraction for Humans


First of all, Camelot is a great project with enormous potential! In my experience, Camelot extracts most of the tables from a document correctly, provided the right parameters have been supplied. However, this is not always the case in the real world. Sometimes you have to deal with thousands of documents with different layouts, and processing them one by one is not an option. Auto-detection of tables doesn't seem to work very well at the moment: I tried to run a bulk table extraction on PDF documents with random layouts, and the results were very poor. There is probably a need for a new, robust bulk extraction method that works for both framed and streamed tables and produces acceptable results. In other words, sometimes it may be worth trading accuracy for generalisation.
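To make the request more concrete, here is a rough sketch of the kind of fallback I have in mind. It is not an existing Camelot feature. It assumes `camelot.read_pdf` with its `lattice` and `stream` flavors and the `accuracy` value in each table's `parsing_report`; the function name `extract_best`, the `read_pdf` argument and the `min_accuracy` threshold of 80 are my own hypothetical choices, and the reported accuracy is only a crude proxy for real extraction quality.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def extract_best(
    path: str,
    read_pdf: Optional[Callable] = None,
    flavors: Sequence[str] = ("lattice", "stream"),
    min_accuracy: float = 80.0,  # hypothetical threshold, tune for your corpus
) -> Tuple[Optional[str], List]:
    """Try each flavor in order and return (flavor, tables).

    The first flavor whose mean reported accuracy reaches `min_accuracy`
    wins; otherwise the best-scoring flavor seen is returned. If nothing
    is found at all, the result is (None, []).
    """
    if read_pdf is None:
        # Imported lazily so the selection logic can be exercised without Camelot.
        import camelot
        read_pdf = camelot.read_pdf

    best_flavor, best_tables, best_score = None, [], -1.0
    for flavor in flavors:
        try:
            tables = list(read_pdf(path, pages="all", flavor=flavor))
        except Exception:
            # In a bulk run, one failing flavor should not stop the whole batch.
            continue
        if not tables:
            continue
        # Mean of the per-table accuracy that Camelot reports.
        score = sum(t.parsing_report["accuracy"] for t in tables) / len(tables)
        if score >= min_accuracy:
            return flavor, tables
        if score > best_score:
            best_flavor, best_tables, best_score = flavor, tables, score
    return best_flavor, best_tables
```

Calling `extract_best("some.pdf")` on each file in a folder would give a single entry point for bulk runs, at the cost of parsing some documents twice. It trades accuracy for generalisation in exactly the sense above: no per-document tuning, just "good enough" from whichever flavor scores better.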

Poor table auto-detection? #304