axa-group / Parsr

Transforms PDF, Documents and Images into Enriched Structured Data
Apache License 2.0
Table detection with Tesseract #28

Open royjohal opened 5 years ago

royjohal commented 5 years ago

Find a way to have table detection with Tesseract. Maybe Tesseract has some options to do it. Maybe we can find a way to pass the bounding boxes and content to Camelot.
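One way to picture the "bounding boxes and content" idea is the rough sketch below. It assumes word boxes shaped like the rows of Tesseract's TSV output (`left`, `top`, `width`, `height`, `text`) and groups them into a grid with two simple thresholds. The function names, the `y_tolerance` and `x_gap` parameters, and the sample data are all hypothetical. This is a plain geometric heuristic for illustration only: it is not how Parsr or Camelot detect tables, and it does not call either of them.

```python
from typing import Any, Dict, List

# One OCR word: keys left, top, width, height (pixels) and text.
Word = Dict[str, Any]


def _centre_y(word: Word) -> float:
    """Vertical centre of a word box."""
    return word["top"] + word["height"] / 2


def group_words_into_rows(words: List[Word], y_tolerance: int = 10) -> List[List[Word]]:
    """Cluster words whose vertical centres are close, then sort each row left to right."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=_centre_y):
        # Compare with the last word added to the current row (simple, can drift on skewed scans).
        if rows and abs(_centre_y(word) - _centre_y(rows[-1][-1])) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [sorted(row, key=lambda w: w["left"]) for row in rows]


def split_row_into_cells(row: List[Word], x_gap: int = 40) -> List[str]:
    """Start a new cell whenever the horizontal gap to the previous word exceeds x_gap."""
    cells: List[str] = []
    current = [row[0]]
    for prev, word in zip(row, row[1:]):
        gap = word["left"] - (prev["left"] + prev["width"])
        if gap > x_gap:
            cells.append(" ".join(w["text"] for w in current))
            current = [word]
        else:
            current.append(word)
    cells.append(" ".join(w["text"] for w in current))
    return cells


def words_to_grid(words: List[Word], y_tolerance: int = 10, x_gap: int = 40) -> List[List[str]]:
    """Turn OCR word boxes into a list of rows, each a list of cell strings."""
    # Skip entries with empty text (OCR output can contain non-word rows).
    words = [w for w in words if str(w["text"]).strip()]
    rows = group_words_into_rows(words, y_tolerance)
    return [split_row_into_cells(row, x_gap) for row in rows]


# Hypothetical sample data, made up for this sketch.
sample_words = [
    {"left": 10, "top": 10, "width": 40, "height": 12, "text": "Name"},
    {"left": 200, "top": 11, "width": 30, "height": 12, "text": "Age"},
    {"left": 10, "top": 40, "width": 30, "height": 12, "text": "Ada"},
    {"left": 45, "top": 41, "width": 60, "height": 12, "text": "Lovelace"},
    {"left": 200, "top": 40, "width": 20, "height": 12, "text": "36"},
]

print(words_to_grid(sample_words))
```

Fixed pixel thresholds like these would likely need tuning per document, and a grid built this way cannot tell a real table from text that merely happens to be laid out in columns.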

Related links

khaledJabr commented 4 years ago

Any updates on this?

CarlosVilla00896 commented 4 years ago

Hey, any update on this? Is anyone working on it?

jvalls-axa commented 4 years ago

Hi @khaledJabr & @CarlosVilla00896, current development is focused on integrating new OCR engines, such as those from Google, Microsoft and Amazon.

After that, we will evaluate in more depth how to detect tables when OCR is used to extract data.

khaledJabr commented 4 years ago

@jvalls-axa thanks for the response, and I understand. Out of curiosity, can you tell me what you mean by new OCR's integration?

jvalls-axa commented 4 years ago

> @jvalls-axa thanks for the response, and I understand. Out of curiosity, can you tell me what you mean by new OCR's integration?

The next release will allow running the OCR engines below:

And of course Tesseract, which is the current OCR solution used by Parsr.