Camelot returns tables that contain no text (Where text should be detectable)

I'm trying to extract data from roughly 900 certificates. The certificates share an identical visual structure but are published by different parties. Extraction works for the majority of files. However, for several dozen files, the tables returned by Camelot contain only empty strings.

Plotting the grid and the text shows that content is detected (e.g. Table 7 in DS_3663.pdf):

[Image: DS_3663_table_7_grid]

[Image: DS_3663_table_7_text]

I'm using this command to read the PDF and create the tables:

>>> tables = camelot.read_pdf('pdfs/DS_3663.pdf', pages='1-end', line_scale=110, shift_text=[''])

e.g. Table 7 contains this data: >>> tables[7].data [['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '', '', '', '', '', '']]
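With ~900 files, it may help to flag the affected tables automatically instead of inspecting each one by hand. The following is a minimal sketch that works on plain lists of rows shaped like the `.data` output above. The helper names `is_blank_table` and `blank_table_indices` are hypothetical and not part of Camelot.

```python
from typing import List, Sequence


def is_blank_table(data: Sequence[Sequence[str]]) -> bool:
    """Return True if every cell is empty or whitespace-only.

    A table with no rows at all is also treated as blank.
    """
    return all(not cell.strip() for row in data for cell in row)


def blank_table_indices(tables_data: Sequence[Sequence[Sequence[str]]]) -> List[int]:
    """Return the indices of all tables whose cells are all blank."""
    return [i for i, data in enumerate(tables_data) if is_blank_table(data)]


# Illustrative data only: table 0 has text, table 1 is entirely blank.
sample = [
    [["a", "b"], ["", "c"]],
    [["", ""], ["", " "]],
]
print(blank_table_indices(sample))
```

Assuming the result of `camelot.read_pdf` can be iterated, something like `blank_table_indices([t.data for t in tables])` would list the tables that came back empty for a given file.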

Here are a few more example PDFs where the extraction fails in an identical manner: DS_885.pdf, DS_2481.pdf, DS_2083.pdf

Parsing all of these files with pdf2txt.py successfully extracts text, so I assume it should be possible to get a result with Camelot as well.

Environment

OS: Ubuntu 22.04.1 LTS
Python version: 3.10.6
Numpy version: 1.23.4
OpenCV version: 4.6.0.66
Ghostscript version: 9.55.0
Camelot version: 0.9.0

I've tried debugging this, but had difficulty understanding the intricate code in the bbox sections. From what I've figured out, Camelot appears unable to match the horizontal_text entries (which contain the relevant text) to the cells of the line grid.
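To illustrate the kind of failure I suspect, here is a simplified sketch of assigning text boxes to grid cells by bounding box. This is not Camelot's actual implementation; the `Box` class, `find_cell`, `assign_text`, and all coordinates are made up for illustration. It only shows that if text and grid coordinates do not line up (for example because of an offset or a different origin), every cell stays empty even though both the grid and the text were detected.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def center(self) -> Tuple[float, float]:
        return (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2


def find_cell(cells: List[List[Box]], x: float, y: float) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first cell containing the point, or None."""
    for r, row in enumerate(cells):
        for c, cell in enumerate(row):
            if cell.contains(x, y):
                return r, c
    return None


def assign_text(cells: List[List[Box]], texts: List[Tuple[Box, str]]) -> List[List[str]]:
    """Place each text string into the cell that contains its box center."""
    grid = [["" for _ in row] for row in cells]
    for box, string in texts:
        hit = find_cell(cells, *box.center())
        if hit is not None:  # text outside every cell is silently dropped
            r, c = hit
            grid[r][c] += string
    return grid


# One row with two cells (hypothetical coordinates).
cells = [[Box(0, 0, 50, 20), Box(50, 0, 100, 20)]]

# Text boxes that line up with the grid.
aligned = [(Box(10, 5, 30, 15), "A"), (Box(60, 5, 90, 15), "B")]

# The same text boxes shifted vertically by 200 units.
shifted = [(Box(b.x0, b.y0 + 200, b.x1, b.y1 + 200), s) for b, s in aligned]

print(assign_text(cells, aligned))
print(assign_text(cells, shifted))
```

In the shifted case every string falls outside all cells, so the result consists of empty strings only, which resembles what I see in `tables[7].data`.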

camelot-dev / camelot #337