poetaster opened this issue 9 months ago
@poetaster Did you manage to find a way to fix the issues with multi-row headers?
@poetaster Pinging this again: were you able to find a fix for the multi-row headers?
I haven't worked on this since then (I ended up reading the Excel files directly for that project). I've looked at it again now, but thought I should probably update Camelot first. Which version would be best to test with?
I wasn't sure whether I had run the original on my PC or on my JupyterLab server. On this PC, Camelot is at 0.9.0 and the results are the same.
OK, I updated to 0.11.0 and got the same result. I'm not sure whether I've just misunderstood how the text shifting works, but even without any shifting options, Camelot gets the grid right yet shifts the content of the 'controllability' columns two rows down.
Describe the bug
This PDF, https://poetaster.de/misc/118.pdf (which I'm not uploading here since it may be a copyright issue), is read well, but Camelot shifts the rows under the multi-row header 'controllability' down.
Steps to reproduce the bug
Load the file above and try reading it with both the stream and lattice flavors. I tried a lot of variations:
stream with different row tolerances:
dfs = camelot.read_pdf('118.pdf', flavor='stream', row_tol=20, flag_size=True)
and lattice with many line_scale and shift_text variations:
dfs = camelot.read_pdf('118.pdf', flavor='lattice', shift_text=['r','t', 'r', 't'], line_scale=20)
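For context on the row_tol experiments: stream groups text into rows by vertical position, so a large tolerance such as row_tol=20 can merge the separate lines of a multi-row header into a single row. The sketch below is a simplified, stdlib-only model of that idea, not Camelot's code; the (x0, y0, text) word tuples and the group_rows helper are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical word representation: (x0, y0, text). As in PDF space,
# y0 grows upwards, so the top of the page has the largest y0.
Word = Tuple[float, float, str]


def group_rows(words: List[Word], row_tol: float) -> List[List[str]]:
    """Simplified model of tolerance-based row grouping (not Camelot's code).

    A new row starts whenever a word's y0 is more than row_tol away from
    the y0 of the first word in the current row.
    """
    rows: List[List[Word]] = []
    current: List[Word] = []
    row_y: Optional[float] = None
    # Walk the page top to bottom.
    for word in sorted(words, key=lambda w: -w[1]):
        if row_y is None or abs(word[1] - row_y) > row_tol:
            if current:
                rows.append(current)
            current = []
            row_y = word[1]
        current.append(word)
    if current:
        rows.append(current)
    # Within each row, order words left to right and keep only the text.
    return [[w[2] for w in sorted(row, key=lambda w: w[0])] for row in rows]
```

With a two-line header such as "Name"/"Controllability" above "low"/"high", a small tolerance keeps the header lines apart, while row_tol=20 collapses them into one row together with the sub-headers.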
Lattice appears to get it right:
camelot.plot(dfs[0], kind='grid').show()
The grid looks correct, but the rows in the controllability section are always shifted.
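Until the underlying shift is fixed, one possible stopgap is to post-process the extracted table and move the affected columns back up. Below is a minimal sketch on a plain list-of-rows table, assuming the offset is a constant two rows and the header rows should be left alone; the unshift_columns helper, column indices and start row are hypothetical and would have to be read off the actual table. With pandas, DataFrame.shift with a negative periods value on the affected columns of table.df should be the equivalent.

```python
from typing import List, Sequence


def unshift_columns(rows: List[List[str]], cols: Sequence[int], n: int,
                    start: int = 0, fill: str = "") -> List[List[str]]:
    """Return a copy of rows with the given columns moved up by n rows.

    Rows before `start` (e.g. header rows) are left untouched; cells that
    have nothing left to move into them are set to `fill`.
    """
    out = [list(row) for row in rows]
    for c in cols:
        for r in range(start, len(rows)):
            src = r + n
            # Read from the original rows so earlier moves don't cascade.
            out[r][c] = rows[src][c] if src < len(rows) else fill
    return out
```

This only hides the symptom for one known offset; if the shift differs between pages or tables, it would need to be checked case by case.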
Expected behavior
Rows should not be shifted.
Code
Began with:
Then I tried many variations, the most recent lattice call being:
dfs = camelot.read_pdf('118.pdf', flavor='lattice', shift_text=['r','t', 'r', 't'], line_scale=20)
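In the lattice flavor, shift_text is documented as the direction(s) in which text inside a spanning (merged) cell flows, using 'l', 'r', 't' and 'b', with ['l', 't'] as the default. The following is a simplified, stdlib-only model of that idea, useful for reasoning about where text in a merged header region ends up; the Cell class and shift_index function are hypothetical, and Camelot's real implementation differs in detail.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Cell:
    # True where the detected grid has a ruling line on that side of the cell;
    # a missing line means the cell is merged with its neighbour on that side.
    left: bool = True
    right: bool = True
    top: bool = True
    bottom: bool = True


def shift_index(cells: List[List[Cell]], r: int, c: int,
                shift_text: Sequence[str]) -> Tuple[int, int]:
    """Move a text's (row, col) index to an edge of the merged region it sits in.

    Directions in shift_text ('l', 'r', 't', 'b') are applied in order.
    Simplified model for illustration only, not Camelot's implementation.
    """
    for d in shift_text:
        if d == "l":
            while not cells[r][c].left:
                c -= 1
        elif d == "r":
            while not cells[r][c].right:
                c += 1
        elif d == "t":
            while not cells[r][c].top:
                r -= 1
        elif d == "b":
            while not cells[r][c].bottom:
                r += 1
    return r, c
```

In this model, text found in the middle row of a three-row merged cell lands in row 0 with ['l', 't'] and in row 2 with ['b'], and repeating directions as in ['r', 't', 'r', 't'] changes nothing after the first pass. A mismatch between the detected merged regions and the drawn layout is one way text could end up a row or two away from where it belongs.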
PDF
See above.
Screenshots
See above.
Environment
Additional context