Open shakirshakeelzargar opened 3 years ago
For those looking for a solution, I have found a workaround that works very well. I have posted it here: https://stackoverflow.com/questions/64317363/camelot-switches-characters-around/64946264#64946264
@vinayak-mehta Any updates on this issue???
You may be able to solve this issue by reducing the value of pdfminer's LAParams char_margin parameter (default 2.0).
You can set the parameter yourself with
camelot.read_pdf(DOCUMENT, pages="all", layout_kwargs={"char_margin": 0.5})
for example. Other parameters may need to be changed as well, but char_margin is the first one that comes to mind.
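Since the right char_margin value probably depends on the document, it may help to try a few values and compare what comes out of one cell. Below is a minimal sketch of that idea, assuming Camelot is installed. The helper names (build_read_kwargs, read_tables, compare_margins) and the candidate values are made up for illustration and are not part of Camelot; only camelot.read_pdf and its layout_kwargs argument come from the library.

```python
def build_read_kwargs(char_margin=0.5, pages="all"):
    """Build keyword arguments for camelot.read_pdf (hypothetical helper).

    layout_kwargs is forwarded by Camelot to pdfminer's LAParams.
    """
    return {"pages": pages, "layout_kwargs": {"char_margin": char_margin}}


def read_tables(path, char_margin=0.5, pages="all"):
    """Read tables from a PDF with a custom char_margin (hypothetical helper)."""
    # Imported lazily so build_read_kwargs works without Camelot installed.
    import camelot

    return camelot.read_pdf(path, **build_read_kwargs(char_margin, pages))


def compare_margins(path, margins=(2.0, 1.0, 0.5)):
    """Print the first cell of the first table for each candidate char_margin."""
    for margin in margins:
        tables = read_tables(path, char_margin=margin)
        if len(tables) == 0:
            print(margin, "no tables found")
            continue
        # repr() makes the line breaks inside the cell visible as \n.
        print(margin, repr(tables[0].df.iloc[0, 0]))
```

Calling compare_margins with the path to your own PDF prints one line per value, so you can see whether a smaller margin keeps the leading character in place.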
I'm trying to parse tables in a PDF using Camelot. The cells have multiple lines of text in them, and some have an empty line separating portions of the text:
I would expect this to be parsed as First line\nSecond line\n\nThird line (notice the double line breaks), but I get this instead: T\nFirst line\nSecond line\nhird line. The first character after a double-line-break moves to the beginning of the text, and I only get a single line-break instead.
I also tried using tabula, but it messes up the entire table (the data frame, actually) when there is an empty row in the table, and for some words it puts a space between the characters.