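# Imports assume the older pdfminer release that this API belongs to.
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox, LTTextLine
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfparser import PDFDocument, PDFParser

def pdfread(fp):
    # Connect the parser and the document object to each other.
    parser = PDFParser(fp)
    doc = PDFDocument()
    parser.set_document(doc)
    doc.set_parser(parser)
    doc.initialize('')  # empty password
    rsrcmgr = PDFResourceManager()
    # Layout analysis parameters.
    laparams = LAParams()
    laparams.char_margin = 1.0
    laparams.word_margin = 1.0
    device = PDFPageAggregator(rsrcmgr, laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    extracted_text = ''
    for page in doc.get_pages():
        interpreter.process_page(page)
        layout = device.get_result()
        # Keep only the text boxes and text lines found on the page.
        for lt_obj in layout:
            if isinstance(lt_obj, (LTTextBox, LTTextLine)):
                extracted_text += lt_obj.get_text()
    fp.close()
    return extracted_text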
And it keeps showing this warning:
WARNING:pdfminer.layout: Too many boxes (102) to group, skipping.
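A side note on the message itself: the `WARNING:pdfminer.layout:` prefix suggests it is emitted through Python's standard `logging` module by a logger named `pdfminer.layout`. A minimal sketch, assuming that is the case, for hiding it in a notebook:

```python
import logging

# Logger name taken from the "WARNING:pdfminer.layout:" prefix of the message
# (an assumption based on the default logging output format).
layout_logger = logging.getLogger('pdfminer.layout')

# Let only ERROR and above through, so the WARNING is no longer printed.
layout_logger.setLevel(logging.ERROR)
```

This only silences the message. It does not change the fact that the grouping step is skipped for that page.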
This is the file: 10200112008r.pdf
PS. I'm new to Python.
I think it is a layout issue, so I want to turn automatic layout analysis off, which is done with '-n' on the command line. But I am not running it from the command line; I am running it in Jupyter. What should I do?
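One possible answer, sketched under assumptions: in pdfminer's bundled `pdf2txt.py` script, `-n` amounts to leaving the layout parameters unset, so the closest equivalent in a notebook is probably to pass `laparams=None` to the device. The names `pdfread_no_layout` and `text_without_layout` are hypothetical, and the calls mirror the older pdfminer API used in the code above; newer pdfminer.six releases build the document with `PDFDocument(parser)` and iterate with `PDFPage.create_pages(doc)` instead.

```python
def text_without_layout(layout):
    """Join the text of every item on a page that was not layout-analyzed."""
    # Without layout analysis the page holds individual characters (plus
    # non-text items such as lines or figures), not LTTextBox objects.
    return ''.join(obj.get_text() for obj in layout if hasattr(obj, 'get_text'))


def pdfread_no_layout(fp):
    """Variant of pdfread() that skips layout analysis (hypothetical name)."""
    # Imported here so the helper above can be tried without pdfminer installed.
    # Assumption: the same older pdfminer API as in the question.
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfparser import PDFDocument, PDFParser

    parser = PDFParser(fp)
    doc = PDFDocument()
    parser.set_document(doc)
    doc.set_parser(parser)
    doc.initialize('')
    rsrcmgr = PDFResourceManager()
    # laparams=None is what turns layout analysis off.
    device = PDFPageAggregator(rsrcmgr, laparams=None)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    pieces = []
    for page in doc.get_pages():
        interpreter.process_page(page)
        pieces.append(text_without_layout(device.get_result()))
    return ''.join(pieces)
```

In a notebook cell this could be called as `pdfread_no_layout(open('10200112008r.pdf', 'rb'))`. Because no word or line grouping takes place, the returned text may come out without the spaces and line breaks that layout analysis would normally insert.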
Hello, I use this code (the `pdfread(fp)` function above).
Best regards, Lyga