Execute the code: pdftotree.parse(r"\PATH\TO\AIMCO-2019.pdf", html_path=r"\PATH\TO\output.html", visualize=False)
Check the hOCR output.
Expected behavior
Each page of the output file should have its own text and tables.
Error Logs/Screenshots
Environment (please complete the following information):
OS: Windows 10, version 20H2
pdftotree Version: v0.5.0
pdfminer.six Version: 20211012
Additional context
If this behavior is expected, would it be possible to have a variable that keeps track of the text and tables already extracted? (I am not very experienced in programming.)
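A minimal sketch of that tracking idea, applied to the finished output file rather than inside pdftotree: it remembers the text of every page it has already seen and reports any later page with identical text. It assumes that each page in the output is wrapped in an element whose class includes `ocr_page` (the hOCR convention). The names `PageTextCollector` and `find_duplicate_pages` are hypothetical helpers written for this report, not part of pdftotree.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PageTextCollector(HTMLParser):
    """Collect the text of each element whose class includes `ocr_page` (assumed page marker)."""

    def __init__(self):
        super().__init__()
        self.pages = []   # one list of text fragments per page
        self._depth = 0   # nesting depth inside the current page; 0 means outside any page

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if "ocr_page" in classes:
                self.pages.append([])
                self._depth = 1
        else:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.pages[-1].append(data.strip())


def find_duplicate_pages(hocr_html):
    """Return (first_page, repeated_page) pairs, 1-based, for pages with identical text."""
    collector = PageTextCollector()
    collector.feed(hocr_html)
    collector.close()

    seen = {}          # page text -> number of the first page that had it
    duplicates = []
    for number, fragments in enumerate(collector.pages, start=1):
        text = " ".join(fragments)
        if not text:
            continue   # ignore empty pages so blank pages are not reported as duplicates
        if text in seen:
            duplicates.append((seen[text], number))
        else:
            seen[text] = number
    return duplicates
```

Reading the generated output.html into a string and passing it to `find_duplicate_pages` should list the repeated pages, if the page-marker assumption holds for this file.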
Describe the bug
The first and second pages of the output contain the same text. Pages 4 and 5 are identical to each other as well.