HazyResearch / pdftotree

:evergreen_tree: A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
MIT License

Missing function at pdf_utils.py (analyze_pages) #105

Closed busekuz closed 3 years ago

busekuz commented 3 years ago

Hello all!

I have been trying to run extract_tables on the given paleo dataset, but the command fails with an import error raised from TableExtractML.py:

    Traceback (most recent call last):
      File "bin/extract_tables", line 13, in <module>
        from pdftotree.ml.TableExtractML import TableExtractorML
      File "C:\Users\b\Anaconda3\envs\nlp\lib\site-packages\pdftotree\ml\TableExtractML.py", line 21, in <module>
        from pdftotree.utils.pdf.pdf_utils import analyze_pages, normalize_pdf
    ImportError: cannot import name 'analyze_pages' from 'pdftotree.utils.pdf.pdf_utils' (C:\Users\b\Anaconda3\envs\nlp\lib\site-packages\pdftotree\utils\pdf\pdf_utils.py)

I then checked and could not find an analyze_pages function anywhere in the package. Could it have been renamed, or is something else wrong?
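
For anyone who wants to repeat that check without reading the source, here is a minimal sketch of one way to see which names a module fails to define. The helper name `missing_names` is hypothetical and not part of pdftotree. The demonstration uses a standard-library module so it runs in any environment.

```python
import importlib


def missing_names(module_name, names):
    """Return the entries of `names` that `module_name` does not define."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # Demonstrated on a standard-library module so the sketch runs anywhere.
    print(missing_names("os.path", ["join", "analyze_pages"]))

    # For the import in this report, the equivalent call would be
    # (requires pdftotree to be installed):
    # missing_names("pdftotree.utils.pdf.pdf_utils",
    #               ["analyze_pages", "normalize_pdf"])
```

Any name the helper returns would trigger the same "cannot import name" error when imported with `from ... import ...`.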

Thanks!

HiromuHota commented 3 years ago

This issue was introduced by #79 and happens on v0.5.0.
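
To check whether an environment has the affected release installed, something like the following sketch should work on Python 3.8 or later. The helper name `installed_version` is hypothetical, and the sketch assumes the distribution is installed under the name pdftotree.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if pdftotree is installed, otherwise None.
    print(installed_version("pdftotree"))
```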