I am interested in your module for generating HTML documents from PDF documents, especially for table extraction with fonduer. Unfortunately, the table extraction/conversion from PDF to HTML did not give good results for my examples. Therefore, I tried your ML approach to table detection, using extract_tables to train an ML model for my purpose.
When I tried to run extract_tables from the CLI, I got the following error:
Traceback (most recent call last):
  File "/home/julian/anaconda3/envs/layoutP/bin/extract_tables", line 13, in <module>
    from pdftotree.ml.TableExtractML import TableExtractorML
  File "/home/julian/anaconda3/envs/layoutP/lib/python3.7/site-packages/pdftotree/ml/TableExtractML.py", line 21, in <module>
    from pdftotree.utils.pdf.pdf_utils import analyze_pages, normalize_pdf
ImportError: cannot import name 'analyze_pages' from 'pdftotree.utils.pdf.pdf_utils' (/home/julian/anaconda3/envs/layoutP/lib/python3.7/site-packages/pdftotree/utils/pdf/pdf_utils.py)
As the error says, the function 'analyze_pages' is missing from 'pdftotree.utils.pdf.pdf_utils' in the current repo. Is an update coming soon that fixes this issue?
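To confirm that the name is really absent from the installed package (and not caused by a broken environment), a small standard-library check can be run. This is only a diagnostic sketch; the helper `module_exposes` is hypothetical and not part of pdftotree. It imports the module named in the traceback and reports whether each name exists, returning False instead of raising if the module itself cannot be imported.

```python
import importlib
from types import ModuleType


def module_exposes(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and defines `attr`.

    Hypothetical helper: returns False instead of raising when the module
    itself cannot be imported (ModuleNotFoundError is a subclass of ImportError).
    """
    try:
        module: ModuleType = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Module and function names taken from the traceback.
    target = "pdftotree.utils.pdf.pdf_utils"
    for name in ("analyze_pages", "normalize_pdf"):
        print(f"{target}.{name}: {module_exposes(target, name)}")
```

If `analyze_pages` prints False while `normalize_pdf` prints True, the installed pdf_utils simply no longer defines the function that TableExtractML.py still imports.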
Thank you in advance!
Julian
Test scenario:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
Python: 3.7