Describe the bug
I'm getting the following stack trace error when running pdftotree on a PDF that contains scientific chemical information:
SEVERE: Cannot read JBIG2 image: jbig2-imageio is not installed
[DEBUG] pdftotree.TreeExtract - Tabula recognized 0 table(s).
Traceback (most recent call last):
  File "/opt/anaconda3/envs/noble_app_env/bin/pdftotree", line 94, in <module>
    args.visualize,
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/site-packages/pdftotree/core.py", line 66, in parse
    pdf_html = extractor.get_html_tree()
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/site-packages/pdftotree/TreeExtract.py", line 319, in get_html_tree
    page.appendChild(table_element)
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/xml/dom/minidom.py", line 114, in appendChild
    if node.nodeType == self.DOCUMENT_FRAGMENT_NODE:
AttributeError: 'NoneType' object has no attribute 'nodeType'
I've installed the latest Java version for Mac OS X. pdftotree seems to work just fine on simple PDFs. I also haven't been able to figure out how to install jbig2-imageio manually; I'm not familiar with how to add that JAR file to the pdftotree installation.
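For what it's worth, the last frame of the traceback can be reproduced without pdftotree at all: xml.dom.minidom raises this AttributeError whenever appendChild is handed None. Below is a minimal sketch, assuming table_element ended up as None at TreeExtract.py line 319 (I haven't confirmed why that happens inside pdftotree). The append_if_present helper is hypothetical, not part of pdftotree; it only illustrates the kind of guard that would avoid the crash.

```python
from xml.dom import minidom

# Build a tiny document with a "page" element, standing in for the
# page node that pdftotree builds in get_html_tree().
doc = minidom.Document()
page = doc.createElement("div")
doc.appendChild(page)


def append_if_present(parent, child):
    """Hypothetical guard: append child only if it is a real node.

    Returns True if the child was appended, False if it was None.
    """
    if child is None:
        return False
    parent.appendChild(child)
    return True


# Reproduce the failure: minidom reads node.nodeType before anything else,
# so passing None raises AttributeError.
error_message = None
try:
    page.appendChild(None)
except AttributeError as exc:
    error_message = str(exc)
    print("appendChild(None) failed:", error_message)

# Assumption: table_element was None for this PDF.
table_element = None
appended = append_if_present(page, table_element)
print("appended:", appended)
```

If that assumption holds, the AttributeError is a symptom of a table element not being created for this PDF. Whether that is related to the JBIG2 warning is not something I can tell from the log.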
To Reproduce
Steps to reproduce the behavior:
Install the Java JDK for Mac OS X
Install ImageMagick with brew
Attempt to run hOCR extraction with pdftotree on a file with chemical molecule images
Expected behavior
The proper hOCR output should be generated and the command should execute successfully.
Error Logs/Screenshots
Environment (please complete the following information):
OS: Mac OS X 10.15
pdftotree Version: [e.g. v0.5.0]
pdfminer.six Version: [e.g. 20201018]
Additional context