Closed jimhall closed 3 years ago
Ghostscript has been a pain; I'm actively looking into ways to remove it as a dependency. Thanks for reporting this issue in so much detail!
Would you like to start a PR to add the relevant note you mention? This file needs to be modified: https://github.com/camelot-dev/camelot/blob/master/docs/user/install-deps.rst
Sure - I'm a little weak on my PR-fu, so let me take a peek.
Thanks! The contributing docs should be able to help :)
Looking at the conda package for Ghostscript I determined it only delivered the userland binaries and not the fonts and libraries. I have opened an issue with conda packaging team and asked that the binaries and fonts be delivered.
@jimhall But wouldn't that mean that Camelot shouldn't work at all when installed from conda-forge (along with ghostscript on which it has a recipe dependency)? Could this error (while trying to use the pip-installed version of Camelot with ghostscript from conda) be the result of a linking issue?
Btw I didn't get any error while trying to reproduce this just now. These are the commands I ran:
conda create --name gs-env python=3.8
conda activate gs-env
conda install -c conda-forge ghostscript
which gs
: /home/vinayak/anaconda3/envs/gs-env/bin/gs
pip install camelot-py[cv]
I removed my system ghostscript (which removed a lot of other stuff) before running the above commands. I hope I didn't break my system :sweat_smile:
@jimhall But wouldn't that mean that Camelot shouldn't work at all when installed from conda-forge (along with ghostscript on which it has a recipe dependency)? Could this error (while trying to use the pip-installed version of Camelot with ghostscript from conda) be the result of a linking issue?
Yes I believe based on my experiments that a conda-forge installed Ghostscript will not work with Camelot.
The existing Camelot instructions state you can do a "conda install + homebrew Ghostscript" and Camelot will work. I agree with that combo - it works for me.
"conda install + conda forge gs" will fail.
Looking at the Anaconda website here, I click on the "files" tab and download the latest ghostscript archive here.
Unpack it and there is no libgs included.
Ha! Some questions:
what platform are you on? What exactly did you remove? libgs?
On Mac with Homebrew it is installed as /usr/local/lib/libgs.dylib
(It is actually a symlink to the actual lib).
According to Apple's Developer Docs here, one of the default install locations is /usr/local/lib, which is also the default library install location for Homebrew. That is why I believe the combination of Anaconda + Homebrew ghostscript works.
UNIX / Linux (which it looks like you are using) will be libgs.so. Things like LD_LIBRARY_PATH may point to alternate library directories that have a libgs squirrelled away somewhere.
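One way to hunt for stray copies is to scan the directories named in LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on macOS) plus a few common library directories. This is only a minimal sketch: the function name `find_libgs_candidates` and the default directory list are illustrative assumptions, not part of Camelot. The dynamic loader also consults other locations (for example the ldconfig cache on Linux), so an empty result is not conclusive.

```python
import os
from pathlib import Path


def find_libgs_candidates(extra_dirs=()):
    """Return sorted paths of files named libgs.* in likely library directories."""
    dirs = list(extra_dirs)
    # Directories the loader is told to search via environment variables.
    for var in ("LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH"):
        dirs += [d for d in os.environ.get(var, "").split(os.pathsep) if d]
    # A few common defaults (an assumption; adjust for your system).
    dirs += ["/usr/local/lib", "/usr/lib", "/usr/lib/x86_64-linux-gnu"]

    found = set()
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            # "libgs.*" matches libgs.so, libgs.so.9 and libgs.dylib,
            # but not unrelated libraries such as libgssapi_krb5.so.2.
            found.update(str(f) for f in p.glob("libgs.*"))
    return sorted(found)


if __name__ == "__main__":
    for path in find_libgs_candidates():
        print(path)
```

Passing the conda environment's lib directory as `extra_dirs` shows quickly whether that environment ships a libgs at all.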
When I run the conda-forge command (I followed your steps above; I had used Python 3.7 and you used 3.8, so I tried your version), I can confirm there is no libgs installed under $HOME/opt/anaconda3/envs.
Last little bit: looking at $HOME/opt/anaconda3/envs/camelot/lib/python3.7/site-packages/camelot/ext/ghostscript/_gsprint.py in the Traceback and using the REPL, I see the ghostscript code is doing the following:
>>> import sys
>>> from ctypes import *
>>> libgs = cdll.LoadLibrary("libgs.dylib")
>>> libgs
<CDLL 'libgs.dylib', handle 7fd2dc006dc0 at 0x7fd2dc2fd090>
I read the ctypes — A foreign function library for Python just now and I think I see the following test we could add to the doc and you could use to determine how your system is behaving:
>>> from ctypes.util import find_library
>>> find_library("gs")
'/usr/local/lib/libgs.dylib'
That may be why it is working for you - you may have multiple versions of ghostscript libgs.so hanging around. This code snip above should sort out what libgs you are using. Remove what you think is the correct copy and run again to find an alternate binary.
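The check above can be wrapped into a small script that also tries to load whatever `find_library` reports. This is a sketch only: `describe_libgs` is a hypothetical helper name, not something Camelot provides, and `find_library` may return a bare name (as on Linux) or a full path (as on macOS).

```python
import ctypes
from ctypes.util import find_library


def describe_libgs():
    """Return (name_or_path, loadable) for the Ghostscript library, if any."""
    name = find_library("gs")  # None when no libgs can be located
    if name is None:
        return None, False
    try:
        ctypes.CDLL(name)  # actually try to load what was found
    except OSError:
        return name, False
    return name, True


if __name__ == "__main__":
    name, ok = describe_libgs()
    print(f"find_library('gs') -> {name!r}, loadable: {ok}")
```

Running it inside and outside the conda environment makes it easier to see which copy of libgs, if any, each interpreter picks up.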
Shoot - I think I have to "pull request the pull request" if you agree! :laughing:
JIM
Sorry, I should've spent more time reproducing this! You're correct, installing ghostscript from conda-forge doesn't install libgs! I was working from an assumption without anything to back it up :( This test in the camelot conda-forge recipe, which just checks whether camelot is importable, is not enough. camelot used to make a subprocess call to ghostscript, and this test should've been updated when we started using libgs instead. I'll look into your PR today and merge it.
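A stronger smoke test could try to load the shared library itself rather than only importing the package. The following is a rough sketch of that idea, not the actual recipe test: the helper names are hypothetical, the candidate file names are guesses based on the names that appear in this thread, and Windows is not covered.

```python
import ctypes
import sys


def default_candidates():
    """Library file names to try; a guess based on names seen in this thread."""
    if sys.platform == "darwin":
        return ("libgs.dylib",)
    return ("libgs.so", "libgs.so.9")


def first_loadable(candidates, loader=ctypes.CDLL):
    """Return the first candidate name that the loader can open, else None."""
    for name in candidates:
        try:
            loader(name)
        except OSError:
            continue
        return name
    return None


if __name__ == "__main__":
    found = first_loadable(default_candidates())
    if found is None:
        sys.exit("libgs could not be loaded; a gs executable alone is not enough")
    print(f"loaded {found}")
```

The `loader` parameter exists only so the check can be exercised without a real libgs on the machine.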
Ha! Some questions:
what platform are you on? What exactly did you remove? libgs?
I'm on Ubuntu 20.04 and I just removed the ghostscript package using apt.
Thank you for looking into ctypes and suggesting to look for libgs using the code snippet! I was able to find libgs:
$ python
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes.util import find_library
>>> find_library("gs")
'libgs.so.9'
>>>
$ whereis libgs.so.9
libgs.so: /usr/lib/x86_64-linux-gnu/libgs.so.9
$ apt search libgs
libgs9/focal-updates,focal-security,now 9.50~dfsg-5ubuntu4.2 amd64 [installed]
interpreter for the PostScript language and for PDF - Library
I didn't remove it after I found it (to reproduce the bug) as that was going to take away evince and a lot of other useful packages. :sweat_smile:
You're correct! I checked the shared library dependencies for gs and there are a lot of them:
$ ldd /usr/bin/gs
linux-vdso.so.1 (0x00007ffc065d0000)
libgs.so.9 => /usr/lib/x86_64-linux-gnu/libgs.so.9 (0x00007efd6bada000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efd6b8e8000)
libtiff.so.5 => /usr/lib/x86_64-linux-gnu/libtiff.so.5 (0x00007efd6b867000)
libcups.so.2 => /usr/lib/x86_64-linux-gnu/libcups.so.2 (0x00007efd6b7cc000)
libijs-0.35.so => /usr/lib/x86_64-linux-gnu/libijs-0.35.so (0x00007efd6b7c4000)
libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007efd6b78c000)
libjbig2dec.so.0 => /usr/lib/x86_64-linux-gnu/libjbig2dec.so.0 (0x00007efd6b76d000)
libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007efd6b6e8000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007efd6b6cc000)
liblcms2.so.2 => /usr/lib/x86_64-linux-gnu/liblcms2.so.2 (0x00007efd6b671000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efd6b522000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007efd6b51c000)
libidn.so.11 => /lib/x86_64-linux-gnu/libidn.so.11 (0x00007efd6b4e5000)
libpaper.so.1 => /usr/lib/x86_64-linux-gnu/libpaper.so.1 (0x00007efd6b4df000)
libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007efd6b498000)
libfreetype.so.6 => /usr/lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007efd6b3d9000)
libopenjp2.so.7 => /usr/lib/x86_64-linux-gnu/libopenjp2.so.7 (0x00007efd6b383000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007efd6b360000)
/lib64/ld-linux-x86-64.so.2 (0x00007efd6ca7c000)
libwebp.so.6 => /usr/lib/x86_64-linux-gnu/libwebp.so.6 (0x00007efd6b0f5000)
libzstd.so.1 => /usr/lib/x86_64-linux-gnu/libzstd.so.1 (0x00007efd6b04c000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007efd6b023000)
libjbig.so.0 => /usr/lib/x86_64-linux-gnu/libjbig.so.0 (0x00007efd6ae15000)
libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007efd6adc8000)
libavahi-common.so.3 => /usr/lib/x86_64-linux-gnu/libavahi-common.so.3 (0x00007efd6adba000)
libavahi-client.so.3 => /usr/lib/x86_64-linux-gnu/libavahi-client.so.3 (0x00007efd6ada5000)
libgnutls.so.30 => /usr/lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007efd6abcf000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007efd6aba1000)
libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007efd6ab98000)
libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007efd6aabb000)
libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007efd6aa88000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007efd6aa81000)
libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007efd6aa72000)
libdbus-1.so.3 => /lib/x86_64-linux-gnu/libdbus-1.so.3 (0x00007efd6aa21000)
libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007efd6a8eb000)
libidn2.so.0 => /usr/lib/x86_64-linux-gnu/libidn2.so.0 (0x00007efd6a8ca000)
libunistring.so.2 => /usr/lib/x86_64-linux-gnu/libunistring.so.2 (0x00007efd6a746000)
libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007efd6a730000)
libnettle.so.7 => /usr/lib/x86_64-linux-gnu/libnettle.so.7 (0x00007efd6a6f6000)
libhogweed.so.5 => /usr/lib/x86_64-linux-gnu/libhogweed.so.5 (0x00007efd6a6be000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007efd6a63a000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007efd6a633000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007efd6a615000)
libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007efd6a568000)
libffi.so.7 => /usr/lib/x86_64-linux-gnu/libffi.so.7 (0x00007efd6a55c000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efd6a551000)
liblz4.so.1 => /usr/lib/x86_64-linux-gnu/liblz4.so.1 (0x00007efd6a530000)
libgcrypt.so.20 => /usr/lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007efd6a410000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007efd6a3ed000)
In contrast, gs from conda-forge has very few dependencies:
$ ldd /home/vinayak/anaconda3/envs/gs-3.8/bin/gs
linux-vdso.so.1 (0x00007ffc8dba3000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f33bd3e3000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f33bd3c0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f33bd1ce000)
/lib64/ld-linux-x86-64.so.2 (0x00007f33becb9000)
Their sizes differ sharply too. I'm guessing that gs from conda-forge is statically linked to libgs in one executable:
$ du -sh /usr/bin/gs
16K /usr/bin/gs
$ du -sh /home/vinayak/anaconda3/envs/gs-3.8/bin/gs
25M /home/vinayak/anaconda3/envs/gs-3.8/bin/gs
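The comparison of the two `ldd` listings can be automated with a small parser. This is a sketch under assumptions: `parse_ldd` and `links_to_libgs` are hypothetical helper names, and the sample strings are abbreviated from the output shown in this thread. A missing libgs entry is consistent with static linking but does not prove it on its own.

```python
def parse_ldd(output):
    """Map each shared-library name in `ldd` output to its resolved path (or None)."""
    deps = {}
    for line in output.splitlines():
        parts = line.split()
        if not parts:
            continue
        if "=>" in parts:
            # e.g. "libgs.so.9 => /usr/lib/.../libgs.so.9 (0x...)"
            idx = parts.index("=>")
            target = parts[idx + 1] if len(parts) > idx + 1 else None
            deps[parts[0]] = target if target and target.startswith("/") else None
        else:
            # e.g. "linux-vdso.so.1 (0x...)" or "/lib64/ld-linux-x86-64.so.2 (0x...)"
            deps[parts[0]] = None
    return deps


def links_to_libgs(output):
    """True if a dependency name starts with 'libgs.' (libgssapi does not count)."""
    return any(name.startswith("libgs.") for name in parse_ldd(output))


if __name__ == "__main__":
    system_gs = (
        "linux-vdso.so.1 (0x00007ffc065d0000)\n"
        "libgs.so.9 => /usr/lib/x86_64-linux-gnu/libgs.so.9 (0x00007efd6bada000)\n"
    )
    conda_gs = (
        "linux-vdso.so.1 (0x00007ffc8dba3000)\n"
        "libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f33bd3e3000)\n"
    )
    print("system gs links to libgs:", links_to_libgs(system_gs))
    print("conda gs links to libgs:", links_to_libgs(conda_gs))
```

To use it on a live system, the text passed in would come from running `ldd` on the gs binary of interest.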
To reproduce the bug in a clean environment, I created a docker container from the latest ubuntu image:
$ docker run -it ubuntu /bin/bash
$ apt update && apt install curl git
$ curl -O https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh
$ bash Anaconda3-2019.03-Linux-x86_64.sh
$ eval "$(/root/anaconda3/bin/conda shell.bash hook)"
(base) $ conda create --name gs-env python=3.8
(base) $ conda activate gs-env
(gs-env) $ conda install -c conda-forge camelot-py
(gs-env) ./camelot/tests/files $ python3
Python 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:43:00)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes.util import find_library
>>> find_library("gs")
>>>
(gs-env) ./camelot/tests/files $ which gs
/root/anaconda3/envs/gs-env/bin/gs
(gs-env) ./camelot/tests/files $ whereis libgs
libgs:
No libgs found! I tried to run camelot on the test pdf to reproduce your bug, but ran into a completely new one!
(gs-env) $ git clone https://github.com/camelot-dev/camelot
(gs-env) $ cd camelot/tests/files
(gs-env) ./camelot/tests/files $ python3
Python 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:43:00)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import camelot
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/__init__.py", line 6, in <module>
    from .io import read_pdf
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/io.py", line 5, in <module>
    from .handlers import PDFHandler
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/handlers.py", line 9, in <module>
    from .parsers import Stream, Lattice
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/parsers/__init__.py", line 4, in <module>
    from .lattice import Lattice
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/parsers/lattice.py", line 26, in <module>
    from ..image_processing import (
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/image_processing.py", line 3, in <module>
    import cv2
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
>>>
Turns out even opencv requires a shared library that is not present by default on this base ubuntu image:
(gs-env) ./camelot/tests/files $ apt install libgl1-mesa-glx
After installing libgl1-mesa-glx, I was able to reproduce your bug:
(gs-env) ./camelot/tests/files $ python3
Python 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:43:00)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
Traceback (most recent call last):
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/ext/ghostscript/_gsprint.py", line 260, in <module>
    libgs = cdll.LoadLibrary("libgs.so")
  File "/root/anaconda3/envs/gs-env/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/root/anaconda3/envs/gs-env/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libgs.so: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/io.py", line 113, in read_pdf
    tables = p.parse(
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/handlers.py", line 171, in parse
    t = parser.extract_tables(
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/parsers/lattice.py", line 402, in extract_tables
    self._generate_image()
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/parsers/lattice.py", line 211, in _generate_image
    from ..ext.ghostscript import Ghostscript
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/ext/ghostscript/__init__.py", line 24, in <module>
    from . import _gsprint as gs
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/ext/ghostscript/_gsprint.py", line 267, in <module>
    raise RuntimeError("Please make sure that Ghostscript is installed")
RuntimeError: Please make sure that Ghostscript is installed
>>>
And after installing libgs9:
(gs-env) ./camelot/tests/files $ apt install libgs9
It all worked fine:
(gs-env) ./camelot/tests/files $ python3
Python 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:43:00)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>>
Shoot - I think I have to "pull request the pull request" if you agree! :laughing:
I didn't quite get this, what do you mean?
@jimhall Thank you for reporting this! I'm also trying to replace ghostscript as the default image conversion backend, but till that happens, I'll include the useful info to avoid this bug in the docs by merging your PR with some minor tweaks (I'll do that today!).
Would you be interested in testing if pdftopng works nicely on macOS when you get time? It just requires a pip install and I'd be grateful if you can test that you're able to convert the pdf in that repo to a png, and report any suggestions/feedback.
I have Linux and macOS wheels for pdftopng, and I'm trying to build Windows ones. I would appreciate any suggestions that you might have for some questions I mention at the end of this blog post.
@jimhall Thank you for reporting this issue and the PR! Please reply to the questions above when you get time :)
Hi @vinayak-mehta
Thank you for looking into ctypes and suggesting to look for libgs using the code snippet! I was able to find libgs:
Sure - this seems to be the key.
You're correct! I checked the shared library dependencies for gs and there are a lot of them:
Their sizes are also in contrast, I'm guessing that gs from conda-forge is statically linked to libgs in one executable:
$ du -sh /usr/bin/gs
16K /usr/bin/gs
$ du -sh /home/vinayak/anaconda3/envs/gs-3.8/bin/gs
25M /home/vinayak/anaconda3/envs/gs-3.8/bin/gs
This is a good find/conclusion: I did not consider static binaries. Pretty clear choice on the part of Anaconda!
Shoot - I think I have to "pull request the pull request" if you agree! :laughing:
I didn't quite get this, what do you mean?
Ah - I was thinking my submission was too complicated after I sorted out the ctypes suggestion and was due for a re-write. But you took care of that with your modification of my PR. :grin:.
Would you be interested in testing if pdftopng works nicely on macOS when you get time? It just requires a pip install and I'd be grateful if you can test that you're able to convert the pdf in that repo to a png, and report any suggestions/feedback.
Sure I could test for Mac and Solaris.
I have Linux and macOS wheels for pdftopng, and I'm trying to build Windows ones. I would appreciate any suggestions that you might have for some questions I mention at the end of this blog post.
I should have some feedback by the end of this week!
Ah - I was thinking my submission was too complicated after I sorted out the ctypes suggestion and was due for a re-write. But you took care of that with your modification of my PR. :grin:
:+1:
Sure I could test for Mac and Solaris.
Thank you!
I should have some feedback by the end of this week!
Thank you! I was able to build Windows wheels, these are the two approaches I tried:
I'm thinking of going forward with the second one, but I'll first need to try installing that wheel on a fresh Windows instance and see if it works. Feedback welcome!
I had this problem today. I solved it by creating a conda env and installing camelot at the same time: conda create -n camelot -c conda-forge camelot-py
The Camelot documentation highlights a dependency on Ghostscript and adds a check that confirms that the Ghostscript binary is installed. The key dependency for Camelot to run successfully is on a working copy of the libgs library (libgs.dylib for MacOS).
Specific ask: Please add a note to the documentation stating that, in addition to running the gs binary to check its version, you need a full Ghostscript distribution that includes the libraries and fonts.
Details:
I performed the following steps:
Using the conda Ghostscript was a mistake, and the Camelot documentation suggests using the Homebrew toolchain. I thought I would be fine with the conda Ghostscript, but when I ran a test script I got the following error:
Looking at the conda package for Ghostscript I determined it only delivered the userland binaries and not the fonts and libraries. I have opened an issue with conda packaging team and asked that the binaries and fonts be delivered.
Workaround: Install Ghostscript using the Homebrew tool chain
Owners of Camelot may argue (rightfully) that this is a case of "pilot error / not following the docs". I would just suggest that adding a note might prevent what looks like a common pilot error (see here and here). The Homebrew dependency is also a pretty heavyweight lift (you need Xcode for Homebrew to work, so there is a lot to download and configure to get going with Camelot).
Thanks for a great tool!
Environment