IBMStreams / streamsx.topology

Develop streaming applications for IBM Streams in Python, Java & Scala.
http://ibmstreams.github.io/streamsx.topology
Apache License 2.0

[Python] Error linking bundles with dynamic .so libraries. #355

Closed wmarshall484 closed 4 years ago

wmarshall484 commented 8 years ago

I tried using numpy in a Python API application, and I received this runtime error:

Traceback (most recent call last):
  File "/tmp/34554158535292600/output/toolkits/tk4834225719300514922/opt/python/modules/my_module.py", line 6, in <module>
    import numpy as np
  File "/usr/lib64/python3.5/site-packages/numpy/__init__.py", line 180, in <module>
    from . import add_newdocs
  File "/usr/lib64/python3.5/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/usr/lib64/python3.5/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/usr/lib64/python3.5/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/usr/lib64/python3.5/site-packages/numpy/core/__init__.py", line 14, in <module>
    from . import multiarray
ImportError: /usr/lib64/python3.5/site-packages/numpy/core/multiarray.cpython-35m-x86_64-linux-gnu.so: undefined symbol: PyType_GenericNew
terminate called without an active exception

It seems that when Python is embedded in C++ and then compiled into a shared or static library (such as how sc creates PEs), it can't read symbols from other cpython dynamic libraries such as the multiarray.cpython-35m-x86_64-linux-gnu.so included in numpy.

According to this (https://groups.google.com/forum/#!topic/cython-users/hJr-kfKFVNc) and other threads, a solution is to pass -Xlinker --export-dynamic to g++ when the bundle is being compiled. I've tried passing these flags to sc via the -w and -x arguments, but to no avail.

Not being able to use numpy and cpython extensions is a big drawback for those who want to use python, so I feel that this is a high priority.
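Since the error depends on how the interpreter and its extension modules were built and linked, it can help to record the build details of the Python installation on each machine being compared. The following is a minimal sketch that uses only the standard sys and sysconfig modules; the helper name describe_python_build is hypothetical and is not part of streamsx.topology.

```python
import sys
import sysconfig


def describe_python_build():
    """Collect build details that matter when CPython is embedded in another program.

    Hypothetical helper for diagnosis only; not part of streamsx.topology.
    """
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # 1 if the interpreter was configured with --enable-shared, 0 if not,
        # None when the platform does not record the setting.
        "enable_shared": sysconfig.get_config_var("Py_ENABLE_SHARED"),
        # File name of the libpython library recorded at build time.
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
        # Directory the library was installed into.
        "libdir": sysconfig.get_config_var("LIBDIR"),
    }


for key, value in sorted(describe_python_build().items()):
    print("%s: %s" % (key, value))
```

Running this with the same python3 on a failing machine and on a working one gives a quick way to spot differences such as a static versus shared build or an unexpected library directory.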

ddebrunner commented 8 years ago

Do you have a simple python sample that demonstrates this?

wmarshall484 commented 8 years ago

If the following code is in my_module.py

import time
import sys
import numpy as np

class linspace_yielder(object):
    def __init__(self, start=0, stop=1.0, count=10):
        self.nums = np.linspace(start,stop,count)

    def __call__(self):
        for elem in self.nums:
            yield elem 

class Printer(object):
    def __call__(self, arg):
        print(str(arg), flush=True)

The following python script:

from streamsx.topology.topology import Topology
from my_module import linspace_yielder,Printer
from streamsx.topology import context

top = Topology("myTop")
lsy = linspace_yielder(0, 5, 100)
s = top.source(lsy)
p = Printer()
s.sink(p)
context.submit("STANDALONE", top.graph)

will throw the following error (stacktrace omitted):

ImportError: /usr/lib64/python3.5/site-packages/numpy/core/multiarray.cpython-35m-x86_64-linux-gnu.so: undefined symbol: PyType_GenericNew

henrychi2 commented 8 years ago

I was able to run a standalone sample that imported numpy in my environment. Just a guess, but maybe there's an issue with the numpy installation? I installed using: sudo /usr/local/bin/pip3 install numpy

For reference, here are the steps I used to install the prerequisites and run the sample using the Streams QuickStart VM:

1) Install CPython 3.5.1

wget https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tar.xz
tar xf Python-3.5.1.tar.xz
cd Python-3.5.1
./configure --enable-shared
make
sudo make install
sudo ln -s /usr/local/lib/libpython3.5m.so.1.0 /usr/lib64/libpython3.5m.so.1.0

2) Install numpy

sudo /usr/local/bin/pip3 install numpy

3) Run numpy sample

a) numpy_test.py

from streamsx.topology.topology import Topology
from numpy_test_functions import linspace_yielder, Printer
from streamsx.topology import context

def main():
    top = Topology("myTop")
    #c = Counter()
    #s = top.source(c)
    s = top.source(linspace_yielder())
    p = Printer()
    s.sink(p)
    context.submit("STANDALONE", top.graph)

if __name__ == '__main__':
    main()

b) numpy_test_functions.py

import time
import sys
import numpy as np

class linspace_yielder(object):
    def __init__(self, start=0, stop=1.0, count=10):
        self.nums = [1,2,3,4]#np.linspace(start,stop,count)

    def __call__(self):
        for elem in self.nums:
            yield elem 

class Printer(object):
    def __call__(self, arg):
        print(str(arg), flush=True)

c) Run sample

> export PYTHONPATH=/home/streamsadmin/git/streamsx.topology/com.ibm.streamsx.topology/opt/python/packages
> python3 numpy_test.py
...
Output:
1
2
3
4

ddebrunner commented 8 years ago

FYI - Stream has a print() method that does the flush().

wmarshall484 commented 8 years ago

@henrychi2, were you using the QuickStart VM for your environment?

ddebrunner commented 8 years ago

@henrychi2 Was pip3 installed when you installed CPython?

henrychi2 commented 8 years ago

@wmarshall484 Yes, I used the VM to run the numpy sample.

@ddebrunner Yes, I believe pip3 is installed with CPython.

henrychi2 commented 8 years ago

A side note - the numpy sample fails in DISTRIBUTED mode. numpy requires native libraries that live outside of site-packages, so they are not copied into the bundle. In general, I would say that for 3rd party libraries that have native components, they have to be explicitly installed on every Streams resource. For pure-Python 3rd party libraries, they should be copied into the .sab files.
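One way to tell the two kinds of package apart is to look for compiled extension modules inside the package directory. The sketch below is an illustration under that assumption; the function names find_native_extensions and is_pure_python are hypothetical. It only detects extension modules shipped inside the package, not system libraries (such as liblapack) that those modules load at runtime.

```python
import importlib.machinery
import os


def find_native_extensions(package_dir):
    """Return sorted paths of compiled extension modules under package_dir.

    Hypothetical helper; it matches file names against the extension-module
    suffixes of the interpreter running this script.
    """
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(suffixes):
                found.append(os.path.join(root, name))
    return sorted(found)


def is_pure_python(package_dir):
    """True when no compiled extension modules exist under package_dir."""
    return not find_native_extensions(package_dir)
```

For example, calling is_pure_python on the directory of an installed numpy package would be expected to return False, which suggests the package needs to be installed on every resource rather than copied into the bundle.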

vi pec.pe.10.stdouterr
  File "/home/henrychi/.streams/var/Streams.sab_ALZ-hd-StreamsInstance/ac8388ab-13a3-44dd-a638-53ed42e583f3/4db67f340fb19e95cb4c8e9db25f095c079d846dfb2d9b6e5010c8e3/output/toolkits/tk6159299121638534261/opt/python/packages/numpy/lib/polynomial.py", line 20, in <module>
    from numpy.linalg import eigvals, lstsq, inv
  File "/home/henrychi/.streams/var/Streams.sab_ALZ-hd-StreamsInstance/ac8388ab-13a3-44dd-a638-53ed42e583f3/4db67f340fb19e95cb4c8e9db25f095c079d846dfb2d9b6e5010c8e3/output/toolkits/tk6159299121638534261/opt/python/packages/numpy/linalg/__init__.py", line 51, in <module>
    from .linalg import *
  File "/home/henrychi/.streams/var/Streams.sab_ALZ-hd-StreamsInstance/ac8388ab-13a3-44dd-a638-53ed42e583f3/4db67f340fb19e95cb4c8e9db25f095c079d846dfb2d9b6e5010c8e3/output/toolkits/tk6159299121638534261/opt/python/packages/numpy/linalg/linalg.py", line 29, in <module>
    from numpy.linalg import lapack_lite, _umath_linalg
ImportError: liblapack.so.3: cannot open shared object file: No such file or directory
terminate called without an active exception

ddebrunner commented 8 years ago

Hmmm, I did the same CPython install and pip3 is not there ...

wmarshall484 commented 8 years ago

I built python3.5 from source and had to install pip3 manually. I'm not sure where you got it from, as I don't think CPython installs it.

wmarshall484 commented 8 years ago

@ddebrunner, are you seeing the error, or can you successfully run the sample like @henrychi2?

ddebrunner commented 8 years ago

@wmarshall484 I was trying on a clean vm, but don't have pip3. Will try on an existing setup

ddebrunner commented 8 years ago

It worked for me on my Streams 4.0.1 VM

wmarshall484 commented 8 years ago

What were the steps you took to install pip3? I downloaded an ez_install.py script which performed the installation.

ddebrunner commented 8 years ago

pip3 should be installed when you build & install CPython3 but it requires some extra packages

ddebrunner commented 8 years ago

I updated the install page to add the info to ensure pip3 gets installed.

https://github.com/IBMStreams/streamsx.topology/wiki/Installing-Python-and-com.ibm.streamsx.topology-Python-alpha

ddebrunner commented 8 years ago

And it works for me in standalone (with new Streams 4.1 VM which has only had Python 3.5 installed)

ddebrunner commented 8 years ago

@henrychi2 Can you enter a new issue for the shared library and distributed issue? We need to see if it can be made to work for Bluemix.

ddebrunner commented 8 years ago

I saw a report where someone else may have had a similar issue (to be clear, the one @wmarshall484 originally raised), that also went away when they started with a clean vm. So it seems there could be a problem caused by some other software that gets installed?

henrychi2 commented 8 years ago

Update on the distributed issue: The bundle was created on the QuickStart VM which has the LAPACK and BLAS libraries pre-installed. When numpy is first installed, it will use the libraries for building if they exist.

1) Submitting the bundle on a clean VM without numpy but with LAPACK and BLAS

If I copy the bundle to a clean VM (without numpy installed), the application runs successfully when submitted using streamtool submitjob.

2) Submitting the bundle on a Linux machine without numpy, LAPACK, or BLAS

The application fails to run using streamtool submitjob since the LAPACK dependency is not there.
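The difference between the two cases can be checked on a resource before submitting a job. This is a rough sketch using the standard ctypes.util.find_library lookup; the helper name check_native_libraries is hypothetical, and find_library is only a heuristic, so a result of None is a strong hint rather than proof that the library cannot be loaded.

```python
import ctypes.util


def check_native_libraries(names=("lapack", "blas")):
    """Map each library name to the file name the system lookup finds, or None.

    Hypothetical helper; the default names correspond to the LAPACK and BLAS
    libraries discussed above.
    """
    return {name: ctypes.util.find_library(name) for name in names}


for name, found in sorted(check_native_libraries().items()):
    print("%s: %s" % (name, found or "NOT FOUND"))
```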

wmarshall484 commented 8 years ago

Update on this issue: Performing a fresh install eliminated the import issues with numpy. I could import a module which used numpy into the generated toolkit, and the bundle would be created and executed appropriately. I'm not sure why this worked, as numpy still depends on the multiarray.cpython-35m-x86_64-linux-gnu.so shared library.

Unfortunately, I saw a very similar error crop up when using a third party package called pybrain which depends on scipy. When trying to import components from scipy, I received the following error:

Traceback (most recent call last):
  File "/home/streamsadmin/scratch/my_module.py", line 2, in <module>
    import pybrain
  File "/usr/lib/python3.5/site-packages/pybrain/__init__.py", line 1, in <module>
    from pybrain.structure.__init__ import *
  File "/usr/lib/python3.5/site-packages/pybrain/structure/__init__.py", line 2, in <module>
    from pybrain.structure.modules.__init__ import *
  File "/usr/lib/python3.5/site-packages/pybrain/structure/modules/__init__.py", line 2, in <module>
    from pybrain.structure.modules.gate import GateLayer, DoubleGateLayer, MultiplicationLayer, SwitchLayer
  File "/usr/lib/python3.5/site-packages/pybrain/structure/modules/gate.py", line 10, in <module>
    from pybrain.tools.functions import sigmoid, sigmoidPrime
  File "/usr/lib/python3.5/site-packages/pybrain/tools/functions.py", line 4, in <module>
    from scipy.linalg import inv, det, svd, logm, expm2
  File "/usr/lib/python3.5/site-packages/scipy/linalg/__init__.py", line 174, in <module>
    from .misc import *
  File "/usr/lib/python3.5/site-packages/scipy/linalg/misc.py", line 5, in <module>
    from .blas import get_blas_funcs
  File "/usr/lib/python3.5/site-packages/scipy/linalg/blas.py", line 155, in <module>
    from scipy.linalg import _fblas
ImportError: /usr/lib/python3.5/site-packages/scipy/linalg/_fblas.cpython-35m-x86_64-linux-gnu.so: undefined symbol: PyExc_ImportError

Similar to the stack trace I posted in the original comment on this issue, it seemed that third party shared libraries such as _fblas.cpython-35m-x86_64-linux-gnu.so were not being linked with libpython3.5m.so, hence the undefined symbol: PyExc_ImportError. According to this post on the python mailing list, this is a known bug with the python extension API. A workaround is to manually open the shared library by calling

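// RTLD_GLOBAL makes libpython's symbols visible to extension modules loaded later
dlopen("libpython3.5m.so", RTLD_LAZY | RTLD_GLOBAL);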

right before Py_Initialize() is called. This change is reflected here -- it has solved any 'undefined symbol' errors I've encountered. The way our operator models are set up, dlopen will search /usr/local/lib for the python3.5m shared library.

If you want to recreate the error yourself, the following should be sufficient:

my_module.py

from scipy.linalg import _fblas
import numpy as np
import pybrain

class Counter(object):
    def __init__(self, num):
        self._range = range(num)

    def __call__(self):
        for num in self._range:
            yield num

main.py

from streamsx.topology.topology import Topology
from streamsx.topology import context
from my_module import Counter

top = Topology("myTop")
c = Counter(10)
s = top.source(c)
s.print()
context.submit("STANDALONE", top.graph)

This is somewhat of a hacky fix since libpython3.5m.so is explicitly passed to dlopen. Right now we can get away with this because we also explicitly state in the operator model that we require python3.5, and that the .so files should be in /usr/local/lib. If the user is using a later version of python, however, then this call will fail. Ideally, there needs to be a better way to detect which version of python is being used, and then dlopen the correct shared library appropriately.
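As an illustration of the version-detection idea, the library name can be derived from the running interpreter instead of being hard-coded. The sketch below is written in Python for brevity, while the actual fix would live in the C++ operator code; the helper names libpython_name and load_libpython_globally are hypothetical, and the naming pattern assumes a Linux-style shared build of CPython.

```python
import ctypes
import sys


def libpython_name(version=None, abiflags=None):
    """Build a libpython file name such as 'libpython3.5m.so'.

    Hypothetical helper; assumes the Linux naming pattern
    libpython<major>.<minor><abiflags>.so.
    """
    if version is None:
        version = sys.version_info[:2]
    if abiflags is None:
        # sys.abiflags exists on POSIX builds only ('m' for a default 3.5 build).
        abiflags = getattr(sys, "abiflags", "")
    return "libpython%d.%d%s.so" % (version[0], version[1], abiflags)


def load_libpython_globally(name=None):
    """Load libpython with RTLD_GLOBAL, mirroring the dlopen workaround.

    Raises OSError if the loader cannot find the library on its search path.
    """
    if name is None:
        name = libpython_name()
    return ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)


print(libpython_name())
```

A build step could run such a script with the target python3 and pass the printed name to the operator, so that the dlopen call no longer depends on a fixed 3.5 version.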

While this issue does deal with shared library support for python, I believe it is unrelated to issue #363.