Do you have a simple Python sample that demonstrates this?
If the following code is in my_module.py
import time
import sys
import numpy as np

class linspace_yielder(object):
    def __init__(self, start=0, stop=1.0, count=10):
        self.nums = np.linspace(start, stop, count)

    def __call__(self):
        for elem in self.nums:
            yield elem

class Printer(object):
    def __call__(self, arg):
        print(str(arg), flush=True)
The following Python script:
from streamsx.topology.topology import Topology
from my_module import linspace_yielder, Printer
from streamsx.topology import context
top = Topology("myTop")
lsy = linspace_yielder(0, 5, 100)
s = top.source(lsy)
p = Printer()
s.sink(p)
context.submit("STANDALONE", top.graph)
will throw the following error (stacktrace omitted):
ImportError: /usr/lib64/python3.5/site-packages/numpy/core/multiarray.cpython-35m-x86_64-linux-gnu.so: undefined symbol: PyType_GenericNew
I was able to run a standalone sample that imported numpy in my environment. Just a guess, but maybe there's an issue with the numpy installation? I installed using:
sudo /usr/local/bin/pip3 install numpy
For reference, here are the steps I used to install the prerequisites and run the sample using the Streams QuickStart VM:
1) Install CPython 3.5.1
wget https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tar.xz
tar xf Python-3.5.1.tar.xz
cd Python-3.5.1
./configure --enable-shared
make
sudo make install
sudo ln -s /usr/local/lib/libpython3.5m.so.1.0 /usr/lib64/libpython3.5m.so.1.0
2) Install numpy
sudo /usr/local/bin/pip3 install numpy
3) Run numpy sample
a) numpy_test.py
from streamsx.topology.topology import Topology
from numpy_test_functions import linspace_yielder, Printer
from streamsx.topology import context

def main():
    top = Topology("myTop")
    #c = Counter()
    #s = top.source(c)
    s = top.source(linspace_yielder())
    p = Printer()
    s.sink(p)
    context.submit("STANDALONE", top.graph)

if __name__ == '__main__':
    main()
b) numpy_test_functions.py
import time
import sys
import numpy as np

class linspace_yielder(object):
    def __init__(self, start=0, stop=1.0, count=10):
        self.nums = [1, 2, 3, 4]  # np.linspace(start, stop, count)

    def __call__(self):
        for elem in self.nums:
            yield elem

class Printer(object):
    def __call__(self, arg):
        print(str(arg), flush=True)
c) Run sample
> export PYTHONPATH=/home/streamsadmin/git/streamsx.topology/com.ibm.streamsx.topology/opt/python/packages
> python3 numpy_test.py
...
Output:
1
2
3
4
FYI - Stream has a print() method that does the flush().
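For example, the Printer class and the sink() call in the sample above could be replaced with a single call (a minimal sketch based on the same topology):

from streamsx.topology.topology import Topology
from streamsx.topology import context
from numpy_test_functions import linspace_yielder

top = Topology("myTop")
s = top.source(linspace_yielder())
# Stream.print() prints each tuple and flushes stdout, so no Printer class is needed.
s.print()
context.submit("STANDALONE", top.graph)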
@henrychi2, were you using the QuickStart VM for your environment?
@henrychi2 Was pip3 installed when you installed CPython?
@wmarshall484 Yes, I used the VM to run the numpy sample.
@ddebrunner Yes, I believe pip3 is installed with CPython.
A side note: the numpy sample fails in DISTRIBUTED mode. numpy requires native libraries that live outside of site-packages, so they are not copied into the bundle. In general, third-party libraries with native components have to be explicitly installed on every Streams resource, whereas pure-Python third-party libraries should be copied into the .sab files.
vi pec.pe.10.stdouterr
File "/home/henrychi/.streams/var/Streams.sab_ALZ-hd-StreamsInstance/ac8388ab-13a3-44dd-a638-53ed42e583f3/4db67f340fb19e95cb4c8e9db25f095c079d846dfb2d9b6e5010c8e3/output/toolkits/tk6159299121638534261/opt/python/packages/numpy/lib/polynomial.py", line 20, in <module>
from numpy.linalg import eigvals, lstsq, inv
File "/home/henrychi/.streams/var/Streams.sab_ALZ-hd-StreamsInstance/ac8388ab-13a3-44dd-a638-53ed42e583f3/4db67f340fb19e95cb4c8e9db25f095c079d846dfb2d9b6e5010c8e3/output/toolkits/tk6159299121638534261/opt/python/packages/numpy/linalg/__init__.py", line 51, in <module>
from .linalg import *
File "/home/henrychi/.streams/var/Streams.sab_ALZ-hd-StreamsInstance/ac8388ab-13a3-44dd-a638-53ed42e583f3/4db67f340fb19e95cb4c8e9db25f095c079d846dfb2d9b6e5010c8e3/output/toolkits/tk6159299121638534261/opt/python/packages/numpy/linalg/linalg.py", line 29, in <module>
from numpy.linalg import lapack_lite, _umath_linalg
ImportError: liblapack.so.3: cannot open shared object file: No such file or directory
terminate called without an active exception
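For reference, this failure came from submitting the same topology with the DISTRIBUTED context instead of STANDALONE (a sketch; the class names match the numpy sample above):

from streamsx.topology.topology import Topology
from streamsx.topology import context
from numpy_test_functions import linspace_yielder, Printer

top = Topology("myTop")
s = top.source(linspace_yielder())
s.sink(Printer())
# Only the context type changes; in DISTRIBUTED mode the PEs run on Streams
# resources, where numpy's native dependencies (e.g. liblapack) must already
# be installed.
context.submit("DISTRIBUTED", top.graph)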
Hmmm, I did the same CPython install and pip3 is not there ...
I built Python 3.5 from source and I had to manually install pip3. Not sure where you got it from; I don't think CPython installs it.
@ddebrunner, are you seeing the error, or can you successfully run the sample like @henrychi2?
@wmarshall484 I was trying on a clean vm, but don't have pip3. Will try on an existing setup
It worked for me on my Streams 4.0.1 VM
What were the steps you took to install pip3? I downloaded an ez_install.py script which performed the installation.
pip3 should be installed when you build and install CPython 3, but it requires some extra packages to be present.
I updated the install page to add the info to ensure pip3 gets installed.
And it works for me in standalone (with a new Streams 4.1 VM that has only had Python 3.5 installed).
@henrychi2 Can you open a new issue for the shared library / distributed problem? We need to see if it can be made to work for Bluemix.
I saw a report where someone else may have had a similar issue (to be clear, similar to the one @wmarshall484 originally raised) that also went away when they started with a clean VM. So it seems there could be a problem caused by some other software that gets installed?
Update on the distributed issue: the bundle was created on the QuickStart VM, which has the LAPACK and BLAS libraries pre-installed. When numpy is installed, it builds against those libraries if they exist.
1) Submitting the bundle on a clean VM without numpy but with LAPACK and BLAS: if I copy the bundle to a clean VM (without numpy installed), the application runs successfully when submitted using streamtool submitjob.
2) Submitting the bundle on a Linux machine without numpy, LAPACK, or BLAS: the application fails to run under streamtool submitjob since the LAPACK dependency is not there.
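A quick way to check whether the LAPACK runtime that numpy needs is visible to the dynamic linker on a given host (a sketch using only the Python standard library, not part of the toolkit):

import ctypes.util

# Pre-flight check sketch: find_library() asks the dynamic linker whether a
# liblapack is installed; numpy.linalg will fail to import on hosts where it is not.
if ctypes.util.find_library("lapack") is None:
    print("liblapack not found on this host; numpy.linalg imports will fail")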
Update on this issue:
Performing a fresh install eliminated the import issues with numpy. I could import a module which used numpy into the generated toolkit, and the bundle would be created and executed appropriately. I'm not sure why this worked, as numpy still depends on the multiarray.cpython-35m-x86_64-linux-gnu.so shared library.
Unfortunately, I saw a very similar error crop up when using a third-party package called pybrain, which depends on scipy. When trying to import components from scipy, I received the following error:
Traceback (most recent call last):
File "/home/streamsadmin/scratch/my_module.py", line 2, in <module>
import pybrain
File "/usr/lib/python3.5/site-packages/pybrain/__init__.py", line 1, in <module>
from pybrain.structure.__init__ import *
File "/usr/lib/python3.5/site-packages/pybrain/structure/__init__.py", line 2, in <module>
from pybrain.structure.modules.__init__ import *
File "/usr/lib/python3.5/site-packages/pybrain/structure/modules/__init__.py", line 2, in <module>
from pybrain.structure.modules.gate import GateLayer, DoubleGateLayer, MultiplicationLayer, SwitchLayer
File "/usr/lib/python3.5/site-packages/pybrain/structure/modules/gate.py", line 10, in <module>
from pybrain.tools.functions import sigmoid, sigmoidPrime
File "/usr/lib/python3.5/site-packages/pybrain/tools/functions.py", line 4, in <module>
from scipy.linalg import inv, det, svd, logm, expm2
File "/usr/lib/python3.5/site-packages/scipy/linalg/__init__.py", line 174, in <module>
from .misc import *
File "/usr/lib/python3.5/site-packages/scipy/linalg/misc.py", line 5, in <module>
from .blas import get_blas_funcs
File "/usr/lib/python3.5/site-packages/scipy/linalg/blas.py", line 155, in <module>
from scipy.linalg import _fblas
ImportError: /usr/lib/python3.5/site-packages/scipy/linalg/_fblas.cpython-35m-x86_64-linux-gnu.so: undefined symbol: PyExc_ImportError
Similar to the stack trace I posted in the original comment on this issue, it seemed that third-party shared libraries such as _fblas.cpython-35m-x86_64-linux-gnu.so were not being linked with libpython3.5m.so, hence the undefined symbol: PyExc_ImportError. According to this post on the Python mailing list, this is a known bug with the Python extension API. A workaround is to manually open the shared library by calling dlopen("libpython3.5m.so"); right before Py_Initialize() is called. This change is reflected here -- it has solved any 'undefined symbol' errors I've encountered. The way our operator models are set up, dlopen will search /usr/local/lib for the python3.5m shared library.
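In Python terms, the workaround amounts to loading libpython with RTLD_GLOBAL so its symbols become visible to extension modules that were not linked against it (a sketch for illustration only; the actual fix is the dlopen() call made from the generated C++ code before Py_Initialize()):

import ctypes

# Illustration of the workaround: load the interpreter's shared library with
# RTLD_GLOBAL so that symbols such as PyExc_ImportError are visible to extension
# modules (e.g. scipy's _fblas) built without an explicit link to libpython.
ctypes.CDLL("libpython3.5m.so", mode=ctypes.RTLD_GLOBAL)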
If you want to recreate the error yourself, the following should be sufficient:
my_module.py

from scipy.linalg import _fblas
import numpy as np
import pybrain

class Counter(object):
    def __init__(self, num):
        self._range = range(num)

    def __call__(self):
        for num in self._range:
            yield num
main.py
from streamsx.topology.topology import Topology
from streamsx.topology import context
from my_module import Counter
top = Topology("myTop")
c = Counter(10)
s = top.source(c)
s.print()
context.submit("STANDALONE", top.graph)
This is somewhat of a hacky fix, since libpython3.5m.so is explicitly passed to dlopen. Right now we can get away with this because we also explicitly state in the operator model that we require Python 3.5 and that the .so files should be in /usr/local/lib. If the user is using a later version of Python, however, this call will fail. Ideally, there needs to be a better way to detect which version of Python is being used and then dlopen the correct shared library.
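One possible direction for removing the hard-coded library name (a sketch only, not something the toolkit currently does) is to ask the interpreter itself for its shared-library name and location via sysconfig and feed that into the dlopen call:

import os
import sysconfig

# Sketch: derive the shared-library name and directory for the running interpreter
# instead of hard-coding "libpython3.5m.so" and /usr/local/lib. These config
# variables are populated when CPython is built with --enable-shared.
libdir = sysconfig.get_config_var("LIBDIR")        # e.g. /usr/local/lib
ldlibrary = sysconfig.get_config_var("LDLIBRARY")  # e.g. libpython3.5m.so
print(os.path.join(libdir, ldlibrary))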
While this issue does deal with shared library support for Python, I believe it is unrelated to issue #363.
I tried using numpy in a Python API application, and I received a runtime error. It seems that when Python is embedded in C++ and then compiled into a shared or static library (such as how sc creates PEs), it can't read symbols from other CPython dynamic libraries such as the multiarray.cpython-35m-x86_64-linux-gnu.so included in numpy. According to this thread (https://groups.google.com/forum/#!topic/cython-users/hJr-kfKFVNc) and others, a solution is to pass -Xlinker --export-dynamic to g++ when the bundle is being compiled. I've tried adding them to sc via the -w and -x arguments, but to no avail. Not being able to use numpy and CPython extensions is a big drawback for those who want to use Python, so I feel this is a high priority.