Open mazhechao opened 10 years ago
I cannot reproduce, sorry. Can you paste the whole traceback?
If the one in the Storm UI is incomplete (Storm UI tends to truncate long tracebacks like the one you posted), you can ssh into one of your Storm cluster nodes running the topology and inspect the worker-*.log files in the logs/ folder.
That's the whole traceback. But why does this error occur?
ImportError: No module named pyleus.storm
2014-10-18 19:36:18 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: Error when launching multilang subprocess
Traceback (most recent call last):
File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
exec code in run_globals
File "/home/storm-0.9.2/workdir/supervisor/stormdist/exclamation_topology-21-1413631895/resources/exclamation_topology/test_word_spout.py", line 8, in <module>
from pyleus.storm import Spout
ImportError: No module named pyleus.storm
at backtype.storm.utils.ShellProcess.launch(ShellProcess.java:64) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.spout.ShellSpout.open(ShellSpout.java:54) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.daemon.executor$fn__5573$fn__5588.invoke(executor.clj:520) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.util$async_loop$fn__457.invoke(util.clj:429) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:679) [na:1.6.0_24]
Caused by: java.io.EOFException: null
at org.msgpack.io.StreamInput.readByte(StreamInput.java:60) ~[stormjar.jar:na]
at org.msgpack.unpacker.MessagePackUnpacker.getHeadByte(MessagePackUnpacker.java:66) ~[stormjar.jar:na]
at org.msgpack.unpacker.MessagePackUnpacker.trySkipNil(MessagePackUnpacker.java:396) ~[stormjar.jar:na]
at org.msgpack.template.MapTemplate.read(MapTemplate.java:59) ~[stormjar.jar:na]
at org.msgpack.template.MapTemplate.read(MapTemplate.java:27) ~[stormjar.jar:na]
at org.msgpack.template.AbstractTemplate.read(AbstractTemplate.java:31) ~[stormjar.jar:na]
at org.msgpack.MessagePack.read(MessagePack.java:527) ~[stormjar.jar:na]
at org.msgpack.MessagePack.read(MessagePack.java:496) ~[stormjar.jar:na]
at com.yelp.pyleus.serializer.MessagePackSerializer.readMessage(MessagePackSerializer.java:198) ~[stormjar.jar:na]
at com.yelp.pyleus.serializer.MessagePackSerializer.connect(MessagePackSerializer.java:67) ~[stormjar.jar:na]
at backtype.storm.utils.ShellProcess.launch(ShellProcess.java:62) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
... 5 common frames omitted
2014-10-18 19:36:18 b.s.util [INFO] Halting process: ("Worker died")
2014-10-18 19:36:18 b.s.util [INFO] Halting process: ("Worker died")
The README says:
You NOT need to install pyleus on your Storm cluster
Really? @poros
That's true. The page explaining this is still missing from the documentation (working on it). During the build phase, Pyleus installs itself under the hood in a virtualenv (together with all the other dependencies you specify), and that virtualenv is included in the jar. Once the topology is submitted to the cluster, Pyleus components are run using that virtualenv's Python interpreter, so they can use whatever has been installed in it.
I'll investigate the issue. Would you mind running vim exclamation_topology.jar and telling me whether the pyleus package has been installed? You can look for files like pyleus/storm/spout.py.
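The same check can be scripted instead of browsing the jar in vim. This is a minimal sketch that relies only on the fact that a jar is a zip archive; the function name `find_packaged_module` and the default search string are illustrative choices, not part of Pyleus.

```python
import zipfile


def find_packaged_module(jar_path, needle="pyleus/storm/"):
    """Return the sorted archive entries whose path contains `needle`."""
    # A jar is a zip archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(name for name in jar.namelist() if needle in name)
```

Calling `find_packaged_module("exclamation_topology.jar")` should list entries such as `.../site-packages/pyleus/storm/spout.py` if the package was bundled, and return an empty list otherwise.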
I found something related:
resources/pyleus_venv/lib/python2.6/site-packages/pyleus/storm/spout.py
resources/pyleus_venv/lib/python2.6/site-packages/pyleus/storm/bolt.py
resources/pyleus_venv/lib/python2.6/site-packages/pyleus/storm/__init__.py
resources/pyleus_venv/lib/python2.6/site-packages/pyleus/storm/component.pyc
resources/pyleus_venv/lib/python2.6/site-packages/pyleus/storm/spout.pyc
resources/pyleus_venv/lib/python2.6/site-packages/pyleus/storm/bolt.pyc
resources/pyleus_venv/lib/python2.6/site-packages/pyleus/storm/__init__.pyc
......
So I think the pyleus package has been installed.
I guess it should be resources/pyleus_venv/lib64/python2.6/site-packages/pyleus/storm/spout.py, as I use a 64-bit OS.
However, it runs fine in local mode.
Would you please look at this issue? @patricklucas
I still haven't worked it out. Do you have any idea? @poros @patricklucas
I still can't identify the cause of this issue. It might be related to Pyleus' "unusual" use of virtualenv, maybe something related to site.py, but I'm just guessing.
Let's try to collect more information. Does the machine you use to build the topology have the same operating system and architecture as the ones in your Storm cluster?
Yes. Both 64bit CentOS 6.3, Python 2.6.6, Java 1.6, Storm 0.9.2. But I haven't installed virtualenv on the machines in my Storm cluster.
I guess it is related to virtualenv too. I checked the supervisor machines, and I can find the resources and pyleus/storm directories and all the files there. But the module can't be located at run time.
However, everything is OK when I submit the topology in local mode on the machine I use to build it.
Is the same version of Python installed on the machine where you built the topology as well as on the cluster to which you are submitting?
yes
I tried running the topology after uninstalling virtualenv from one of my Storm nodes, but I wasn't able to reproduce. I don't know if the missing installation may be related to the issue here.
I'm running out of options... What about weird $PYTHONPATH interferences? Mine is not set.
Mine is not set, too.
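The facts being compared here by hand (operating system, architecture, Python version, PYTHONPATH) can be collected with a short script and diffed between the build machine and a cluster node. This is only a sketch; the function name `environment_fingerprint` and the choice of fields are assumptions made for illustration.

```python
import json
import os
import platform
import sys


def environment_fingerprint():
    """Collect the facts compared in this thread for one machine."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),  # e.g. x86_64
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
        "sys_prefix": sys.prefix,
        "pythonpath": os.environ.get("PYTHONPATH"),  # None when unset
    }


# Print as JSON so the output of two machines can be diffed.
print(json.dumps(environment_fingerprint(), indent=2, sort_keys=True))
```

Running it with the system interpreter on each machine gives two small JSON documents to compare side by side.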
Can I send you the jar file I built, so that you can try running it on your Storm cluster?
Ehm, I'd prefer not to run alien code on my machines for security reasons... In addition, since our configuration is different, I doubt that your build will actually run on my cluster. But I can definitely take a look at your jar, if you still want that, sure.
I had a look at the jar. The first difference I was able to spot is that your jar has a file called virtualenv.py, which is missing from my jar. Maybe there is a difference in our virtualenv installations that affects pyleus' behavior in a weird way. Did you install it using pip or directly from source?
Using pip. $ sudo pip install virtualenv
I have the same problem here, running Python 2.6.6, CentOS-6.4-x86_64, Java 1.7 and virtualenv 1.11.6 installed with pip.
I was able to get everything to work by installing pyleus systemwide.
Okay, I think I know why this is happening. As far as I can understand, to build a jar with the Python requirements inside it, pyleus creates a virtualenv in a temporary directory and, after all requirements are installed, copies this temporary directory into the final jar.
The thing is that when a virtualenv is moved, all of its path references break. Does that make sense? See http://virtualenv.readthedocs.org/en/latest/virtualenv.html#making-environments-relocatable
You got it right, but that is something we are already aware of. The point is that pyleus does not activate the virtualenv when it runs a topology on the cluster nodes; it simply invokes all Python modules using the Python interpreter located in the virtualenv.
This seems to work even if the virtualenv is moved, but this thread suggests that there are some subtle issues with this approach that we still do not properly understand.
Making the virtualenv relocatable does not solve the issue, at least according to the experiments we ran during the early stages of development.
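One kind of path reference that breaks when a virtualenv is moved is the absolute shebang line in the scripts under its bin/ directory. The sketch below lists scripts whose shebang points at an interpreter that no longer exists. It illustrates the general relocation problem only and is not claimed to be the cause of the ImportError in this thread, since the interpreter is invoked directly rather than through those scripts. The function name `stale_shebangs` is hypothetical.

```python
import os


def stale_shebangs(bin_dir):
    """Return (script, interpreter) pairs whose shebang interpreter is missing."""
    stale = []
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            first_line = handle.readline().strip()
        if not first_line.startswith(b"#!"):
            continue  # binaries and other non-script files have no shebang
        parts = first_line[2:].split()
        if not parts:
            continue
        interpreter = parts[0].decode("utf-8", "replace")
        # Only absolute interpreter paths can go stale after a move.
        if os.path.isabs(interpreter) and not os.path.exists(interpreter):
            stale.append((name, interpreter))
    return stale
```

On a supervisor node, pointing it at the extracted `resources/pyleus_venv/bin` directory would show whether any script still refers to the temporary build directory.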
I had the same issue this evening and was able to solve it by installing pyleus on all my storm nodes.
Now I do not think that is really needed, but installing pyleus also installed virtualenv on my nodes and I think that is what solved it. Since the jar contains a reference to it, it should be on the nodes and it wasn't.
Anyhow, maybe this info helps you. maybe not. :)
I'm getting the following error with java version "1.7.0_65" and Python 2.7.6. Is it because of the Java and Python versions, or am I doing something incorrectly? I tried installing pyleus on every node, but to no avail...
java.lang.RuntimeException: Error when launching multilang subprocess
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/mnt/storm/supervisor/stormdist/word_count-5-1418066388/resources/word_count/line_spout.py", line 4, in <module>
from pyleus.storm import Spout
File "/mnt/storm/supervisor/stormdist/word_count-5-1418066388/resources/pyleus_venv/lib/python2.7/site-packages/pyleus/__init__.py", line 3, in <module>
import pkg_resources
File "/mnt/storm/supervisor/stormdist/word_count-5-1418066388/resources/pyleus_venv/lib/python2.7/site-packages/pkg_resources.py", line 16, in <module>
import sys, os, time, re, imp, types, zipfile, zipimport
File "/usr/lib/python2.7/zipfile.py", line 6, in <module>
import io
File "/usr/lib/python2.7/io.py", line 51, in <module>
import _io
ImportError: No module named _io
at backtype.storm.utils.ShellProcess.launch(ShellProcess.java:66) ~[storm-core-0.9.3.jar:0.9.3]
at backtype.storm.spout.ShellSpout.open(ShellSpout.java:74) ~[storm-core-0.9.3.jar:0.9.3]
at backtype.storm.daemon.executor$fn__3373$fn__3388.invoke(executor.clj:522) ~[storm-core-0.9.3.jar:0.9.3]
at backtype.storm.util$async_loop$fn__464.invoke(util.clj:461) ~[storm-core-0.9.3.jar:0.9.3]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: java.io.EOFException: null
at org.msgpack.io.StreamInput.readByte(StreamInput.java:60) ~[stormjar.jar:na]
at org.msgpack.unpacker.MessagePackUnpacker.getHeadByte(MessagePackUnpacker.java:66) ~[stormjar.jar:na]
at org.msgpack.unpacker.MessagePackUnpacker.trySkipNil(MessagePackUnpacker.java:396) ~[stormjar.jar:na]
at org.msgpack.template.MapTemplate.read(MapTemplate.java:59) ~[stormjar.jar:na]
at org.msgpack.template.MapTemplate.read(MapTemplate.java:27) ~[stormjar.jar:na]
at org.msgpack.template.AbstractTemplate.read(AbstractTemplate.java:31) ~[stormjar.jar:na]
at org.msgpack.MessagePack.read(MessagePack.java:527) ~[stormjar.jar:na]
at org.msgpack.MessagePack.read(MessagePack.java:496) ~[stormjar.jar:na]
at com.yelp.pyleus.serializer.MessagePackSerializer.readMessage(MessagePackSerializer.java:198) ~[stormjar.jar:na]
at com.yelp.pyleus.serializer.MessagePackSerializer.connect(MessagePackSerializer.java:67) ~[stormjar.jar:na]
at backtype.storm.utils.ShellProcess.launch(ShellProcess.java:64) ~[storm-core-0.9.3.jar:0.9.3]
... 5 common frames omitted
I'm by no means an expert, but this: ImportError: No module named _io would seem to indicate that your Python environment isn't set up correctly. At a glance, this does not seem to be a pyleus problem. Can you check whether other Python scripts work correctly?
Looking on Google, I found a lot of references to this error from people using Ubuntu. Are you? (http://askubuntu.com/questions/450979/importerror-no-module-named-io)
I get this on the python terminal
>>> import io
>>> import _io
>>> io
<module 'io' from '/usr/lib/python2.7/io.pyc'>
>>> _io
<module '_io' (built-in)>
I'm using ubuntu 14.04.1 server edition on vagrant
Then maybe your virtual environment is missing something. Poros posted earlier that pyleus does not activate the virtual environment, but I definitely needed to have it installed for it to work. So maybe it is worth looking at... Did the link I posted help at all?
I'm unable to determine where exactly I should execute those commands. I've also installed pyleus (and hence virtualenv) on each node, but still no good...
@poros, any tips?? I'm trying to run the word count topology in examples... If this looks like a version issue, could you suggest the environment you're running that I can bring up with vagrant?
@nilakshdas This issue is still kinda obscure to me. I wasn't able to reproduce even once, even trying a lot of different configurations, so I cannot suggest a version that should work for sure... (I run on Lucid, but I don't think it is related).
You can try to run those commands from the interpreter in the virtualenv that is shipped with your pyleus topology and check if the imports work. For your former topology, the path should have been:
/mnt/storm/supervisor/stormdist/word_count-5-1418066388/resources/pyleus_venv/bin/python
I'm sorry I am not able to provide a better help :(
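The suggestion above, running the imports with the interpreter shipped in the topology's virtualenv, can be automated so that the system interpreter and the bundled one are compared with the same probe. This is a sketch under assumptions: the helper name `probe_imports` is made up for illustration, the driver itself targets Python 3, and the probe code is kept compatible with Python 2 because it runs inside the target interpreter.

```python
import subprocess

# Runs inside the target interpreter, so it avoids syntax that differs
# between Python 2 and Python 3.
PROBE = (
    "import sys\n"
    "for name in sys.argv[1:]:\n"
    "    try:\n"
    "        __import__(name)\n"
    "        sys.stdout.write(name + ' ok\\n')\n"
    "    except ImportError:\n"
    "        sys.stdout.write(name + ' missing\\n')\n"
)


def probe_imports(python_path, modules):
    """Ask the interpreter at `python_path` which of `modules` it can import."""
    output = subprocess.check_output([python_path, "-c", PROBE] + list(modules))
    results = {}
    for line in output.decode("utf-8").splitlines():
        name, status = line.rsplit(" ", 1)
        results[name] = (status == "ok")
    return results
```

For example, `probe_imports(".../resources/pyleus_venv/bin/python", ["io", "_io", "pyleus.storm"])` reports which of the failing imports are visible to the bundled interpreter; running the same call with `/usr/bin/python` shows whether the system interpreter behaves differently.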
I ran into this issue with a newly modified topology that had previously worked in local and nimbus mode.
Installing Pyleus on all of the Storm Supervisors appeared to have resolved the issue.
Seems similar to my experiences with Hadoop (Streaming) where all referenced libraries need to be installed on the executing compute/worker nodes.
One of the modifications I made with this topology was a new module import in one of the bolts. It required use of the 'system_site_packages: true' option in my Pyleus configuration in order to 'build' correctly.
Here's a copy of the start of the error stack if it helps with resolving the mystery.
2015-01-28 23:36:21 b.s.t.ShellBolt [ERROR] Halting process: ShellBolt died.
java.io.EOFException: null
at org.msgpack.io.StreamInput.readByte(StreamInput.java:60) ~[stormjar.jar:na]
at org.msgpack.unpacker.MessagePackUnpacker.getHeadByte(MessagePackUnpacker.java:66) ~[stormjar.jar:na]
at org.msgpack.unpacker.MessagePackUnpacker.trySkipNil(MessagePackUnpacker.java:396) ~[stormjar.jar:na]
at org.msgpack.template.MapTemplate.read(MapTemplate.java:59) ~[stormjar.jar:na]
at org.msgpack.template.MapTemplate.read(MapTemplate.java:27) ~[stormjar.jar:na]
at org.msgpack.template.AbstractTemplate.read(AbstractTemplate.java:31) ~[stormjar.jar:na]
at org.msgpack.MessagePack.read(MessagePack.java:527) ~[stormjar.jar:na]
at org.msgpack.MessagePack.read(MessagePack.java:496) ~[stormjar.jar:na]
at com.yelp.pyleus.serializer.MessagePackSerializer.readMessage(MessagePackSerializer.java:198) ~[stormjar.jar:na]
at com.yelp.pyleus.serializer.MessagePackSerializer.readShellMsg(MessagePackSerializer.java:74) ~[stormjar.jar:na]
at backtype.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:99) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.task.ShellBolt$1.run(ShellBolt.java:116) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
The point is that this should be actually true: you don't really need to install Pyleus on your Storm cluster to make it work. But if you are using "system_site_packages: true" and your development machine environment does not match the Storm cluster environment, well, you are going to run into issues. Having pyleus installed in your development machine and using that option does actually imply that you need to have pyleus installed on your Storm cluster as well.
However, you have a point. I'll update the docs to be more explicit about that, since it easily leads to confusion. Thank you for bearing with us until we get the docs right.
There seem to be a few cases where trying to use 'pyleus build' with "system_site_packages: false" causes build errors, stemming from pyleus not being able to find a package to include in the new virtualenv, while the package exists in local system site packages.
Switching "system_site_packages" to 'true' allows the build to complete but results in topology execution failures on a Storm cluster, unless Pyleus itself is also installed on the Storm cluster.
Update - Adding Notes: Ran into this issue in two environments: CentOS 6.6 and Ubuntu 12.04
CentOS Linux release 7.0.1406 (Core)
apache-storm-0.9.2-incubating
virtualenv 12.0.7
I only had pyleus installed in a virtualenv. When I ran pyleus local exclamation_topology.jar, it gave me this error:
6105 [Thread-4] INFO backtype.storm.daemon.worker - Worker 20c62a54-b023-465b-be8c-0e524c5e8c00 for storm exclamation_topology-1-1424195163 on 6fa94e84-b431-438a-99bf-e011e282adc7:1024 has finished loading
6124 [Thread-9-exclaim1] ERROR backtype.storm.util - Async loop died!
java.lang.RuntimeException: Error when launching multilang subprocess
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/tmp/6087e68f-0218-4ecd-b9f2-be195eacb15e/supervisor/stormdist/exclamation_topology-1-1424195163/resources/exclamation_topology/exclamation_bolt.py", line 5, in <module>
from pyleus.storm import SimpleBolt
ImportError: No module named pyleus.storm
This error goes away when I pip install pyleus globally.
I also have pyleus installed in a virtualenv on my MacBook Pro, and I do not have this problem there.
I have hit this issue on CentOS 7. Seems that I need to install Pyleus on all of my storm supervisor nodes in order for it to operate.
I've run into this also. It seems that if you have a pyleus.conf file, and pass in anything at all for
#system_site_packages: False
then the jar creation logic does not include the pyleus package in the install. I've set it to "false" and "False" and "true" and "True" and get the same behaviour.
The only ways to avoid the problem are to either (1) install pyleus on the supervisor nodes or (2) leave this line out of the config file altogether.
pyleus -c pyleus.conf --verbose build pyleus_topology.yaml
pyleus -c pyleus.conf --verbose submit switchdin-primary.jar
Note: pyleus installed everywhere by pip.
/src/platform/pyleusTrail# cat pyleus.conf
[storm]
storm_cmd_path: /usr/bin/storm
# optional: use -n option of pyleus CLI instead
nimbus_host: <Nimbus Host>
# optional: use -p option of pyleus CLI instead
nimbus_port: <nimbus port>
# java options to pass to Storm CLI
#jvm_opts: -Djava.io.tmpdir=/home/myuser/tmp
[build]
# PyPI server to use during the build of your topologies
#pypi_index_url: http://pypi.ninjacorp.com/simple/
# always use system-site-packages for pyleus virtualenvs (default: false)
# Note: This settings seems to break the download of pyleus regardless of what it is set to.
# Comment raised on project git page.
#system_site_packages: False
# list of packages to always include in your topologies
#include_packages: pyleus==0.3.0
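To see how a value such as "False", "false" or "true" in a file like the one above is interpreted by a standard INI parser, the option can be read with Python's `configparser`. This is a sketch only: the thread does not confirm how Pyleus itself parses pyleus.conf, so the use of `configparser` and the helper name `read_system_site_packages` are assumptions. The default of False follows the "(default: false)" comment in the file.

```python
import configparser


def read_system_site_packages(conf_path):
    """Return the boolean value of [build] system_site_packages (False if absent)."""
    # Interpolation is disabled so that '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(conf_path)
    # getboolean accepts true/false, yes/no, on/off and 1/0, case-insensitively.
    return parser.getboolean("build", "system_site_packages", fallback=False)
```

A commented-out line such as `#system_site_packages: False` is ignored by this parser, so the function returns the fallback in that case.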
I too was running into this issue. I tried leaving out the system_site_packages line and it did not help. Installed virtualenv and pyleus on all nodes and bingo, it worked.
I get a runtime exception, in both spouts and bolts, when I submit exclamation_topology to my Storm cluster.
java.lang.RuntimeException: Error when launching multilang subprocess
Traceback (most recent call last):
File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
"__main__", fname,
Python: 2.6.6, Java: 1.6, Storm: 0.9.2