Closed: wking1986 closed this issue 8 years ago.
Hi, according to the logs, the call to Aurora.onScheduler(...) failed. Can you add the "--verbose" flag when submitting the job and share the verbose output?
Got it!!
I changed the env to "devel" and successfully submitted to Aurora.
@maosongfu Thank you for your help
@maosongfu, I can see the topology in Aurora, but I can't find it in Heron UI. Why?
BTW, I added a pull request, #884, which logs the stderr of a spawned process even without the "--verbose" flag.
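The code of #884 is not shown in this thread. As a rough illustration of the idea only, the hypothetical helper below runs a command, captures its stderr, and logs that stderr whenever the process exits non-zero, whether or not a verbose flag is set. The function name and logging setup are assumptions, not Heron's actual implementation.

```python
import logging
import subprocess
import sys

log = logging.getLogger("spawn")

def run_and_log(cmd, verbose=False):
    """Run `cmd`; log its stderr on failure always, and on success only if verbose."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # Failures are reported even without --verbose, which is the point of the change.
        log.error("command %s exited with %d; stderr:\n%s", cmd, proc.returncode, err)
    elif verbose and err:
        log.info("stderr of %s:\n%s", cmd, err)
    return proc.returncode, out, err

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_and_log([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"])
```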
OK, I'll try again. Thank you very much!!
@maosongfu, I have modified heron_tracker.yaml, and Heron UI now shows the topology, but the topology is not activated. Then I executed: heron activate --verbose aurora/root/devel ExclamationTopology
I found that the ZooKeeper path /heron/pplans does not contain the topology name (ExclamationTopology), but other ZooKeeper directories do (e.g., /heron/topologies/ExclamationTopology).
Why does "/heron/pplans" have no ExclamationTopology? Which YAML config has a problem?
My statemgr.yaml looks like this:
Perhaps these may help: #834, #822. More troubleshooting guides will be published soon (#877).
@maosongfu @qiuyij I get the same error when I use Aurora. In the local environment, I can find detailed error info in the log-files directory, but I can't find it in the Aurora environment. Where can I find the log-files directory?
@maosongfu @qiuyij If the topology is submitted to Aurora, can I find out why a process failed to start from ~/.herondata/topologies/{cluster}/{role}/{topologyName}/heron-executor.stdout?
@aaronshan If the topology is submitted to Aurora, you can find it in mesos/slaves/........./latest/sandbox/heron-executor.stdout
@kartik894 I responded to your issue #888. Let's keep these two issues separate, please.
@wking1986
Could you check the logs to see whether your topology is actually running? Sometimes a missing pplan is caused by the topology not running correctly. If so, you can kill the topology, submit it again, and see whether the issue is resolved.
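The symptom described earlier in the thread (a topology that exists under /heron/topologies but not under /heron/pplans) can be checked mechanically. A minimal sketch, assuming you capture the output of `ls /heron/topologies` and `ls /heron/pplans` from ZooKeeper's zkCli.sh, which prints a node's children as a bracketed, comma-separated list; both helper names are hypothetical. Any topology it reports is a candidate for the kill-and-resubmit step.

```python
def parse_zkcli_ls(line):
    """Parse a zkCli.sh `ls` result such as "[a, b]" into a list of child names."""
    text = line.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("unexpected ls output: %r" % line)
    inner = text[1:-1].strip()
    return [name.strip() for name in inner.split(",")] if inner else []

def topologies_missing_pplan(topology_nodes, pplan_nodes):
    """Topologies registered under /heron/topologies that have no /heron/pplans entry."""
    return sorted(set(topology_nodes) - set(pplan_nodes))

if __name__ == "__main__":
    topologies = parse_zkcli_ls("[ExclamationTopology]")
    pplans = parse_zkcli_ls("[]")
    print(topologies_missing_pplan(topologies, pplans))
```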
@aaronshan @wking1986 All scheduler implementations share a similar working-directory (sandbox) structure. For Aurora, can you check heron-executor.stdout and the log-files folder inside the sandbox folder (not ~/.herondata/topologies/{cluster}/{role}/{topologyName}/heron-executor.stdout)?
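To locate these files on a Mesos agent, a minimal sketch using glob. The directory layout is an assumption based on the sandbox_directory printed by the Mesos fetcher later in this thread (/tmp/mesos/slaves/<agent>/frameworks/<framework>/executors/<executor>/runs/<run>), plus Mesos's runs/latest symlink and the Thermos "sandbox" subfolder; adjust mesos_work_dir to your agent's --work_dir. The function name is hypothetical.

```python
import glob
import os

def find_executor_stdouts(mesos_work_dir="/tmp/mesos"):
    """Return paths of heron-executor.stdout files under a Mesos agent work dir."""
    # Assumed layout:
    # <work_dir>/slaves/*/frameworks/*/executors/*/runs/latest/sandbox/heron-executor.stdout
    pattern = os.path.join(mesos_work_dir, "slaves", "*", "frameworks", "*",
                           "executors", "*", "runs", "latest", "sandbox",
                           "heron-executor.stdout")
    return sorted(glob.glob(pattern))

if __name__ == "__main__":
    for path in find_executor_stdouts():
        print(path)
```

The log-files folder sits next to heron-executor.stdout, so os.path.dirname(path) points at it as well.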
@wking1986 thanks.
@maosongfu I found that the task failed on Mesos.
I got this stderr log from the sandbox.
Log content:
I0609 11:19:34.714751 41904 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/56dd9481-d4b1-4133-a258-51d5a538c46d-S0\/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"\/usr\/bin\/thermos_executor"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/56dd9481-d4b1-4133-a258-51d5a538c46d-S0\/frameworks\/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000\/executors\/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc\/runs\/44dc2a93-f7e6-4892-8bbc-668c3b845e17","user":"root"}
I0609 11:19:34.716109 41904 fetcher.cpp:369] Fetching URI '/usr/bin/thermos_executor'
I0609 11:19:34.716125 41904 fetcher.cpp:243] Fetching directly into the sandbox directory
I0609 11:19:34.716142 41904 fetcher.cpp:180] Fetching URI '/usr/bin/thermos_executor'
I0609 11:19:34.716159 41904 fetcher.cpp:160] Copying resource with command:cp '/usr/bin/thermos_executor' '/tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17/thermos_executor'
I0609 11:19:34.754954 41904 fetcher.cpp:446] Fetched '/usr/bin/thermos_executor' to '/tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17/thermos_executor'
twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
Writing log files to disk in /tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17
I0609 11:19:35.444795 41901 exec.cpp:134] Version: 0.25.0
I0609 11:19:35.452504 41913 exec.cpp:208] Executor registered on slave 56dd9481-d4b1-4133-a258-51d5a538c46d-S0
Writing log files to disk in /tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17
ERROR] Regular plan unhealthy!
twitter.common.app debug: Shutting application down.
twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
twitter.common.app debug: Finishing up module teardown.
twitter.common.app debug: Active thread: <_MainThread(MainThread, started 139986815493888)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-6, started daemon 139986237478656)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-7, started daemon 139986216498944)>
twitter.common.app debug: Active thread (daemon): <TaskResourceMonitor(TaskResourceMonitor[1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc] [TID=41953], started daemon 139986125973248)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-12, started daemon 139986226988800)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-8, started daemon 139986136463104)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-15, started daemon 139986094503680)>
twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-2, started daemon 139986480895744)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-14, started daemon 139986081892096)>
twitter.common.app debug: Exiting cleanly.
How can I solve the "ERROR] Regular plan unhealthy!" problem? Thank you~
@aaronshan Can you go into the sandbox folder (at the same level as the stderr file you opened; it has the same structure as the working directory in LocalScheduler) and check the contents of heron-executor.stdout?
@maosongfu I entered the sandbox folder, then the .logs folder. In the fetch_heron_system folder, the stderr file contains:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 37.3M 0 16383 0 0 2674k 0 0:00:14 --:--:-- 0:00:14 2674k
100 37.3M 100 37.3M 0 0 826M 0 --:--:-- --:--:-- --:--:-- 956M
tar: ./release.yaml: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-executor: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-shell: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-stmgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-tmaster: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-local-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-slurm-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/packing/heron-roundrobin-packing.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/packing: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/metricsmgr/heron-metricsmgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/metricsmgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr/heron-localfs-statemgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr/heron-zookeeper-statemgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/instance/heron-instance.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/instance: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core: implausibly old time stamp 1970-01-01 08:00:00
tar: .: implausibly old time stamp 1970-01-01 08:00:00
I reported this error in #845. In the fetch_user_package folder, the stderr file contains:
curl: (6) Couldn't resolve host 'hdfs:'
I think this problem may be caused by an error in my heron.aurora config file, which looks like this:
"""
Launch the topology as a single aurora job with multiple instances.
The heron-executor is responsible for starting a tmaster (container 0)
and regular stmgr/metricsmgr/instances (container index > 0).
"""
heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
core_release_file = "heron-core.tar.gz"
topology_package_file = "topology.tar.gz"
# --- processes ---
#fetch_heron_system = Process(
# name = 'fetch_heron_system',
# cmdline = 'curl %s -o %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)
#)
fetch_heron_system = Process(
name = 'fetch_heron_system',
cmdline = 'hadoop fs -get hdfs:///tmp/heron/topologies/aurora/heron-core.tar.gz . && tar zxf %s' % ( core_release_file)
)
#fetch_user_package = Process(
# name = 'fetch_user_package',
# cmdline = 'curl %s -o %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)
#)
fetch_user_package = Process(
name = 'fetch_user_package',
cmdline = 'hadoop fs -get %s . && tar zxf %s' % (heron_topology_jar_uri, topology_package_file)
)
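One thing worth noting in the config above: `hadoop fs -get <src> .` keeps the source file name, so the topology package is not saved as topology.tar.gz and `tar zxf topology.tar.gz` has nothing to extract. The manual command later in this thread passes topology.tar.gz as the destination explicitly. curl, for its part, does not handle hdfs:// URIs. The hypothetical helper below sketches choosing the fetch command by URI scheme. Treat it as an illustration of the logic only: as far as I understand Aurora's config model, {{TOPOLOGY_PACKAGE_URI}} is filled in by --bind after heron.aurora is evaluated, so Python code inside that file sees only the placeholder and cannot branch on it.

```python
from urllib.parse import urlparse

def fetch_cmdline(uri, local_file):
    """Build a shell command that downloads `uri` to `local_file` and untars it."""
    if urlparse(uri).scheme == "hdfs":
        # hadoop fs -get keeps the source name when the target is a directory,
        # so name the destination file explicitly.
        fetch = "hadoop fs -get %s %s" % (uri, local_file)
    else:
        # file://, http(s):// and similar URIs are left to curl.
        fetch = "curl %s -o %s" % (uri, local_file)
    return "%s && tar zxf %s" % (fetch, local_file)

if __name__ == "__main__":
    print(fetch_cmdline("hdfs:///tmp/heron/topologies/main/pkg", "topology.tar.gz"))
```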
@nlu90 Do you know why we get "curl: (6) Couldn't resolve host 'hdfs:'"? In the modified heron.aurora file, "curl" is commented out and not even used.
@aaronshan Can you double-check the actual command that "fetch_user_package" runs?
On the Aurora page, you can click the process name to see it.
@maosongfu @nlu90 @qiuyij Thanks for your help, Heron on Aurora is running!!
@wking1986 Awesome! Also, a native Mesos scheduler and a YARN scheduler are coming soon too! Their pull requests are being reviewed.
@maosongfu Great!! Very much looking forward to Heron on Mesos
@maosongfu I revised the heron.aurora file, and now it works. I started two Mesos slaves and found that the task runs fine on one but still fails on the other.
When I click the hostname:
launch_heron_executor's stdout and stderr files are empty.
I ran these commands step by step:
hadoop fs -get hdfs:///tmp/heron/topologies/main/heron-core.tar.gz . && tar zxf heron-core.tar.gz
hadoop fs -get hdfs:///tmp/heron/topologies/main/ExclamationTopology-ruifeng.shan-tag-0--5954092425683288689 topology.tar.gz && tar zxf topology.tar.gz
./heron-core/bin/heron-executor 1 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe ExclamationTopology.defn 1:word:2:0:exclaim1:1:0 l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181 /heron ./heron-core/bin/heron-tmaster ./heron-core/bin/heron-stmgr "./heron-core/lib/metricsmgr/*" "LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg==" "heron-examples.jar" 31749 31148 31006 ./heron-conf/heron_internals.yaml exclaim1:536870912,word:536870912 "" jar heron-examples.jar /home/q/java8/jdk1.8.0_91 31985 ./heron-core/bin/heron-shell 31984 main ruifeng.shan devel "./heron-core/lib/instance/*" ./heron-conf/metrics_sinks.yaml "./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*" "31347"
The output is also empty, but heron-executor.stderr contains:
Traceback (most recent call last):
File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 319, in execute
File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 254, in _wrap_coverage
File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 286, in _wrap_profiling
File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 362, in _execute
File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 420, in execute_entry
File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 425, in execute_module
File "/usr/local/lib/python2.7/runpy.py", line 180, in run_module
fname, loader, pkg_name)
File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 450, in <module>
File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 417, in main
File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 398, in launch
File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 362, in do_run_and_wait
File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 352, in run_process
File "/usr/local/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/local/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
heron-executor.stdout content:
2016-06-09 15:29:41: Set up process group; executor becomes leader
2016-06-09 15:29:41: Register the SIGTERM signal handler
2016-06-09 15:29:41: Register the atexit clean up
2016-06-09 15:29:41: Logging pid 40559 to file heron-executor-1.pid
2016-06-09 15:29:41: Running process as mkdir -p log-files
2016-06-09 15:29:41: Running process as chmod a+rx . && chmod a+x log-files && chmod +x ./heron-core/bin/heron-tmaster && chmod +x ./heron-core/bin/heron-stmgr && chmod +x ./heron-core/bin/heron-shell
word 536870912 512 64 128
exclaim1 536870912 512 64 128
2016-06-09 15:29:41: Running heron-shell-1 process as ./heron-core/bin/heron-shell --port=31782 --log_file_prefix=log-files/heron-shell.log
2016-06-09 15:29:41: Logging pid 40569 to file heron-shell-1.pid
2016-06-09 15:29:41: Running container_1_word_2 process as /home/q/java8/jdk1.8.0_91/bin/java -Xmx320M -Xms320M -Xmn160M -XX:MaxPermSize=128M -XX:PermSize=128M -XX:ReservedCodeCacheSize=64M -XX:+CMSScavengeBeforeRemark -XX:TargetSurvivorRatio=90 -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -Xloggc:log-files/gc.container_1_word_2.log -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/instance/*:heron-examples.jar com.twitter.heron.instance.HeronInstance ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe container_1_word_2 word 2 0 stmgr-1 31719 31300 ./heron-conf/heron_internals.yaml
2016-06-09 15:29:41: Executor terminated; exiting all process in executor.
And the other machine's heron-executor.stdout contents:
2016-06-09 17:36:29: Set up process group; executor becomes leader
2016-06-09 17:36:29: Register the SIGTERM signal handler
2016-06-09 17:36:29: Register the atexit clean up
2016-06-09 17:36:29: Logging pid 7100 to file heron-executor-0.pid
2016-06-09 17:36:29: Running process as mkdir -p log-files
2016-06-09 17:36:29: Running process as chmod a+rx . && chmod a+x log-files && chmod +x ./heron-core/bin/heron-tmaster && chmod +x ./heron-core/bin/heron-stmgr && chmod +x ./heron-core/bin/heron-shell
2016-06-09 17:36:29: Running heron-shell-0 process as ./heron-core/bin/heron-shell --port=31101 --log_file_prefix=log-files/heron-shell.log
2016-06-09 17:36:29: Logging pid 7110 to file heron-shell-0.pid
2016-06-09 17:36:29: Running metricsmgr-0 process as /home/q/java8/jdk1.8.0_91/bin/java -Xmx1024M -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+PrintCommandLineFlags -Xloggc:log-files/gc.metricsmgr.log -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/metricsmgr/* com.twitter.heron.metricsmgr.MetricsManager metricsmgr-0 31132 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe ./heron-conf/heron_internals.yaml ./heron-conf/metrics_sinks.yaml
2016-06-09 17:36:29: Logging pid 7111 to file metricsmgr-0.pid
2016-06-09 17:36:29: Running heron-tmaster process as ./heron-core/bin/heron-tmaster 31481 31107 31866 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181 /heron stmgr-1 ./heron-conf/heron_internals.yaml ./heron-conf/metrics_sinks.yaml 31132
2016-06-09 17:36:29: Logging pid 7112 to file heron-tmaster.pid
Hi,
I am getting the following error:
Error loading configuration: Could not find job aurora/root/default/ExclamationTopology
Candidates are:
aurora/root/devel/ExclamationTopology
@wking1986 Where exactly should I change the environment?
@kartik894
As far as I know, when you submit a topology you can set the env (prod | devel | test | staging):
$ heron help submit
usage: heron submit [options] cluster/[role]/[env] topology-file-name topology-class-name [topology-args]
Required arguments:
cluster/[role]/[env] Cluster, role, and environment to run topology
topology-file-name Topology jar/tar/zip file
topology-class-name Topology class name
Optional arguments:
--config-path (a string; path to cluster config; default: "/home/q/heron/heron-0.14.0/heron/conf")
--config-property (key=value; a config key and its value; default: [])
--deploy-deactivated (a boolean; default: "false")
--topology-main-jvm-property (property=value; JVM system property for executing topology main; default: [])
--verbose (a boolean; default: "false")
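The error above (no job aurora/root/default/ExclamationTopology, but a candidate aurora/root/devel/ExclamationTopology) suggests that when no env was given, the lookup used "default", while the job had been created under "devel". A minimal sketch of how the cluster/[role]/[env] argument maps onto an Aurora job key (cluster/role/env/name). The fallback values are assumptions inferred from that error message, not taken from Heron's source, and the function name is hypothetical.

```python
def aurora_job_key(cluster_role_env, topology_name,
                   default_role="root", default_env="default"):
    """Split 'cluster/[role]/[env]' and build an Aurora job key 'cluster/role/env/name'."""
    # default_role and default_env are illustrative assumptions, not Heron's real defaults.
    parts = cluster_role_env.strip("/").split("/")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError("expected cluster/[role]/[env], got %r" % cluster_role_env)
    cluster = parts[0]
    role = parts[1] if len(parts) > 1 else default_role
    env = parts[2] if len(parts) > 2 else default_env
    return "/".join([cluster, role, env, topology_name])

if __name__ == "__main__":
    print(aurora_job_key("aurora/root/devel", "ExclamationTopology"))
```

Passing the same env (e.g. aurora/root/devel) to every heron command keeps the key identical to the one used at submit time.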
@aaronshan Hi,
According to the log, heron-executor failed to start a heron-instance process. Can you try running this command directly and check the output? /home/q/java8/jdk1.8.0_91/bin/java -Xmx320M -Xms320M -Xmn160M -XX:MaxPermSize=128M -XX:PermSize=128M -XX:ReservedCodeCacheSize=64M -XX:+CMSScavengeBeforeRemark -XX:TargetSurvivorRatio=90 -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -Xloggc:log-files/gc.container_1_word_2.log -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/instance/*:heron-examples.jar com.twitter.heron.instance.HeronInstance ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe container_1_word_2 word 2 0 stmgr-1 31719 31300 ./heron-conf/heron_internals.yaml
@maosongfu Thank you very much~ Heron on Aurora runs OK now!!
@aaronshan So what was the issue?
@maosongfu
The problem was caused by the missing directory "/home/q/java8/jdk1.8.0_91". I forgot to set it up on that machine.😂😂😂
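This matches the earlier traceback: "OSError: [Errno 2] No such file or directory" from subprocess is what Python raises when the program it is asked to start does not exist, in this case the java binary under the configured HERON_SANDBOX_JAVA_HOME. A minimal pre-flight sketch you could run on each Mesos agent before submitting; both function names are hypothetical.

```python
import errno
import os
import subprocess

def check_java_home(java_home):
    """Return an error message if <java_home>/bin/java is missing or not executable, else None."""
    java = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java):
        return "missing: %s" % java
    if not os.access(java, os.X_OK):
        return "not executable: %s" % java
    return None

def is_missing_binary_error(cmd):
    """Return True if starting `cmd` fails the way the executor traceback did (ENOENT)."""
    try:
        proc = subprocess.Popen(cmd)
    except OSError as e:
        return e.errno == errno.ENOENT
    proc.wait()
    return False

if __name__ == "__main__":
    print(check_java_home("/home/q/java8/jdk1.8.0_91"))
```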
@maosongfu When I submit a new topology:
heron submit main/ruifeng.shan/devel /home/q/ruifeng.shan/heron-learn-1.0-SNAPSHOT-shaded.jar com.qunar.data.WordCountTopology WordCountTopology
it keeps waiting:
[2016-06-10 01:50:54 +0000] com.twitter.heron.scheduler.aurora.AuroraLauncher INFO: Launching topology in aurora
[2016-06-10 01:50:54 +0000] com.twitter.heron.spi.common.ShellUtils INFO: $> [aurora, job, create, --wait-until, RUNNING, --bind, TOPOLOGY_NAME=WordCountTopology, --bind, SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, COMPONENT_RAMMAP=sentence-spout:1073741824,count-bolt:1073741824,report-bolt:1073741824,split-bolt:1073741824, --bind, SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, INSTANCE_JVM_OPTS_IN_BASE64="", --bind, ROLE=ruifeng.shan, --bind, ENVIRON=devel, --bind, SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, ISPRODUCTION=false, --bind, TOPOLOGY_CLASSPATH=heron-learn-1.0-SNAPSHOT-shaded.jar, --bind, CLUSTER=main, --bind, SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, STATEMGR_CONNECTION_STRING=l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_ID=WordCountTopology1117b603-69c3-4096-b005-789fa81ea727, --bind, TOPOLOGY_PACKAGE_URI=hdfs:///tmp/heron/topologies/main/WordCountTopology-ruifeng.shan-tag-0--3163552258663319321, --bind, SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, CORE_PACKAGE_URI=file:///home/q/heron/heron-0.14.0/heron/dist/heron-core.tar.gz, --bind, SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, RAM_PER_CONTAINER=5368709120, --bind, SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, TOPOLOGY_DEFINITION_FILE=WordCountTopology.defn, --bind, INSTANCE_DISTRIBUTION=1:count-bolt:2:0:report-bolt:3:0:split-bolt:4:0:sentence-spout:1:0, --bind, NUM_CONTAINERS=2, --bind, CPUS_PER_CONTAINER=5.0, --bind, TOPOLOGY_JAR_FILE=heron-learn-1.0-SNAPSHOT-shaded.jar, --bind, SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell, --bind, DISK_PER_CONTAINER=17179869184, --bind, STATEMGR_ROOT_PATH=/heron, --bind, 
HERON_SANDBOX_JAVA_HOME=/home/q/java8/jdk1.8.0_91, main/ruifeng.shan/devel/WordCountTopology, /home/q/heron/heron-0.14.0/heron/conf/main/heron.aurora, --verbose]
[2016-06-10 01:51:04 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:14 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:24 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:34 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:44 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:54 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:04 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:14 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:24 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:34 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:44 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:54 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:04 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:14 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:24 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:34 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:44 +0000] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x15515dbd90f005e after 0ms
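The launcher command logged above has a regular shape: `aurora job create --wait-until RUNNING`, one `--bind KEY=VALUE` per template variable, then the job key (cluster/role/env/topology), the path to heron.aurora, and `--verbose`. Because of `--wait-until RUNNING`, the client blocks until the job reaches RUNNING, which would explain why submission appears to hang while Aurora holds the job in PENDING. The hypothetical helper below rebuilds such an argument list; the bindings are sorted only for reproducible output, so the order differs from the log.

```python
def aurora_create_argv(bindings, job_key, config_path, verbose=True):
    """Rebuild an `aurora job create` argument list like the one Heron logs."""
    argv = ["aurora", "job", "create", "--wait-until", "RUNNING"]
    for key in sorted(bindings):  # sorted only for deterministic output
        argv += ["--bind", "%s=%s" % (key, bindings[key])]
    argv += [job_key, config_path]
    if verbose:
        argv.append("--verbose")
    return argv

if __name__ == "__main__":
    print(" ".join(aurora_create_argv(
        {"NUM_CONTAINERS": "2", "DISK_PER_CONTAINER": "17179869184"},
        "main/ruifeng.shan/devel/WordCountTopology",
        "/home/q/heron/heron-0.14.0/heron/conf/main/heron.aurora")))
```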
I found that Aurora shows "PENDING : Insufficient: disk"
and the Mesos resources:
If I kill ExclamationTopology and resubmit it, it works.
@aaronshan You can specify disk_per_container in Config to override the default: https://github.com/twitter/heron/blob/master/heron/api/src/java/com/twitter/heron/api/Config.java#L266
As Aurora shows, it failed to schedule the containers with the requested disk. That is governed by Aurora's resource management, and there is little Heron can do about it.
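To put numbers on this: the WordCountTopology launch command above requested NUM_CONTAINERS=2 with CPUS_PER_CONTAINER=5.0, RAM_PER_CONTAINER=5368709120 (5 GiB) and DISK_PER_CONTAINER=17179869184 (16 GiB). Each container has to fit on a single agent. A small sketch for checking this; the helper names are hypothetical and the agent's free resources in the usage are made-up figures for illustration.

```python
GIB = 1024 ** 3

def container_request(num_containers, cpus, ram_bytes, disk_bytes):
    """Total resources Aurora must find for a job, given per-container values."""
    return {
        "cpus": num_containers * cpus,
        "ram_gib": num_containers * ram_bytes / GIB,
        "disk_gib": num_containers * disk_bytes / GIB,
    }

def fits_one_agent(per_container, agent_free):
    """True if a single agent has room for one container (same units for both dicts)."""
    return all(per_container[k] <= agent_free.get(k, 0) for k in per_container)

if __name__ == "__main__":
    print(container_request(2, 5.0, 5368709120, 17179869184))
    # Hypothetical agent with only 10 GiB of free disk cannot host a 16 GiB container.
    print(fits_one_agent({"cpus": 5.0, "ram_gib": 5.0, "disk_gib": 16.0},
                         {"cpus": 8.0, "ram_gib": 16.0, "disk_gib": 10.0}))
```

Lowering the per-container values (as done later in this thread) or adding agents with more free disk are the two ways to make such a request schedulable.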
@maosongfu I had the same problem, but Aurora's PENDING status didn't show any hint.
[2016-06-10 12:09:21 +0800] com.twitter.heron.scheduler.aurora.AuroraLauncher INFO: Launching topology in aurora
[2016-06-10 12:09:21 +0800] com.twitter.heron.spi.common.ShellUtils INFO: $> [aurora, job, create, --wait-until, RUNNING, --bind, SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_NAME=ExclamationTopology, --bind, ENVIRON=devel, --bind, ROLE=root, --bind, STATEMGR_ROOT_PATH=/heron, --bind, TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn, --bind, TOPOLOGY_ID=ExclamationTopology24ef552e-69d1-48ae-ade2-cb9cc932f47e, --bind, SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell, --bind, TOPOLOGY_PACKAGE_URI=/heron/topologies/main/ExclamationTopology-root-tag-0--7553500226791833473, --bind, STATEMGR_CONNECTION_STRING=192.168.1.108:2181, --bind, HERON_SANDBOX_JAVA_HOME=/usr/src/jdk1.7.0_79, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, DISK_PER_CONTAINER=1073741824, --bind, SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, NUM_CONTAINERS=2, --bind, TOPOLOGY_CLASSPATH=heron-examples.jar, --bind, SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, RAM_PER_CONTAINER=2147483648, --bind, SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg==", --bind, COMPONENT_RAMMAP=exclaim1:536870912,word:536870912, --bind, CORE_PACKAGE_URI=file:///usr/local/heron/dist/heron-core.tar.gz, --bind, SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, ISPRODUCTION=false, --bind, SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, CLUSTER=main, --bind, CPUS_PER_CONTAINER=1.0, --bind, SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, INSTANCE_DISTRIBUTION=1:word:2:0:exclaim1:1:0, --bind, SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, TOPOLOGY_JAR_FILE=heron-examples.jar, main/root/devel/ExclamationTopology, /usr/local/heron/conf/main/heron.aurora, --verbose]
[2016-06-10 12:09:31 +0800] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x1553877b4970008 after 1ms
[2016-06-10 12:09:41 +0800] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x1553877b4970008 after 0ms
[2016-06-10 12:09:51 +0800] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x1553877b4970008 after 1ms
[2016-06-10 12:10:01 +0800] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x1553877b4970008 after 1ms
[2016-06-10 12:10:11 +0800] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x1553877b4970008 after 0ms
[2016-06-10 12:10:21 +0800] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x1553877b4970008 after 0ms
[2016-06-10 12:10:31 +0800] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x1553877b4970008 after 1ms
[2016-06-10 12:10:41 +0800] org.apache.zookeeper.ClientCnxn FINE: Got ping response for sessionid: 0x1553877b4970008 after 0ms
@maosongfu Thank you. I configured the containers' disk, CPU, and RAM with smaller values, and now it runs OK! Do you know how I can increase my Aurora resource configuration?
@jiandongjia @aaronshan You might get more insights from the official Aurora website: http://aurora.apache.org/
I am using the HDFS uploader for the Aurora cluster. I get the following error when submitting the topology:
Caused by: java.lang.IllegalArgumentException: Invalid path string "/hdfs:///heron/topologies/foo" caused by empty node name specified @7
These are my config files:
scheduler.yaml
# scheduler class for distributing the topology for execution
heron.class.scheduler: com.twitter.heron.scheduler.aurora.AuroraScheduler
# launcher class for submitting and launching the topology
heron.class.launcher: com.twitter.heron.scheduler.aurora.AuroraLauncher
# location of the core package
heron.package.core.uri: hdfs:///tmp/.heron/dist/heron-core.tar.gz
# location of java - pick it up from shell environment
heron.directory.sandbox.java.home: /usr/lib/jvm/java-8-oracle
# Invoke the IScheduler as a library directly
heron.scheduler.is.service: False
statemgr.yaml
# local state manager class for managing state in a persistent fashion
heron.class.state.manager: com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager
# local state manager connection string
heron.statemgr.connection.string: "masternode:2181"
# path of the root address to store the state in a local file system
heron.statemgr.root.path: "hdfs:///heron"
# create the zookeeper nodes, if they do not exist
heron.statemgr.zookeeper.is.initialize.tree: True
# timeout in ms to wait before considering zookeeper session is dead
heron.statemgr.zookeeper.session.timeout.ms: 30000
# timeout in ms to wait before considering zookeeper connection is dead
heron.statemgr.zookeeper.connection.timeout.ms: 30000
# number of times to retry connecting to zookeeper
heron.statemgr.zookeeper.retry.count: 10
# duration of time to wait until the next retry
heron.statemgr.zookeeper.retry.interval.ms: 10000
uploader.yaml
# uploader class for transferring the topology jar/tar files to storage
heron.class.uploader: com.twitter.heron.uploader.hdfs.HdfsUploader
# Directory of config files for hadoop client to read from
heron.uploader.hdfs.config.directory: /usr/local/hadoop/etc/hadoop
# name of the directory to upload topologies for HDFS uploader
heron.uploader.hdfs.topologies.directory.uri: hdfs:///heron/topologies/${CLUSTER}
client.yaml
# location of the core package
heron.package.core.uri: "hdfs:///tmp/.heron/dist/heron-core.tar.gz"
# Whether role/env is required to submit a topology. Default value is False.
heron.config.is.role.required: False
heron.config.is.env.required: False
Is there anything wrong in the config files?
@kartik894 It is caused by an invalid value in statemgr.yaml, which is used when connecting to ZooKeeper: heron.statemgr.root.path: "hdfs:///heron". You can try /heron instead, or check ZooKeeper's rules for the path format.
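The error message itself points at the cause: ZooKeeper paths must be absolute, slash-separated node names, so embedding a URI like hdfs:///heron produces consecutive slashes, i.e. an empty node name. The sketch below mirrors only a subset of ZooKeeper's path checks (it skips reserved names, "." components and illegal characters) and reproduces the "@7" position from the error above; the function name is hypothetical.

```python
def zk_path_error(path):
    """Return None if `path` passes these checks, else a reason string."""
    if not path:
        return "path length must be > 0"
    if path[0] != "/":
        return "path must start with / character"
    if len(path) == 1:
        return None  # the root path "/" is valid
    if path[-1] == "/":
        return "path must not end with / character"
    for i in range(1, len(path)):
        if path[i] == "/" and path[i - 1] == "/":
            return "empty node name specified @%d" % i
    return None

if __name__ == "__main__":
    print(zk_path_error("/hdfs:///heron/topologies/foo"))
    print(zk_path_error("/heron/topologies/foo"))
```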
@maosongfu Thanks! It's running now.
@maosongfu I have the same problem as the error message in #883, but I get the following messages:
[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.common.ShellUtils INFO:
[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.common.ShellUtils INFO: DEBUG] Command=(['job', 'create', '--wait-until', 'RUNNING', '--bind', 'TOPOLOGY_NAME=ExclamationTopology', '--bind', 'SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml', '--bind', 'COMPONENT_RAMMAP=exclaim1:536870912,word:536870912', '--bind', 'SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml', '--bind', 'INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg=="', '--bind', 'ROLE=root', '--bind', 'ENVIRON=devel', '--bind', 'SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*', '--bind', 'SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*', '--bind', 'ISPRODUCTION=false', '--bind', 'TOPOLOGY_CLASSPATH=heron-examples.jar', '--bind', 'CLUSTER=aurora', '--bind', 'SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor', '--bind', 'STATEMGR_CONNECTION_STRING=192.168.11.231:2181,192.168.11.232:2181,192.168.11.233:2181', '--bind', 'COMPONENT_JVM_OPTS_IN_BASE64=""', '--bind', 'TOPOLOGY_ID=ExclamationTopologyc2f53ad0-76be-4e83-8c63-2134faede687', '--bind', 'TOPOLOGY_PACKAGE_URI=file:///root/.herondata/repository/topologies/aurora/root/ExclamationTopology/ExclamationTopology-root-tag-0--3706733491519378097', '--bind', 'SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr', '--bind', 'CORE_PACKAGE_URI=file:///root/.heron/dist/heron-core.tar.gz', '--bind', 'SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*', '--bind', 'TOPOLOGY_PACKAGE_TYPE=jar', '--bind', 'RAM_PER_CONTAINER=2147483648', '--bind', 'SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster', '--bind', 'TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn', '--bind', 'INSTANCE_DISTRIBUTION=1:word:2:0:exclaim1:1:0', '--bind', 'NUM_CONTAINERS=2', '--bind', 'CPUS_PER_CONTAINER=1.0', '--bind', 'TOPOLOGY_JAR_FILE=heron-examples.jar', '--bind', 'SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell', '--bind', 'DISK_PER_CONTAINER=1073741824', '--bind', 
'STATEMGR_ROOT_PATH=/heron', '--bind', 'HERON_SANDBOX_JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64', 'aurora/root/devel/ExclamationTopology', '/root/.heron/conf/aurora/heron.aurora', '--verbose'])
DEBUG] Config: ['"""\n', 'Launch the topology as a single aurora job with multiple instances.\n', 'The heron-executor is responsible for starting a tmaster (container 0)\n', 'and regular stmgr/metricsmgr/instances (container index > 0).\n', '"""\n', '\n', "heron_core_release_uri = '{{CORE_PACKAGE_URI}}'\n", "heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'\n", 'core_release_file = "heron-core.tar.gz"\n', 'topology_package_file = "topology.tar.gz"\n', '\n', '# --- processes ---\n', 'fetch_heron_system = Process(\n', " name = 'fetch_heron_system',\n", " cmdline = 'curl %s -o %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)\n", ')\n', '\n', 'fetch_user_package = Process(\n', " name = 'fetch_user_package',\n", " cmdline = 'curl %s -o %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)\n", ')\n', '\n', 'command_to_start_executor = \'{{SANDBOX_EXECUTOR_BINARY}} {{mesos.instance}} {{TOPOLOGY_NAME}} {{TOPOLOGY_ID}} {{TOPOLOGY_DEFINITION_FILE}} {{INSTANCE_DISTRIBUTION}} {{STATEMGR_CONNECTION_STRING}} {{STATEMGR_ROOT_PATH}} {{SANDBOX_TMASTER_BINARY}} {{SANDBOX_STMGR_BINARY}} "{{SANDBOX_METRICSMGR_CLASSPATH}}" {{INSTANCE_JVM_OPTS_IN_BASE64}} "{{TOPOLOGY_CLASSPATH}}" {{thermos.ports[port1]}} {{thermos.ports[port2]}} {{thermos.ports[port3]}} {{SANDBOX_SYSTEM_YAML}} {{COMPONENT_RAMMAP}} {{COMPONENT_JVM_OPTS_IN_BASE64}} {{TOPOLOGY_PACKAGE_TYPE}} {{TOPOLOGY_JAR_FILE}} {{HERON_SANDBOX_JAVA_HOME}} {{thermos.ports[http]}} {{SANDBOX_SHELL_BINARY}} {{thermos.ports[port4]}} {{CLUSTER}} {{ROLE}} {{ENVIRON}} "{{SANDBOX_INSTANCE_CLASSPATH}}" {{SANDBOX_METRICS_YAML}} "{{SANDBOX_SCHEDULER_CLASSPATH}}" "{{thermos.ports[scheduler]}}"\'\n', '\n', 'launch_heron_executor = Process(\n', " name = 'launch_heron_executor',\n", ' cmdline = command_to_start_executor,\n', ' max_failures = 1\n', ')\n', '\n', 'discover_profiler_port = Process(\n', " name = 'discover_profiler_port',\n", " cmdline = 'echo {{thermos.ports[yourkit]}} 
> yourkit.port'\n", ')\n', '\n', '# --- tasks ---\n', 'heron_task = SequentialTask(\n', " name = 'setup_and_run',\n", ' processes = [fetch_heron_system, fetch_user_package, launch_heron_executor, discover_profiler_port],\n', " resources = Resources(cpu = '{{CPUS_PER_CONTAINER}}', ram = '{{RAM_PER_CONTAINER}}', disk = '{{DISK_PER_CONTAINER}}')\n", ')\n', '\n', '# -- jobs ---\n', 'jobs = [\n', ' Job(\n', " name = '{{TOPOLOGY_NAME}}',\n", " cluster = '{{CLUSTER}}',\n", " role = '{{ROLE}}',\n", " environment = '{{ENVIRON}}',\n", ' service = True,\n', ' task = heron_task,\n', " instances = '{{NUM_CONTAINERS}}',\n", " announce = Announcer(primary_port = 'http')\n", ' )\n', ']\n']
Unknown cluster: aurora
[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.utils.SchedulerUtils SEVERE: Failed to invoke IScheduler as library
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE: Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 19,2 replyHeader:: 19,4294967667,0 request:: '/heron/executionstate/ExclamationTopology,-1 response:: null
[2016-07-02 16:27:05 +0430] org.apache.curator.utils.DefaultTracerDriver FINEST: Trace: DeleteBuilderImpl-Foreground - 9 ms
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO: Deleted node for path: /heron/executionstate/ExclamationTopology
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE: Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 20,2 replyHeader:: 20,4294967668,0 request:: '/heron/topologies/ExclamationTopology,-1 response:: null
[2016-07-02 16:27:05 +0430] org.apache.curator.utils.DefaultTracerDriver FINEST: Trace: DeleteBuilderImpl-Foreground - 7 ms
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO: Deleted node for path: /heron/topologies/ExclamationTopology
[2016-07-02 16:27:05 +0430] com.twitter.heron.scheduler.LaunchRunner SEVERE: Failed to launch topology
[2016-07-02 16:27:05 +0430] com.twitter.heron.scheduler.SubmitterMain SEVERE: Failed to launch topology. Attempting to roll back upload.
[2016-07-02 16:27:05 +0430] com.twitter.heron.uploader.localfs.LocalFileSystemUploader INFO: Clean uploaded jar
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO: Closing the CuratorClient to: 192.168.11.231:2181,192.168.11.232:2181,192.168.11.233:2181
[2016-07-02 16:27:05 +0430] org.apache.curator.framework.imps.CuratorFrameworkImpl FINE: Closing
[2016-07-02 16:27:05 +0430] org.apache.curator.CuratorZookeeperClient FINE: Closing
[2016-07-02 16:27:05 +0430] org.apache.curator.ConnectionState FINE: Closing
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ZooKeeper FINE: Closing session: 0x255aabf8eaa0028
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE: Closing client for session: 0x255aabf8eaa0028
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE: Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 21,-11 replyHeader:: 21,4294967669,0 request:: null response:: null
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE: Disconnecting client for session: 0x255aabf8eaa0028
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn INFO: EventThread shut down
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ZooKeeper INFO: Session: 0x255aabf8eaa0028 closed
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO: Closing the tunnel processes
Exception in thread "main" java.lang.RuntimeException: Failed to submit topology ExclamationTopology
at com.twitter.heron.scheduler.SubmitterMain.main(SubmitterMain.java:319)
ERROR: Failed to launch topology 'ExclamationTopology' because User main failed with status 1. Bailing out...
INFO: Elapsed time: 3.951s.
I changed the env, the role and ..., but the issue wasn't solved.
Check the /etc/aurora/clusters.json file and change the name of the cluster to 'aurora'.
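Aurora reports `Unknown cluster: aurora` when the first segment of the job key Heron passes to it (`aurora/root/devel/ExclamationTopology` in the logs above) is not the `name` of any cluster in its clusters file. Below is a minimal Python sketch for checking this before submitting. It assumes clusters.json is the usual JSON list of cluster objects, each with a `name` key. The helper names are hypothetical and are not part of Heron or Aurora.

```python
import json
from pathlib import Path

# Path mentioned in this thread; adjust if your Aurora client reads its
# cluster list from somewhere else.
CLUSTERS_JSON = Path("/etc/aurora/clusters.json")


def aurora_cluster_names(text):
    """Return the cluster names defined in an Aurora clusters.json document.

    Assumes the document is a JSON list of objects that carry a "name" key.
    """
    clusters = json.loads(text)
    return [c["name"] for c in clusters if "name" in c]


def has_cluster(text, wanted):
    """True if `wanted` (the cluster part of the Heron job key) is defined."""
    return wanted in aurora_cluster_names(text)


if __name__ == "__main__":
    if CLUSTERS_JSON.exists():
        names = aurora_cluster_names(CLUSTERS_JSON.read_text())
        print("clusters defined:", names)
        print("'aurora' defined:", "aurora" in names)
    else:
        print(f"{CLUSTERS_JSON} not found")
```

If `aurora` is missing, you have two options. You can rename the cluster entry as suggested above, or you can submit to the cluster name that is actually defined.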
@kartik894 Thanks a lot! That resolved it.
Thanks! I deployed successfully on Ubuntu 16.04 and CentOS 7, but submitting a job on CentOS 6 fails with an error.
Hello, I am new to Heron. I am submitting a topology as root.
Initially, I did "heron submit aurora/ubuntu/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology --verbose"
Then I got the error "Failed to initialize sandbox: Could not create sandbox because user does not exist: ubuntu"
So I changed the role to root and ran this:
heron submit aurora/root/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology --verbose
I am getting this error:
E0329 14:34:14.479283 32970 runner.py:299] Regular plan unhealthy!
Can someone help? Thanks a lot!
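The first error in this comment is `Could not create sandbox because user does not exist: ubuntu`. It suggests the middle segment of `aurora/<role>/<environ>` (the Aurora role) is used as the Unix user the task runs as on the Mesos agent. That account therefore has to exist on the agents. Below is a minimal, Unix-only Python sketch that uses the standard `pwd` module to check a role. The helper names are hypothetical. Run it on each Mesos agent, not only on the host you submit from.

```python
import pwd


def split_heron_cluster_arg(arg):
    """Split a Heron submit target such as 'aurora/root/devel' into its parts.

    Returns (cluster, role, environ); missing parts come back as None.
    """
    parts = arg.split("/")
    parts += [None] * (3 - len(parts))  # pad short targets like 'aurora'
    cluster, role, environ = parts[:3]
    return cluster, role, environ


def user_exists(name):
    """True if `name` is a Unix account on this host (per the passwd database)."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    _, role, _ = split_heron_cluster_arg("aurora/ubuntu/devel")
    print(f"user {role!r} exists on this host:", user_exists(role))
```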
@chatterjeesubarna I guess your first submission created some metadata in ZooKeeper that your second submission conflicted with. I suggest submitting with a different name:
heron submit aurora/root/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopologyDifferent1 --verbose
@maosongfu to confirm
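A previous submission may have left state behind. If so, the ZooKeeper nodes to inspect are the ones the submitter creates and deletes in the logs in this thread, under the default `/heron` root:

- `/heron/topologies/<name>`
- `/heron/packingplans/<name>`
- `/heron/executionstate/<name>`

The small Python sketch below lists those paths and picks an unused name in the style suggested above. Both helpers are hypothetical, and the list of directories may not be complete. The `existing` names could come, for example, from running `ls /heron/topologies` in ZooKeeper's `zkCli.sh`.

```python
def heron_state_paths(topology, root="/heron"):
    """ZooKeeper nodes a Heron submission creates for `topology`.

    Directory names are taken from the submitter logs in this thread
    (topologies, packingplans, executionstate); the list may not be complete.
    """
    return [f"{root}/{d}/{topology}"
            for d in ("topologies", "packingplans", "executionstate")]


def fresh_topology_name(base, existing):
    """Return `base` if unused, else the first f"{base}Different{n}" not in `existing`."""
    if base not in existing:
        return base
    n = 1
    while f"{base}Different{n}" in existing:
        n += 1
    return f"{base}Different{n}"
```

The other option mentioned earlier in the thread is to kill the stale topology (for example `heron kill aurora/root/devel ExclamationTopology`) and submit again under the original name.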
Hello,
Thank you very much. Yes, that solved it and I could run the job. Probably a topology was already running, so Mesos couldn't schedule another one!
Now, I can see the following on my terminal: "INFO: Topology 'ExclamationTopology' launched successfully"
My heron ui shows: "{"status": "success", "executiontime": 5.316734313964844e-05, "message": "", "version": "0.14.5", "result": {"aurora": {"root": {"devel": ["ExclamationTopology"]}}}}"
The only problem is that I cannot see the topology in Heron Tracker. My heron-tracker.yaml looks like this:

    statemgrs:
      - type: "zookeeper"
        name: "localzk"
        hostport: "heron01:2181"
        rootpath: "/heron"
        tunnelhost: "localhost"

Can you kindly help? Thanks a lot again!
Thanking you,
Subarna Chatterjee
Post-Doctoral Researcher, Inria, Rennes
Website: http://chatterjeesubarna.wix.com/subarna
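Two checks may help narrow this down. Both appear in the hedged Python sketch below, and the function names are hypothetical, not part of Heron.

- The first check flattens a tracker reply shaped like the JSON quoted in this comment (cluster -> role -> environment -> topology names). This lets you confirm whether the tracker lists the topology at all.
- The second check compares a `heron_tracker.yaml` statemgr entry, loaded as a dict with any YAML parser, against the `STATEMGR_CONNECTION_STRING` and `STATEMGR_ROOT_PATH` binds that `--verbose` prints in the `aurora job create` command.

The tracker can only show topologies stored in the state manager it is configured to read.

```python
import json
import shlex


def list_tracked_topologies(response_text):
    """Flatten a tracker reply shaped like the one quoted above into
    (cluster, role, environ, topology) tuples."""
    reply = json.loads(response_text)
    if reply.get("status") != "success":
        raise ValueError(f"tracker returned status {reply.get('status')!r}")
    rows = []
    for cluster, roles in reply["result"].items():
        for role, environs in roles.items():
            for environ, names in environs.items():
                for name in names:
                    rows.append((cluster, role, environ, name))
    return rows


def bind_values(command):
    """Collect KEY=VALUE pairs passed via '--bind' in an `aurora job create` command."""
    tokens = shlex.split(command)  # handles quoted values such as ...=""
    binds = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "--bind" and "=" in value:
            key, _, val = value.partition("=")
            binds[key] = val
    return binds


def tracker_mismatches(tracker_entry, binds):
    """Compare one heron_tracker.yaml statemgr entry (a dict) with submit-time binds.

    Returns a list of mismatch descriptions; an empty list means they agree.
    Hosts are compared as literal strings, so a hostname and an IP for the
    same ZooKeeper are still reported as a mismatch.
    """
    problems = []
    submitted = binds.get("STATEMGR_CONNECTION_STRING", "").split(",")
    if tracker_entry.get("hostport") not in submitted:
        problems.append(
            f"hostport {tracker_entry.get('hostport')!r} not in {submitted!r}")
    if tracker_entry.get("rootpath") != binds.get("STATEMGR_ROOT_PATH"):
        problems.append(
            f"rootpath {tracker_entry.get('rootpath')!r} != "
            f"{binds.get('STATEMGR_ROOT_PATH')!r}")
    return problems
```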
@chatterjeesubarna I've answered your question on the mailing list. Please don't double post questions on both the mailing list and git issues. Also, troubleshooting questions are best handled on the mailing list. Git issues should just be for bugs or feature requests.
Hello! I have a problem deploying a Heron cluster. When I submit the ExclamationTopology, it fails:
b1@master_1:~$ heron submit aurora/b1/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology
[2017-05-25 09:18:08 +0000] [INFO]: Using config file under /home/b1/.heron/conf/aurora
[2017-05-25 09:18:08 +0000] [INFO]: Launching topology: 'ExclamationTopology'
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: 192.168.57.163:2181
[2017-05-25 09:18:09 -0700] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting
[2017-05-25 09:18:09 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized.
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /heron/topologies/ExclamationTopology
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -test -e /heron/topologies/aurora''
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Target topology file already exists at '/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz'. Overwriting it now
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Uploading topology package at '/tmp/tmpvMxs3m/topology.tar.gz' to target HDFS at '/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz'
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -copyFromLocal -f /tmp/tmpvMxs3m/topology.tar.gz /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz''
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):
[2017-05-25 09:18:17 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/topologies/ExclamationTopology
[2017-05-25 09:18:17 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/ExclamationTopology
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/executionstate/ExclamationTopology
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.scheduler.aurora.AuroraLauncher: Launching topology in aurora
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.scheduler.utils.SchedulerUtils: Updating scheduled-resource in packing plan: ExclamationTopology
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/ExclamationTopology
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/ExclamationTopology
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``aurora job create --wait-until RUNNING --bind STMGR_BINARY=./heron-core/bin/heron-stmgr --bind RAM_PER_CONTAINER=11811160064 --bind TOPOLOGY_PACKAGE_TYPE=jar --bind SHELL_BINARY=./heron-core/bin/heron-shell --bind TMASTER_BINARY=./heron-core/bin/heron-tmaster --bind STATEMGR_ROOT_PATH=/heron --bind TOPOLOGY_PACKAGE_URI=/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz --bind JAVA_HOME=/usr/lib/jvm/java-8-oracle --bind CLUSTER=aurora --bind TOPOLOGY_BINARY_FILE=heron-examples.jar --bind SYSTEM_YAML=./heron-conf/heron_internals.yaml --bind EXECUTOR_BINARY=./heron-core/bin/heron-executor --bind CPUS_PER_CONTAINER=5.0 --bind IS_PRODUCTION=false --bind PYTHON_INSTANCE_BINARY=./heron-core/bin/heron-python-instance --bind METRICS_YAML=./heron-conf/metrics_sinks.yaml --bind CORE_PACKAGE_URI=/heron/dist/heron-core.tar.gz --bind TOPOLOGY_CLASSPATH=heron-examples.jar --bind TOPOLOGY_ID=ExclamationTopologyf7fa5898-9d12-4e3d-917c-e24be0fd9ef6 --bind ROLE=b1 --bind COMPONENT_JVM_OPTS_IN_BASE64="" --bind TOPOLOGY_NAME=ExclamationTopology --bind STATEMGR_CONNECTION_STRING=192.168.57.163:2181 --bind INSTANCE_CLASSPATH=./heron-core/lib/instance/* --bind DISK_PER_CONTAINER=5368709120 --bind COMPONENT_RAMMAP=exclaim1:3221225472,word:3221225472 --bind METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/* --bind ENVIRON=devel --bind SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/* --bind TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn --bind NUM_CONTAINERS=3 --bind INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg==" aurora/b1/devel/ExclamationTopology /home/b1/.heron/conf/aurora/heron.aurora''
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):
Error loading configuration: Unknown cluster: aurora
[2017-05-25 09:18:20 -0700] [SEVERE] com.twitter.heron.scheduler.aurora.AuroraCLIController: Failed to run process. Command=[aurora, job, create, --wait-until, RUNNING, --bind, STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, RAM_PER_CONTAINER=11811160064, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, SHELL_BINARY=./heron-core/bin/heron-shell, --bind, TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, STATEMGR_ROOT_PATH=/heron, --bind, TOPOLOGY_PACKAGE_URI=/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz, --bind, JAVA_HOME=/usr/lib/jvm/java-8-oracle, --bind, CLUSTER=aurora, --bind, TOPOLOGY_BINARY_FILE=heron-examples.jar, --bind, SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, CPUS_PER_CONTAINER=5.0, --bind, IS_PRODUCTION=false, --bind, PYTHON_INSTANCE_BINARY=./heron-core/bin/heron-python-instance, --bind, METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, CORE_PACKAGE_URI=/heron/dist/heron-core.tar.gz, --bind, TOPOLOGY_CLASSPATH=heron-examples.jar, --bind, TOPOLOGY_ID=ExclamationTopologyf7fa5898-9d12-4e3d-917c-e24be0fd9ef6, --bind, ROLE=b1, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_NAME=ExclamationTopology, --bind, STATEMGR_CONNECTION_STRING=192.168.57.163:2181, --bind, INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, DISK_PER_CONTAINER=5368709120, --bind, COMPONENT_RAMMAP=exclaim1:3221225472,word:3221225472, --bind, METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, ENVIRON=devel, --bind, SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn, --bind, NUM_CONTAINERS=3, --bind, INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg==", aurora/b1/devel/ExclamationTopology, /home/b1/.heron/conf/aurora/heron.aurora], STDOUT=, STDERR=Error loading configuration: Unknown cluster: aurora
[2017-05-25 09:18:20 -0700] [SEVERE] com.twitter.heron.scheduler.utils.LauncherUtils: Failed to invoke IScheduler as library
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/executionstate/ExclamationTopology
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/ExclamationTopology
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/topologies/ExclamationTopology
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -rm /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz''
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):
Deleted /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz
[2017-05-25 09:18:23 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: 192.168.57.163:2181
[2017-05-25 09:18:23 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes
[2017-05-25 09:18:23 +0000] [ERROR]: Failed to launch topology 'ExclamationTopology'
[2017-05-25 09:18:23 +0000] [ERROR]: Failed to launch topology 'ExclamationTopology'
[2017-05-25 09:18:23 +0000] [INFO]: Elapsed time: 15.872s.
Also, in Aurora I only see the Example at http://192.168.57.163:8081/scheduler....
Please help me!
@bjmota would you please ask troubleshooting questions on the mailing list? Github issues should be used for filing bugs and feature requests.
Hi guys, I built mesos-0.25 and aurora-0.12, and they are running normally.
When I run "heron submit aurora --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology", I get an error about Aurora.