apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0

Heron submit aurora error #883

Closed wking1986 closed 8 years ago

wking1986 commented 8 years ago

Hi guys, I built mesos-0.25 and aurora-0.12, and they are running normally.
When I run "heron submit aurora --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology", I get an error about Aurora:

image

My config looks like this:

scheduler.yaml:

image

statemgr.yaml:

image

I do not know why the error happens. Please help, thanks a lot!

maosongfu commented 8 years ago

Hi, according to the logs, it failed to invoke Aurora.onScheduler(...). Can you add the "--verbose" flag when submitting the job and share the verbose output?

wking1986 commented 8 years ago

Got it!!
image

I changed the env to "devel" and the submit to Aurora succeeded.

image

@maosongfu Thank you for your help

wking1986 commented 8 years ago

@maosongfu, I can see the topology in Aurora, but I cannot find it in heron-ui. Why?

image

maosongfu commented 8 years ago
  1. Check whether the topology is running normally. You can do this by checking the log-files folder.
  2. Heron tracker feeds data to heron-ui. You need to start both of them with the correct state manager configuration: https://github.com/twitter/heron/blob/master/heron/config/src/yaml/tracker/heron_tracker.yaml
maosongfu commented 8 years ago

BTW, I added a pull request, #884, which logs the stderr of a spawned process even without the "--verbose" flag.

wking1986 commented 8 years ago

OK, I will try again. Thank you very much!!

wking1986 commented 8 years ago

@maosongfu, I have modified heron_tracker.yaml, and heron-ui can now show the topology. But the topology is not activated, so I ran the command: heron activate --verbose aurora/root/devel ExclamationTopology

image

I find that the zk path /heron/pplans consistently does not contain the topology name (ExclamationTopology), but other zk directories do contain ExclamationTopology (e.g. /heron/topologies/ExclamationTopology).

Why does "/heron/pplans" have no ExclamationTopology? Which yaml config has the problem?

statemgr.yaml looks like this: image

qiuyij commented 8 years ago

Perhaps these may help: #834 #822. More guides on troubleshooting will be published soon: #877

aaronshan commented 8 years ago

@maosongfu @qiuyij I get the same error when I use Aurora. In the local env, I can find detailed error info in the log-files directory, but I can't find it in the Aurora env. Where can I find the log-files directory?

wking1986 commented 8 years ago

@maosongfu @qiuyij If a topology is submitted to Aurora, can I figure out why a process failed to start from ~/.herondata/topologies/{cluster}/{role}/{topologyName}/heron-executor.stdout?

wking1986 commented 8 years ago

@aaronshan If the topology is submitted to Aurora, you can find it in mesos/slaves/........./latest/sandbox/heron-executor.stdout

billonahill commented 8 years ago

@kartik894 I responded to your issue #888. Let's keep these two issues separate pls.

nlu90 commented 8 years ago

@wking1986

Could you check the logs to see if your topology is actually running? Sometimes the pplan is missing because the topology is not running correctly. If this is the case, you can kill the topology, submit it again, and see if the issue resolves.

maosongfu commented 8 years ago

@aaronshan @wking1986 All scheduler implementations share a similar working-directory (sandbox) structure. For Aurora, can you look at heron-executor.stdout and the log-files folder in the sandbox folder (not in ~/.herondata/topologies/{cluster}/{role}/{topologyName}/heron-executor.stdout)?

  1. You can use the Aurora page to navigate to the webpage showing the sandbox content (http://aurora.apache.org/documentation/latest/getting-started/tutorial/, click "chroot browse")
  2. You can ssh to the target sandbox host and enter the sandbox folder.
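
When working over ssh on the sandbox host, the executor logs can be located with a short script. This is a minimal sketch, not part of Heron: the helper name `find_executor_logs` is hypothetical, and the Mesos work directory you pass in (for example the /tmp/mesos/slaves prefix seen in the logs of this thread) depends on how the slave was started.

```python
import os


def find_executor_logs(root, filename="heron-executor.stdout"):
    """Return the sorted paths of every file named `filename` under `root`.

    os.walk does not follow symlinked directories by default, so a
    "latest" symlink next to the real run directory is not visited twice.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Calling `find_executor_logs("/tmp/mesos/slaves")` on a slave host would list candidate files; the log-files folder sits in the same sandbox directory as each match.
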
aaronshan commented 8 years ago

@wking1986 thanks.

@maosongfu I find that the task failed to run on Mesos. image

I get this stderr log in the sandbox: image

log content:

I0609 11:19:34.714751 41904 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/56dd9481-d4b1-4133-a258-51d5a538c46d-S0\/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"\/usr\/bin\/thermos_executor"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/56dd9481-d4b1-4133-a258-51d5a538c46d-S0\/frameworks\/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000\/executors\/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc\/runs\/44dc2a93-f7e6-4892-8bbc-668c3b845e17","user":"root"}
I0609 11:19:34.716109 41904 fetcher.cpp:369] Fetching URI '/usr/bin/thermos_executor'
I0609 11:19:34.716125 41904 fetcher.cpp:243] Fetching directly into the sandbox directory
I0609 11:19:34.716142 41904 fetcher.cpp:180] Fetching URI '/usr/bin/thermos_executor'
I0609 11:19:34.716159 41904 fetcher.cpp:160] Copying resource with command:cp '/usr/bin/thermos_executor' '/tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17/thermos_executor'
I0609 11:19:34.754954 41904 fetcher.cpp:446] Fetched '/usr/bin/thermos_executor' to '/tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17/thermos_executor'
twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
Writing log files to disk in /tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17
I0609 11:19:35.444795 41901 exec.cpp:134] Version: 0.25.0
I0609 11:19:35.452504 41913 exec.cpp:208] Executor registered on slave 56dd9481-d4b1-4133-a258-51d5a538c46d-S0
Writing log files to disk in /tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17
ERROR] Regular plan unhealthy!
twitter.common.app debug: Shutting application down.
twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
twitter.common.app debug: Finishing up module teardown.
twitter.common.app debug:   Active thread: <_MainThread(MainThread, started 139986815493888)>
twitter.common.app debug:   Active thread (daemon): <Thread(Thread-6, started daemon 139986237478656)>
twitter.common.app debug:   Active thread (daemon): <Thread(Thread-7, started daemon 139986216498944)>
twitter.common.app debug:   Active thread (daemon): <TaskResourceMonitor(TaskResourceMonitor[1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc] [TID=41953], started daemon 139986125973248)>
twitter.common.app debug:   Active thread (daemon): <WaitThread(Thread-12, started daemon 139986226988800)>
twitter.common.app debug:   Active thread (daemon): <Thread(Thread-8, started daemon 139986136463104)>
twitter.common.app debug:   Active thread (daemon): <WaitThread(Thread-15, started daemon 139986094503680)>
twitter.common.app debug:   Active thread (daemon): <_DummyThread(Dummy-2, started daemon 139986480895744)>
twitter.common.app debug:   Active thread (daemon): <WaitThread(Thread-14, started daemon 139986081892096)>
twitter.common.app debug: Exiting cleanly.

How can I solve the problem "ERROR] Regular plan unhealthy!"? Thank you ~

maosongfu commented 8 years ago

@aaronshan Can u enter the sandbox folder, at the same level as stderr you opened, which has the same structure as working directory in LocalScheduler, and check the content in heron-executor.stdout?

aaronshan commented 8 years ago

@maosongfu I entered the sandbox folder: image
Then I entered the .logs folder: image
In the fetch_heron_system folder, I get this from the stderr file: image

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0 37.3M    0 16383    0     0  2674k      0  0:00:14 --:--:--  0:00:14 2674k
100 37.3M  100 37.3M    0     0   826M      0 --:--:-- --:--:-- --:--:--  956M
tar: ./release.yaml: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-executor: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-shell: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-stmgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-tmaster: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-local-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-slurm-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/packing/heron-roundrobin-packing.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/packing: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/metricsmgr/heron-metricsmgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/metricsmgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr/heron-localfs-statemgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr/heron-zookeeper-statemgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/instance/heron-instance.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/instance: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core: implausibly old time stamp 1970-01-01 08:00:00
tar: .: implausibly old time stamp 1970-01-01 08:00:00

I reported this error in #845. In the fetch_user_package folder, I get this from the stderr file: image

curl: (6) Couldn't resolve host 'hdfs:'

I think the problem may be caused by an error in my heron.aurora file, which looks like this:

"""
Launch the topology as a single aurora job with multiple instances.
The heron-executor is responsible for starting a tmaster (container 0)
and regular stmgr/metricsmgr/instances (container index > 0).
"""

heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
core_release_file = "heron-core.tar.gz"
topology_package_file = "topology.tar.gz"

# --- processes ---
#fetch_heron_system = Process(
#  name = 'fetch_heron_system',
#  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)
#)

fetch_heron_system = Process(
  name = 'fetch_heron_system',
  cmdline = 'hadoop fs -get  hdfs:///tmp/heron/topologies/aurora/heron-core.tar.gz  . && tar zxf %s' % ( core_release_file)
)

#fetch_user_package = Process(
#  name = 'fetch_user_package',
#  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)
#)

fetch_user_package = Process(
  name = 'fetch_user_package',
  cmdline = 'hadoop fs -get  %s  .  && tar zxf %s' % (heron_topology_jar_uri, topology_package_file)
)
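
One thing worth checking in a fetch command like the ones above is the destination file name: `hadoop fs -get <uri> .` keeps the remote file name, while the following `tar zxf` expects topology.tar.gz (the commands run by hand later in this thread pass topology.tar.gz as the destination). The sketch below builds such a command line from the URI scheme. It is a standalone Python 3 illustration, not Heron or Aurora code; `fetch_cmdline` is a hypothetical helper.

```python
import shlex
from urllib.parse import urlparse


def fetch_cmdline(uri, dest):
    """Build a shell command that downloads `uri` to `dest` and unpacks it.

    hdfs:// URIs are fetched with `hadoop fs -get`; every other scheme
    falls back to curl. The explicit destination guarantees that tar
    is given a file that actually exists.
    """
    if urlparse(uri).scheme == "hdfs":
        fetch = "hadoop fs -get %s %s" % (shlex.quote(uri), shlex.quote(dest))
    else:
        fetch = "curl %s -o %s" % (shlex.quote(uri), shlex.quote(dest))
    return "%s && tar zxf %s" % (fetch, shlex.quote(dest))
```

The returned string could be used as the `cmdline` of a Process, provided the `hadoop` client is on the PATH of the sandbox host.
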
maosongfu commented 8 years ago

@nlu90 Do you know why we get "curl: (6) Couldn't resolve host 'hdfs:'"? According to the modified heron.aurora file, "curl" is commented out and not even used.

@aaronshan Can u double check the actual command when running "fetch_user_package"?

20160608214214

On the Aurora page, you can click the name of the process to see it.

wking1986 commented 8 years ago

@maosongfu @nlu90 @qiuyij Thanks for your help, Heron on Aurora is running!!

maosongfu commented 8 years ago

@wking1986 Awesome! Also, a native Mesos scheduler and a YARN scheduler are coming soon too! The pull requests are being reviewed.

wking1986 commented 8 years ago

@maosongfu Great!! Very much looking forward to Heron on Mesos

aaronshan commented 8 years ago

@maosongfu I revised the heron.aurora file, and now it works. I started two Mesos slaves, and I find that one runs its task OK while the other one still fails. image

and when I click hostname:

qq20160609-0 2x

and launch_heron_executor's stdout and stderr files are empty.

I ran these commands step by step:

hadoop fs -get  hdfs:///tmp/heron/topologies/main/heron-core.tar.gz  . && tar zxf heron-core.tar.gz
hadoop fs -get hdfs:///tmp/heron/topologies/main/ExclamationTopology-ruifeng.shan-tag-0--5954092425683288689  topology.tar.gz && tar zxf topology.tar.gz
./heron-core/bin/heron-executor 1 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe ExclamationTopology.defn 1:word:2:0:exclaim1:1:0 l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181 /heron ./heron-core/bin/heron-tmaster ./heron-core/bin/heron-stmgr "./heron-core/lib/metricsmgr/*" "LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;" "heron-examples.jar" 31749 31148 31006 ./heron-conf/heron_internals.yaml exclaim1:536870912,word:536870912 "" jar heron-examples.jar /home/q/java8/jdk1.8.0_91 31985 ./heron-core/bin/heron-shell 31984 main ruifeng.shan devel "./heron-core/lib/instance/*" ./heron-conf/metrics_sinks.yaml "./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*" "31347"

The output is also empty, but heron-executor.stderr contains:

Traceback (most recent call last):
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 319, in execute
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 254, in _wrap_coverage
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 286, in _wrap_profiling
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 362, in _execute
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 420, in execute_entry
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 425, in execute_module
  File "/usr/local/lib/python2.7/runpy.py", line 180, in run_module
    fname, loader, pkg_name)
  File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 450, in <module>
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 417, in main
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 398, in launch
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 362, in do_run_and_wait
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 352, in run_process
  File "/usr/local/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/local/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

heron-executor.stdout content:

2016-06-09 15:29:41: Set up process group; executor becomes leader
2016-06-09 15:29:41: Register the SIGTERM signal handler
2016-06-09 15:29:41: Register the atexit clean up
2016-06-09 15:29:41: Logging pid 40559 to file heron-executor-1.pid
2016-06-09 15:29:41: Running process as mkdir -p log-files
2016-06-09 15:29:41: Running process as chmod a+rx . && chmod a+x log-files && chmod +x ./heron-core/bin/heron-tmaster && chmod +x ./heron-core/bin/heron-stmgr && chmod +x ./heron-core/bin/heron-shell
word 536870912 512 64 128
exclaim1 536870912 512 64 128
2016-06-09 15:29:41: Running heron-shell-1 process as ./heron-core/bin/heron-shell --port=31782 --log_file_prefix=log-files/heron-shell.log
2016-06-09 15:29:41: Logging pid 40569 to file heron-shell-1.pid
2016-06-09 15:29:41: Running container_1_word_2 process as /home/q/java8/jdk1.8.0_91/bin/java -Xmx320M -Xms320M -Xmn160M -XX:MaxPermSize=128M -XX:PermSize=128M -XX:ReservedCodeCacheSize=64M -XX:+CMSScavengeBeforeRemark -XX:TargetSurvivorRatio=90 -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -Xloggc:log-files/gc.container_1_word_2.log -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/instance/*:heron-examples.jar com.twitter.heron.instance.HeronInstance ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe container_1_word_2 word 2 0 stmgr-1 31719 31300 ./heron-conf/heron_internals.yaml
2016-06-09 15:29:41: Executor terminated; exiting all process in executor.

and the other machine's heron-executor.stdout content:

2016-06-09 17:36:29: Set up process group; executor becomes leader
2016-06-09 17:36:29: Register the SIGTERM signal handler
2016-06-09 17:36:29: Register the atexit clean up
2016-06-09 17:36:29: Logging pid 7100 to file heron-executor-0.pid
2016-06-09 17:36:29: Running process as mkdir -p log-files
2016-06-09 17:36:29: Running process as chmod a+rx . && chmod a+x log-files && chmod +x ./heron-core/bin/heron-tmaster && chmod +x ./heron-core/bin/heron-stmgr && chmod +x ./heron-core/bin/heron-shell
2016-06-09 17:36:29: Running heron-shell-0 process as ./heron-core/bin/heron-shell --port=31101 --log_file_prefix=log-files/heron-shell.log
2016-06-09 17:36:29: Logging pid 7110 to file heron-shell-0.pid
2016-06-09 17:36:29: Running metricsmgr-0 process as /home/q/java8/jdk1.8.0_91/bin/java -Xmx1024M -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+PrintCommandLineFlags -Xloggc:log-files/gc.metricsmgr.log -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/metricsmgr/* com.twitter.heron.metricsmgr.MetricsManager metricsmgr-0 31132 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe ./heron-conf/heron_internals.yaml ./heron-conf/metrics_sinks.yaml
2016-06-09 17:36:29: Logging pid 7111 to file metricsmgr-0.pid
2016-06-09 17:36:29: Running heron-tmaster process as ./heron-core/bin/heron-tmaster 31481 31107 31866 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181 /heron stmgr-1 ./heron-conf/heron_internals.yaml ./heron-conf/metrics_sinks.yaml 31132
2016-06-09 17:36:29: Logging pid 7112 to file heron-tmaster.pid
kartik894 commented 8 years ago

Hi,

I am getting the following error:

Error loading configuration: Could not find job aurora/root/default/ExclamationTopology
Candidates are:
  aurora/root/devel/ExclamationTopology

@wking1986 Where should I exactly change the environment?

aaronshan commented 8 years ago

@kartik894 As far as I know, when you submit a topology, you can set the env (prod | devel | test | staging).

$ heron help submit
usage: heron submit [options] cluster/[role]/[env] topology-file-name topology-class-name [topology-args]

Required arguments:
  cluster/[role]/[env]  Cluster, role, and environment to run topology
  topology-file-name    Topology jar/tar/zip file
  topology-class-name   Topology class name

Optional arguments:
  --config-path (a string; path to cluster config; default: "/home/q/heron/heron-0.14.0/heron/conf")
  --config-property (key=value; a config key and its value; default: [])
  --deploy-deactivated (a boolean; default: "false")
  --topology-main-jvm-property (property=value; JVM system property for executing topology main; default: [])
  --verbose (a boolean; default: "false")
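
The `cluster/[role]/[env]` argument in the usage text can be read as up to three slash-separated parts. The sketch below only illustrates that format; `parse_target` is a hypothetical helper, and returning None for an omitted role or env is an assumption, since how Heron fills in missing parts is not shown in this thread.

```python
def parse_target(target):
    """Split a 'cluster/[role]/[env]' string into (cluster, role, env).

    Omitted or empty optional parts come back as None (an assumption
    made for this sketch, not Heron's documented behaviour).
    """
    parts = target.split("/")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError("expected cluster/[role]/[env], got %r" % target)
    parts += [None] * (3 - len(parts))
    cluster, role, env = parts
    return cluster, role or None, env or None
```

For the job in this thread, `parse_target("aurora/root/devel")` yields the cluster, role, and env that Aurora lists as the candidate job key.
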
maosongfu commented 8 years ago

@aaronshan Hi,

According to the log, heron-executor failed to start a heron-instance process. Can u try to run the command directly: /home/q/java8/jdk1.8.0_91/bin/java -Xmx320M -Xms320M -Xmn160M -XX:MaxPermSize=128M -XX:PermSize=128M -XX:ReservedCodeCacheSize=64M -XX:+CMSScavengeBeforeRemark -XX:TargetSurvivorRatio=90 -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -Xloggc:log-files/gc.container_1_word_2.log -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/instance/*:heron-examples.jar com.twitter.heron.instance.HeronInstance ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe container_1_word_2 word 2 0 stmgr-1 31719 31300 ./heron-conf/heron_internals.yaml

and check the output?

aaronshan commented 8 years ago

@maosongfu Thank you very much~ Heron on Aurora runs OK!!

maosongfu commented 8 years ago

@aaronshan So what was the issue?

aaronshan commented 8 years ago

@maosongfu
The problem was caused by the missing directory "/home/q/java8/jdk1.8.0_91". I forgot to configure it on that machine. 😂😂😂
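
The "OSError: [Errno 2] No such file or directory" in the traceback above is what Python's subprocess module raises when the program to launch does not exist, which matches a missing Java home. A quick check on each slave host before submitting can catch this early. This is a minimal sketch; `check_java_home` is a hypothetical helper, not part of Heron.

```python
import os


def check_java_home(java_home):
    """Return a list of problems found with `java_home` (empty if usable)."""
    problems = []
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.path.isdir(java_home):
        problems.append("directory does not exist: %s" % java_home)
    elif not os.path.isfile(java_bin):
        problems.append("missing binary: %s" % java_bin)
    elif not os.access(java_bin, os.X_OK):
        problems.append("not executable: %s" % java_bin)
    return problems
```

Running it with the value configured for heron.directory.sandbox.java.home on every host that can receive a container should return an empty list.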

aaronshan commented 8 years ago

@maosongfu When I submit a new topology

heron submit main/ruifeng.shan/devel /home/q/ruifeng.shan/heron-learn-1.0-SNAPSHOT-shaded.jar com.qunar.data.WordCountTopology WordCountTopology

it keeps waiting:

[2016-06-10 01:50:54 +0000] com.twitter.heron.scheduler.aurora.AuroraLauncher INFO:  Launching topology in aurora
[2016-06-10 01:50:54 +0000] com.twitter.heron.spi.common.ShellUtils INFO:  $> [aurora, job, create, --wait-until, RUNNING, --bind, TOPOLOGY_NAME=WordCountTopology, --bind, SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, COMPONENT_RAMMAP=sentence-spout:1073741824,count-bolt:1073741824,report-bolt:1073741824,split-bolt:1073741824, --bind, SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, INSTANCE_JVM_OPTS_IN_BASE64="", --bind, ROLE=ruifeng.shan, --bind, ENVIRON=devel, --bind, SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, ISPRODUCTION=false, --bind, TOPOLOGY_CLASSPATH=heron-learn-1.0-SNAPSHOT-shaded.jar, --bind, CLUSTER=main, --bind, SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, STATEMGR_CONNECTION_STRING=l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_ID=WordCountTopology1117b603-69c3-4096-b005-789fa81ea727, --bind, TOPOLOGY_PACKAGE_URI=hdfs:///tmp/heron/topologies/main/WordCountTopology-ruifeng.shan-tag-0--3163552258663319321, --bind, SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, CORE_PACKAGE_URI=file:///home/q/heron/heron-0.14.0/heron/dist/heron-core.tar.gz, --bind, SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, RAM_PER_CONTAINER=5368709120, --bind, SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, TOPOLOGY_DEFINITION_FILE=WordCountTopology.defn, --bind, INSTANCE_DISTRIBUTION=1:count-bolt:2:0:report-bolt:3:0:split-bolt:4:0:sentence-spout:1:0, --bind, NUM_CONTAINERS=2, --bind, CPUS_PER_CONTAINER=5.0, --bind, TOPOLOGY_JAR_FILE=heron-learn-1.0-SNAPSHOT-shaded.jar, --bind, SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell, --bind, DISK_PER_CONTAINER=17179869184, --bind, STATEMGR_ROOT_PATH=/heron, --bind, 
HERON_SANDBOX_JAVA_HOME=/home/q/java8/jdk1.8.0_91, main/ruifeng.shan/devel/WordCountTopology, /home/q/heron/heron-0.14.0/heron/conf/main/heron.aurora, --verbose]
[2016-06-10 01:51:04 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:14 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:24 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:34 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:44 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:54 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:04 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:14 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:24 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:34 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:44 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:54 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:04 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:14 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:24 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:34 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:44 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms

I find that Aurora shows "PENDING : Insufficient: disk" image

and the Mesos resources are: image

If I kill ExclamationTopology and re-submit ExclamationTopology, ExclamationTopology will work.

maosongfu commented 8 years ago

@aaronshan You can specify the disk_per_container in Config to override the default one: https://github.com/twitter/heron/blob/master/heron/api/src/java/com/twitter/heron/api/Config.java#L266

As Aurora shows, it failed to schedule containers with the requested disk. This is related to Aurora's resource management, which we can do little about.
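
To see how much each container asks for, the byte values in the `--bind` arguments of the submit log can be converted to GiB. For the WordCountTopology log above, DISK_PER_CONTAINER=17179869184 is 16 GiB and RAM_PER_CONTAINER=5368709120 is 5 GiB, with 5.0 CPUs. The sketch below does that conversion; `parse_binds` and `per_container_request` are hypothetical helper names, not part of Heron or Aurora.

```python
GIB = 1024 ** 3


def parse_binds(tokens):
    """Collect the KEY=VALUE pair that follows each '--bind' token."""
    binds = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "--bind" and "=" in value:
            key, _, val = value.partition("=")
            binds[key] = val
    return binds


def per_container_request(binds):
    """Return the per-container CPU, RAM (GiB) and disk (GiB) request."""
    return {
        "cpus": float(binds["CPUS_PER_CONTAINER"]),
        "ram_gib": int(binds["RAM_PER_CONTAINER"]) / GIB,
        "disk_gib": int(binds["DISK_PER_CONTAINER"]) / GIB,
    }
```

Comparing these numbers with the free resources Mesos reports per slave shows whether any single host can satisfy one container.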

jiandongjia commented 8 years ago

@maosongfu I had the same problem, but Aurora's PENDING state didn't show any hints.

[2016-06-10 12:09:21 +0800] com.twitter.heron.scheduler.aurora.AuroraLauncher INFO:  Launching topology in aurora  
[2016-06-10 12:09:21 +0800] com.twitter.heron.spi.common.ShellUtils INFO:  $> [aurora, job, create, --wait-until, RUNNING, --bind, SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_NAME=ExclamationTopology, --bind, ENVIRON=devel, --bind, ROLE=root, --bind, STATEMGR_ROOT_PATH=/heron, --bind, TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn, --bind, TOPOLOGY_ID=ExclamationTopology24ef552e-69d1-48ae-ade2-cb9cc932f47e, --bind, SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell, --bind, TOPOLOGY_PACKAGE_URI=/heron/topologies/main/ExclamationTopology-root-tag-0--7553500226791833473, --bind, STATEMGR_CONNECTION_STRING=192.168.1.108:2181, --bind, HERON_SANDBOX_JAVA_HOME=/usr/src/jdk1.7.0_79, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, DISK_PER_CONTAINER=1073741824, --bind, SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, NUM_CONTAINERS=2, --bind, TOPOLOGY_CLASSPATH=heron-examples.jar, --bind, SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, RAM_PER_CONTAINER=2147483648, --bind, SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;", --bind, COMPONENT_RAMMAP=exclaim1:536870912,word:536870912, --bind, CORE_PACKAGE_URI=file:///usr/local/heron/dist/heron-core.tar.gz, --bind, SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, ISPRODUCTION=false, --bind, SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, CLUSTER=main, --bind, CPUS_PER_CONTAINER=1.0, --bind, SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, INSTANCE_DISTRIBUTION=1:word:2:0:exclaim1:1:0, --bind, SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, TOPOLOGY_JAR_FILE=heron-examples.jar, main/root/devel/ExclamationTopology, /usr/local/heron/conf/main/heron.aurora, --verbose]  
[2016-06-10 12:09:31 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 1ms  
[2016-06-10 12:09:41 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 0ms  
[2016-06-10 12:09:51 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 1ms  
[2016-06-10 12:10:01 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 1ms  
[2016-06-10 12:10:11 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 0ms  
[2016-06-10 12:10:21 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 0ms  
[2016-06-10 12:10:31 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 1ms  
[2016-06-10 12:10:41 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 0ms  
aaronshan commented 8 years ago

@maosongfu Thank you. I configured the containers' disk, cpu, and ram with smaller values, and now it runs OK! Do you know how I can increase my Aurora resource config?

maosongfu commented 8 years ago

@jiandongjia @aaronshan You might get more insights from the official Aurora website: http://aurora.apache.org/

kartik894 commented 8 years ago

I am using HDFS uploader for the aurora cluster. I am getting the following error upon submitting the topology:

Caused by: java.lang.IllegalArgumentException: Invalid path string "/hdfs:///heron/topologies/foo" caused by empty node name specified @7

These are my config files:

scheduler.yaml

# scheduler class for distributing the topology for execution
heron.class.scheduler: com.twitter.heron.scheduler.aurora.AuroraScheduler

# launcher class for submitting and launching the topology
heron.class.launcher: com.twitter.heron.scheduler.aurora.AuroraLauncher

# location of the core package
heron.package.core.uri: hdfs:///tmp/.heron/dist/heron-core.tar.gz

# location of java - pick it up from shell environment
heron.directory.sandbox.java.home: /usr/lib/jvm/java-8-oracle

# Invoke the IScheduler as a library directly
heron.scheduler.is.service: False

statemgr.yaml

# local state manager class for managing state in a persistent fashion
heron.class.state.manager: com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager

# local state manager connection string
heron.statemgr.connection.string:  "masternode:2181"

# path of the root address to store the state in a local file system
heron.statemgr.root.path: "hdfs:///heron"

# create the zookeeper nodes, if they do not exist
heron.statemgr.zookeeper.is.initialize.tree: True

# timeout in ms to wait before considering zookeeper session is dead
heron.statemgr.zookeeper.session.timeout.ms: 30000

# timeout in ms to wait before considering zookeeper connection is dead
heron.statemgr.zookeeper.connection.timeout.ms: 30000

# number of times to retry a zookeeper operation before giving up
heron.statemgr.zookeeper.retry.count: 10

# duration of time to wait until the next retry
heron.statemgr.zookeeper.retry.interval.ms: 10000

uploader.yaml

# uploader class for transferring the topology jar/tar files to storage
heron.class.uploader: com.twitter.heron.uploader.hdfs.HdfsUploader

# Directory of config files for hadoop client to read from
heron.uploader.hdfs.config.directory: /usr/local/hadoop/etc/hadoop

# name of the directory to upload topologies for HDFS uploader
heron.uploader.hdfs.topologies.directory.uri: hdfs:///heron/topologies/${CLUSTER}

client.yaml

# location of the core package
heron.package.core.uri:                      "hdfs:///tmp/.heron/dist/heron-core.tar.gz"

# Whether role/env is required to submit a topology. Default value is False.
heron.config.is.role.required:               False
heron.config.is.env.required:               False

Is there anything wrong in the config files?

maosongfu commented 8 years ago

@kartik894 This is caused by an invalid config value in statemgr.yaml, which is used when connecting to ZooKeeper: heron.statemgr.root.path: "hdfs:///heron". The root path has to be a ZooKeeper path, not an HDFS URI. You can try /heron, or check the ZooKeeper documentation for the path format.
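
For anyone hitting the same message, here is a minimal Python sketch of the kind of check that rejects this path. It is a simplified illustration, not ZooKeeper's actual validation code, and the function name `find_zk_path_problem` is made up for this example. It only covers the leading-slash, trailing-slash, and empty-node-name rules.

```python
def find_zk_path_problem(path):
    """Return a description of the first problem found, or None if none is found.

    Simplified sketch: only checks the leading slash, the trailing slash,
    and empty node names (two consecutive slashes).
    """
    if not path:
        return "path is empty"
    if not path.startswith("/"):
        return "path must start with /"
    if path == "/":
        return None  # the root node itself
    if path.endswith("/"):
        return "path must not end with /"
    for i in range(1, len(path)):
        # Two slashes in a row mean there is a node with an empty name.
        if path[i] == "/" and path[i - 1] == "/":
            return "empty node name specified @%d" % i
    return None


# Path taken from the error message above, and the corrected form.
print(find_zk_path_problem("/hdfs:///heron/topologies/foo"))
print(find_zk_path_problem("/heron/topologies/foo"))
```

With the path from the error above, the doubled slash right after `hdfs:` is reported at index 7, which matches the `@7` in the exception. The `/heron` form has no empty node names, so nothing is reported.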

kartik894 commented 8 years ago

@maosongfu Thanks! It's running now.

mhajibaba commented 8 years ago

@maosongfu I have the same problem as the error reported in #883, but I get the following messages:

[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.common.ShellUtils INFO:    
[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.common.ShellUtils INFO:  DEBUG] Command=(['job', 'create', '--wait-until', 'RUNNING', '--bind', 'TOPOLOGY_NAME=ExclamationTopology', '--bind', 'SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml', '--bind', 'COMPONENT_RAMMAP=exclaim1:536870912,word:536870912', '--bind', 'SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml', '--bind', 'INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;"', '--bind', 'ROLE=root', '--bind', 'ENVIRON=devel', '--bind', 'SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*', '--bind', 'SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*', '--bind', 'ISPRODUCTION=false', '--bind', 'TOPOLOGY_CLASSPATH=heron-examples.jar', '--bind', 'CLUSTER=aurora', '--bind', 'SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor', '--bind', 'STATEMGR_CONNECTION_STRING=192.168.11.231:2181,192.168.11.232:2181,192.168.11.233:2181', '--bind', 'COMPONENT_JVM_OPTS_IN_BASE64=""', '--bind', 'TOPOLOGY_ID=ExclamationTopologyc2f53ad0-76be-4e83-8c63-2134faede687', '--bind', 'TOPOLOGY_PACKAGE_URI=file:///root/.herondata/repository/topologies/aurora/root/ExclamationTopology/ExclamationTopology-root-tag-0--3706733491519378097', '--bind', 'SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr', '--bind', 'CORE_PACKAGE_URI=file:///root/.heron/dist/heron-core.tar.gz', '--bind', 'SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*', '--bind', 'TOPOLOGY_PACKAGE_TYPE=jar', '--bind', 'RAM_PER_CONTAINER=2147483648', '--bind', 'SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster', '--bind', 'TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn', '--bind', 'INSTANCE_DISTRIBUTION=1:word:2:0:exclaim1:1:0', '--bind', 'NUM_CONTAINERS=2', '--bind', 'CPUS_PER_CONTAINER=1.0', '--bind', 'TOPOLOGY_JAR_FILE=heron-examples.jar', '--bind', 'SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell', '--bind', 'DISK_PER_CONTAINER=1073741824', 
'--bind', 'STATEMGR_ROOT_PATH=/heron', '--bind', 'HERON_SANDBOX_JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64', 'aurora/root/devel/ExclamationTopology', '/root/.heron/conf/aurora/heron.aurora', '--verbose'])
DEBUG] Config: ['"""\n', 'Launch the topology as a single aurora job with multiple instances.\n', 'The heron-executor is responsible for starting a tmaster (container 0)\n', 'and regular stmgr/metricsmgr/instances (container index > 0).\n', '"""\n', '\n', "heron_core_release_uri = '{{CORE_PACKAGE_URI}}'\n", "heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'\n", 'core_release_file = "heron-core.tar.gz"\n', 'topology_package_file = "topology.tar.gz"\n', '\n', '# --- processes ---\n', 'fetch_heron_system = Process(\n', "  name = 'fetch_heron_system',\n", "  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)\n", ')\n', '\n', 'fetch_user_package = Process(\n', "  name = 'fetch_user_package',\n", "  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)\n", ')\n', '\n', 'command_to_start_executor = \'{{SANDBOX_EXECUTOR_BINARY}} {{mesos.instance}} {{TOPOLOGY_NAME}} {{TOPOLOGY_ID}} {{TOPOLOGY_DEFINITION_FILE}} {{INSTANCE_DISTRIBUTION}} {{STATEMGR_CONNECTION_STRING}} {{STATEMGR_ROOT_PATH}} {{SANDBOX_TMASTER_BINARY}} {{SANDBOX_STMGR_BINARY}} "{{SANDBOX_METRICSMGR_CLASSPATH}}" {{INSTANCE_JVM_OPTS_IN_BASE64}} "{{TOPOLOGY_CLASSPATH}}" {{thermos.ports[port1]}} {{thermos.ports[port2]}} {{thermos.ports[port3]}} {{SANDBOX_SYSTEM_YAML}} {{COMPONENT_RAMMAP}} {{COMPONENT_JVM_OPTS_IN_BASE64}} {{TOPOLOGY_PACKAGE_TYPE}} {{TOPOLOGY_JAR_FILE}} {{HERON_SANDBOX_JAVA_HOME}} {{thermos.ports[http]}} {{SANDBOX_SHELL_BINARY}} {{thermos.ports[port4]}} {{CLUSTER}} {{ROLE}} {{ENVIRON}} "{{SANDBOX_INSTANCE_CLASSPATH}}" {{SANDBOX_METRICS_YAML}} "{{SANDBOX_SCHEDULER_CLASSPATH}}" "{{thermos.ports[scheduler]}}"\'\n', '\n', 'launch_heron_executor = Process(\n', "  name = 'launch_heron_executor',\n", '  cmdline = command_to_start_executor,\n', '  max_failures = 1\n', ')\n', '\n', 'discover_profiler_port = Process(\n', "  name = 'discover_profiler_port',\n", "  cmdline = 'echo 
{{thermos.ports[yourkit]}} > yourkit.port'\n", ')\n', '\n', '# --- tasks ---\n', 'heron_task = SequentialTask(\n', "  name = 'setup_and_run',\n", '  processes = [fetch_heron_system, fetch_user_package, launch_heron_executor, discover_profiler_port],\n', "  resources = Resources(cpu = '{{CPUS_PER_CONTAINER}}', ram = '{{RAM_PER_CONTAINER}}', disk = '{{DISK_PER_CONTAINER}}')\n", ')\n', '\n', '# -- jobs ---\n', 'jobs = [\n', '  Job(\n', "    name = '{{TOPOLOGY_NAME}}',\n", "    cluster = '{{CLUSTER}}',\n", "    role = '{{ROLE}}',\n", "    environment = '{{ENVIRON}}',\n", '    service = True,\n', '    task = heron_task,\n', "    instances = '{{NUM_CONTAINERS}}',\n", "    announce = Announcer(primary_port = 'http')\n", '  )\n', ']\n']
Unknown cluster: aurora

[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.utils.SchedulerUtils SEVERE:  Failed to invoke IScheduler as library  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 19,2  replyHeader:: 19,4294967667,0  request:: '/heron/executionstate/ExclamationTopology,-1  response:: null  
[2016-07-02 16:27:05 +0430] org.apache.curator.utils.DefaultTracerDriver FINEST:  Trace: DeleteBuilderImpl-Foreground - 9 ms  
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Deleted node for path: /heron/executionstate/ExclamationTopology  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 20,2  replyHeader:: 20,4294967668,0  request:: '/heron/topologies/ExclamationTopology,-1  response:: null  
[2016-07-02 16:27:05 +0430] org.apache.curator.utils.DefaultTracerDriver FINEST:  Trace: DeleteBuilderImpl-Foreground - 7 ms  
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Deleted node for path: /heron/topologies/ExclamationTopology  
[2016-07-02 16:27:05 +0430] com.twitter.heron.scheduler.LaunchRunner SEVERE:  Failed to launch topology  
[2016-07-02 16:27:05 +0430] com.twitter.heron.scheduler.SubmitterMain SEVERE:  Failed to launch topology. Attempting to roll back upload.  
[2016-07-02 16:27:05 +0430] com.twitter.heron.uploader.localfs.LocalFileSystemUploader INFO:  Clean uploaded jar  
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Closing the CuratorClient to: 192.168.11.231:2181,192.168.11.232:2181,192.168.11.233:2181  
[2016-07-02 16:27:05 +0430] org.apache.curator.framework.imps.CuratorFrameworkImpl FINE:  Closing  
[2016-07-02 16:27:05 +0430] org.apache.curator.CuratorZookeeperClient FINE:  Closing  
[2016-07-02 16:27:05 +0430] org.apache.curator.ConnectionState FINE:  Closing  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ZooKeeper FINE:  Closing session: 0x255aabf8eaa0028  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Closing client for session: 0x255aabf8eaa0028  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 21,-11  replyHeader:: 21,4294967669,0  request:: null response:: null  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Disconnecting client for session: 0x255aabf8eaa0028  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn INFO:  EventThread shut down  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ZooKeeper INFO:  Session: 0x255aabf8eaa0028 closed  
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Closing the tunnel processes  
Exception in thread "main" java.lang.RuntimeException: Failed to submit topology ExclamationTopology
    at com.twitter.heron.scheduler.SubmitterMain.main(SubmitterMain.java:319)
ERROR: Failed to launch topology 'ExclamationTopology' because User main failed with status 1. Bailing out...
INFO: Elapsed time: 3.951s.

I changed the env, role, and ..., but the issue is still not solved.

kartik894 commented 8 years ago

Check the /etc/aurora/clusters.json file and change the name of the cluster to 'aurora'.
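
To check this before submitting, a small Python sketch along these lines can list the cluster names the Aurora client knows about. It assumes clusters.json is a JSON list of objects that each carry a "name" key. The sample content and the helper names are illustrative only and are not taken from a real installation.

```python
import json

# Illustrative content only; a real /etc/aurora/clusters.json will have
# different names and additional keys.
SAMPLE_CLUSTERS_JSON = '[{"name": "example"}]'


def cluster_names(clusters_json_text):
    """Return the "name" of every cluster entry in the JSON text."""
    return [entry.get("name") for entry in json.loads(clusters_json_text)]


def has_cluster(clusters_json_text, wanted):
    """True if a cluster called `wanted` is defined in the JSON text."""
    return wanted in cluster_names(clusters_json_text)


# "aurora" is the cluster part of a job key such as aurora/root/devel.
print(has_cluster(SAMPLE_CLUSTERS_JSON, "aurora"))
```

To run it against the real file, pass in the text read from /etc/aurora/clusters.json instead of the sample string. If the cluster used in the submit command is not in the list, that would be consistent with the "Unknown cluster: aurora" line in the output above.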

mhajibaba commented 8 years ago

@kartik894 Thanks a lot! That resolved it.

harbby commented 8 years ago

Thanks! I deployed successfully on Ubuntu 16.04 and CentOS 7, but on CentOS 6 an error occurred when submitting a job.

chatterjeesubarna commented 7 years ago

Hello, I am new to Heron. I am submitting a topology as root.

Initially, I did "heron submit aurora/ubuntu/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology --verbose"

Then I got the error "Failed to initialize sandbox: Could not create sandbox because user does not exist: ubuntu"

So I modified and did this:

heron submit aurora/root/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology --verbose

I am getting this error:

E0329 14:34:14.479283 32970 runner.py:299] Regular plan unhealthy!

Can someone help? Thanks a lot!

huijunwu commented 7 years ago

@chatterjeesubarna I guess your first submit created some metadata in ZooKeeper, which your second submit conflicted with. I suggest trying to submit with a different name:

heron submit aurora/root/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopologyDifferent1 --verbose

@maosongfu to confirm.
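
A quick way to see what might be left over is to look at the per-topology nodes under the state manager root. The sketch below only builds the candidate paths. The node names are the ones that show up in the submit logs in this thread (topologies, packingplans, executionstate), so other Heron versions may use different ones, and the helper name is hypothetical.

```python
def topology_state_paths(root_path, topology_name):
    """Build the ZooKeeper paths to inspect for a given topology name."""
    # Node names taken from log lines in this thread; treat them as an assumption.
    nodes = ["topologies", "packingplans", "executionstate"]
    root = root_path.rstrip("/")
    return ["%s/%s/%s" % (root, node, topology_name) for node in nodes]


for path in topology_state_paths("/heron", "ExclamationTopology"):
    print(path)
```

Each printed path can then be checked with whichever ZooKeeper client you already use.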

chatterjeesubarna commented 7 years ago

Hello,

Thanks a lot. Yes, that solved it and I could run the job. Probably some topology was already running, so Mesos couldn't schedule another one!

Now, I can see the following on my terminal: "INFO: Topology 'ExclamationTopology' launched successfully"

My heron ui shows: "{"status": "success", "executiontime": 5.316734313964844e-05, "message": "", "version": "0.14.5", "result": {"aurora": {"root": {"devel": ["ExclamationTopology"]}}}}"

The only problem is that I cannot see the topology in Heron Tracker. My heron-tracker.yaml looks like this:

statemgrs:
  -
    type: "zookeeper"
    name: "localzk"
    hostport: "heron01:2181"
    rootpath: "/heron"
    tunnelhost: "localhost"

Can you kindly help? Thanks a lot again!

Thanking you,
Subarna Chatterjee
Post-Doctoral Researcher, Inria, Rennes
Website: http://chatterjeesubarna.wix.com/subarna

billonahill commented 7 years ago

@chatterjeesubarna I've answered your question on the mailing list. Please don't double post questions on both the mailing list and git issues. Also, troubleshooting questions are best handled on the mailing list. Git issues should just be for bugs or feature requests.

bjmota commented 7 years ago

Hello! I have a problem deploying a Heron cluster when I submit the ExclamationTopology:

b1@master_1:~$ heron submit aurora/b1/devel  --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology
[2017-05-25 09:18:08 +0000] [INFO]: Using config file under /home/b1/.heron/conf/aurora
[2017-05-25 09:18:08 +0000] [INFO]: Launching topology: 'ExclamationTopology'
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: 192.168.57.163:2181  
[2017-05-25 09:18:09 -0700] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting  
[2017-05-25 09:18:09 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED  
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized.  
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /heron/topologies/ExclamationTopology  
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -test -e /heron/topologies/aurora''  
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):  
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Target topology file already exists at '/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz'. Overwriting it now  
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Uploading topology package at '/tmp/tmpvMxs3m/topology.tar.gz' to target HDFS at '/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz'  
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -copyFromLocal -f /tmp/tmpvMxs3m/topology.tar.gz /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz''  
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):  
[2017-05-25 09:18:17 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/topologies/ExclamationTopology  
[2017-05-25 09:18:17 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/executionstate/ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.scheduler.aurora.AuroraLauncher: Launching topology in aurora  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.scheduler.utils.SchedulerUtils: Updating scheduled-resource in packing plan: ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``aurora job create --wait-until RUNNING --bind STMGR_BINARY=./heron-core/bin/heron-stmgr --bind RAM_PER_CONTAINER=11811160064 --bind TOPOLOGY_PACKAGE_TYPE=jar --bind SHELL_BINARY=./heron-core/bin/heron-shell --bind TMASTER_BINARY=./heron-core/bin/heron-tmaster --bind STATEMGR_ROOT_PATH=/heron --bind TOPOLOGY_PACKAGE_URI=/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz --bind JAVA_HOME=/usr/lib/jvm/java-8-oracle --bind CLUSTER=aurora --bind TOPOLOGY_BINARY_FILE=heron-examples.jar --bind SYSTEM_YAML=./heron-conf/heron_internals.yaml --bind EXECUTOR_BINARY=./heron-core/bin/heron-executor --bind CPUS_PER_CONTAINER=5.0 --bind IS_PRODUCTION=false --bind PYTHON_INSTANCE_BINARY=./heron-core/bin/heron-python-instance --bind METRICS_YAML=./heron-conf/metrics_sinks.yaml --bind CORE_PACKAGE_URI=/heron/dist/heron-core.tar.gz --bind TOPOLOGY_CLASSPATH=heron-examples.jar --bind TOPOLOGY_ID=ExclamationTopologyf7fa5898-9d12-4e3d-917c-e24be0fd9ef6 --bind ROLE=b1 --bind COMPONENT_JVM_OPTS_IN_BASE64="" --bind TOPOLOGY_NAME=ExclamationTopology --bind STATEMGR_CONNECTION_STRING=192.168.57.163:2181 --bind INSTANCE_CLASSPATH=./heron-core/lib/instance/* --bind DISK_PER_CONTAINER=5368709120 --bind COMPONENT_RAMMAP=exclaim1:3221225472,word:3221225472 --bind METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/* --bind ENVIRON=devel --bind SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/* --bind TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn --bind NUM_CONTAINERS=3 --bind INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;" aurora/b1/devel/ExclamationTopology /home/b1/.heron/conf/aurora/heron.aurora''  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):  
Error loading configuration: Unknown cluster: aurora
[2017-05-25 09:18:20 -0700] [SEVERE] com.twitter.heron.scheduler.aurora.AuroraCLIController: Failed to run process. Command=[aurora, job, create, --wait-until, RUNNING, --bind, STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, RAM_PER_CONTAINER=11811160064, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, SHELL_BINARY=./heron-core/bin/heron-shell, --bind, TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, STATEMGR_ROOT_PATH=/heron, --bind, TOPOLOGY_PACKAGE_URI=/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz, --bind, JAVA_HOME=/usr/lib/jvm/java-8-oracle, --bind, CLUSTER=aurora, --bind, TOPOLOGY_BINARY_FILE=heron-examples.jar, --bind, SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, CPUS_PER_CONTAINER=5.0, --bind, IS_PRODUCTION=false, --bind, PYTHON_INSTANCE_BINARY=./heron-core/bin/heron-python-instance, --bind, METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, CORE_PACKAGE_URI=/heron/dist/heron-core.tar.gz, --bind, TOPOLOGY_CLASSPATH=heron-examples.jar, --bind, TOPOLOGY_ID=ExclamationTopologyf7fa5898-9d12-4e3d-917c-e24be0fd9ef6, --bind, ROLE=b1, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_NAME=ExclamationTopology, --bind, STATEMGR_CONNECTION_STRING=192.168.57.163:2181, --bind, INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, DISK_PER_CONTAINER=5368709120, --bind, COMPONENT_RAMMAP=exclaim1:3221225472,word:3221225472, --bind, METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, ENVIRON=devel, --bind, SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn, --bind, NUM_CONTAINERS=3, --bind, INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;", aurora/b1/devel/ExclamationTopology, /home/b1/.heron/conf/aurora/heron.aurora], STDOUT=, STDERR=Error loading configuration: Unknown cluster: aurora  
[2017-05-25 09:18:20 -0700] [SEVERE] com.twitter.heron.scheduler.utils.LauncherUtils: Failed to invoke IScheduler as library  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/executionstate/ExclamationTopology  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/ExclamationTopology  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/topologies/ExclamationTopology  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -rm /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz''  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):  
Deleted /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz
[2017-05-25 09:18:23 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: 192.168.57.163:2181  
[2017-05-25 09:18:23 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes  
[2017-05-25 09:18:23 +0000] [ERROR]: Failed to launch topology 'ExclamationTopology'
[2017-05-25 09:18:23 +0000] [ERROR]: Failed to launch topology 'ExclamationTopology'
[2017-05-25 09:18:23 +0000] [INFO]: Elapsed time: 15.872s.

Also, in Aurora I only see the Example at http://192.168.57.163:8081/scheduler.

Help me, Please!

billonahill commented 7 years ago

@bjmota would you please ask troubleshooting questions on the mailing list? Github issues should be used for filing bugs and feature requests.