zengzhaozheng closed this issue 9 years ago
Can you paste a snippet of owl/collector.log? The metrics-collection log is written to that file; the logs behind the metrics page rendering are in owl/server.log and owl/debug.log.
```
options: {'collector_cfg': 'collector.cfg', 'settings': None, 'use_threadpool': False, 'pythonpath': None, 'verbosity': u'1', 'traceback': None, 'no_color': False, 'clear_oldtasks': False}
INFO 2014-11-25 14:06:51,244 collect 63152 139699130095360 <Task: hdfs/dptst-example/journalnode/0> waiting 6.827905 seconds for http://localhost:12101/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,245 collect 63152 139699130095360 <Task: hdfs/dptst-example/journalnode/1> waiting 6.396616 seconds for http://localhost:12101/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,245 collect 63152 139699130095360 <Task: hdfs/dptst-example/journalnode/2> waiting 0.373451 seconds for http://localhost:12101/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,245 collect 63152 139699130095360 <Task: hdfs/dptst-example/namenode/0> waiting 6.239103 seconds for http://localhost:12201/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,245 collect 63152 139699130095360 <Task: hdfs/dptst-example/namenode/1> waiting 7.122569 seconds for http://localhost:12201/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,246 collect 63152 139699130095360 <Task: hdfs/dptst-example/datanode/0> waiting 1.824659 seconds for http://localhost:12401/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,246 collect 63152 139699130095360 <Task: hdfs/dptst-example/datanode/1> waiting 6.874303 seconds for http://localhost:12401/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,246 collect 63152 139699130095360 <Task: hdfs/dptst-example/datanode/2> waiting 0.431661 seconds for http://localhost:12411/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,246 collect 63152 139699130095360 <Task: hdfs/dptst-example/datanode/3> waiting 1.287644 seconds for http://localhost:12401/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,246 collect 63152 139699130095360 <Task: hdfs/dptst-example/datanode/4> waiting 7.967131 seconds for http://localhost:12411/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,246 collect 63152 139699130095360 <Task: hdfs/dptst-example/datanode/5> waiting 7.407624 seconds for http://localhost:12421/jmx?qry=Hadoop:*...
INFO 2014-11-25 14:06:51,246 collect 63152 139699130095360 <Task: hbase/dptst-example/master/0> waiting 4.281206 seconds for http://localhost:12501/jmx?qry=hadoop:*...
INFO 2014-11-25 14:06:51,246 collect 63152 139699130095360 <Task: hbase/dptst-example/master/1> waiting 3.529428 seconds for http://localhost:12501/jmx?qry=hadoop:*...
INFO 2014-11-25 14:06:51,247 collect 63152 139699130095360 <Task: hbase/dptst-example/regionserver/0> waiting 5.092863 seconds for http://localhost:12601/jmx?qry=hadoop:*...
INFO 2014-11-25 14:06:51,247 collect 63152 139699130095360 <Task: hbase/dptst-example/regionserver/1> waiting 6.544828 seconds for http://localhost:12601/jmx?qry=hadoop:*...
INFO 2014-11-25 14:06:51,247 collect 63152 139699130095360 <Task: hbase/dptst-example/regionserver/2> waiting 2.135464 seconds for http://localhost:12611/jmx?qry=hadoop:*...
INFO 2014-11-25 14:06:51,247 collect 63152 139699130095360 <Task: hbase/dptst-example/regionserver/3> waiting 3.166893 seconds for http://localhost:12601/jmx?qry=hadoop:*...
INFO 2014-11-25 14:06:51,247 collect 63152 139699130095360 <Task: hbase/dptst-example/regionserver/4> waiting 5.212955 seconds for http://localhost:12611/jmx?qry=hadoop:*...
INFO 2014-11-25 14:06:51,247 collect 63152 139699130095360 <Task: hbase/dptst-example/regionserver/5> waiting 0.850794 seconds for http://localhost:12621/jmx?qry=hadoop:*...
INFO 2014-11-25 14:06:51,247 collect 63152 139699130095360 <Task: impala/dptst-example/statestored/0> waiting 5.578948 seconds for http://localhost:21301/...
```

This is a bit frustrating: why is the JMX URL http://localhost:12621/jmx?qry=hadoop:*...? I have already changed the configuration.
After modifying /data/hadoop/z.zeng/minos-master/config/owl/collector.cfg, which processes need to be restarted?
I only installed owl, without tank or supervisor.
owl/debug.log shows an error:

```
ERROR 2014-11-25 14:18:45,279 base 3404 139814237136640 Internal Server Error: /failover/
Traceback (most recent call last):
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/core/handlers/base.py", line 111, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/data/hadoop/z.zeng/minos-master/owl/failover_framework/views.py", line 29, in index
    hour_task_number = Task.objects.filter(start_timestamp__gt=previous_hour).count()
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/db/models/query.py", line 338, in count
    return self.query.get_count(using=self.db)
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/db/models/sql/query.py", line 424, in get_count
    number = obj.get_aggregation(using=using)[None]
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/db/models/sql/query.py", line 390, in get_aggregation
    result = query.get_compiler(using).execute_sql(SINGLE)
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 786, in execute_sql
    cursor.execute(sql, params)
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 128, in execute
    return self.cursor.execute(query, args)
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
ProgrammingError: (1146, "Table 'hadoop_owl.failover_framework_task' doesn't exist")

ERROR 2014-11-25 14:52:54,078 base 11619 139746212833024 Internal Server Error: /monitor/table/count_rows/
Traceback (most recent call last):
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/core/handlers/base.py", line 111, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/data/hadoop/z.zeng/minos-master/owl/monitor/views.py", line 583, in show_table_count_rows
    'count_period': settings.COUNT_PERIOD,
  File "/data/hadoop/z.zeng/minos-master/build/env/lib/python2.7/site-packages/django/conf/__init__.py", line 47, in __getattr__
    return getattr(self._wrapped, name)
AttributeError: 'Settings' object has no attribute 'COUNT_PERIOD'
```
It says the table hadoop_owl.failover_framework_task doesn't exist. (This was raised when I clicked the failover tab on the page.)
Is it really possible to use only owl on an existing cluster, without tank and supervisor? In that case, do I still need to configure hdfs-dptst-example.cfg by hand?
Which script can I use to stop the collector?
There isn't one in the open-source release; that part is not finished yet. @zengzhaozheng just find it with `ps -ef | grep collect` and kill it, then start it again :-)
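In case it helps, the ps/kill step above can be wrapped in a small script. A minimal sketch, assuming the collector process shows up in `ps` with `collect` somewhere on its command line (as in the reply above); the helper name and the default pattern are mine, not part of Minos:

```python
import os
import signal
import subprocess

def stop_collector(pattern="collect"):
    """Send SIGTERM to every process whose full command line matches
    `pattern`, mimicking `ps -ef | grep collect` + kill.

    Returns the list of pids signalled (empty if nothing matched)."""
    try:
        out = subprocess.check_output(["pgrep", "-f", pattern])
    except subprocess.CalledProcessError:
        # pgrep exits non-zero when no process matches
        return []
    pids = [int(p) for p in out.decode().split()]
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Note that `pgrep -f collect` matches anything with "collect" in its command line, so a more specific pattern is safer before killing for real.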
Management of the various daemon processes in the open-source version is still rather messy, and the open-source owl really hasn't been maintained for a long time. Internally we use supervisord to manage all the processes behind owl. Once we have the time or the manpower, we'll clean up and fix the open-sourced parts of owl :-)
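For anyone wanting to replicate that internal setup, a supervisord program entry along these lines would keep the collector running. All paths and the program/command names here are guesses (the thread never shows how the collector is launched); only the supervisord keys themselves are standard:

```ini
; hypothetical supervisord entry for the owl collector
[program:owl_collector]
command=/path/to/minos/build/env/bin/python manage.py collect
directory=/path/to/minos/owl
autostart=true
autorestart=true
stdout_logfile=/path/to/minos/owl/collector.supervisor.log
```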
This is what I see in the UI:

```
datanode 3  sx-slave4:12201
datanode 4  sx-slave5:12201
datanode 5  sx-slave6:12201
```

Is port 12201 here the datanode's HTTP server port?
The ports set in config/conf/hdfs/hdfs-dptst-example.cfg must be multiples of 100. When using owl on its own, I have to change the HTTP ports of the corresponding datanodes and namenodes to match, which is quite a hassle.
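For what it's worth, the URLs in the collector log earlier in this thread (journalnode base 12100 with JMX on 12101, datanode base 12400 with JMX on 12401/12411/12421) hint at why the multiple-of-100 rule exists: each job appears to reserve a block of 100 ports, each task instance on a host a 10-wide slice, with the HTTP/JMX server at offset 1. This is only inferred from the log URLs, not confirmed against the Minos source; the function below is a guess:

```python
def guessed_http_port(base_port, task_slot):
    """Guessed Minos port scheme: a job reserves base_port..base_port+99;
    the task in slot N on a host uses base_port + 10*N, with its
    HTTP/JMX server at offset +1."""
    if base_port % 100 != 0:
        raise ValueError("Minos appears to require base ports that are multiples of 100")
    return base_port + 10 * task_slot + 1

# Consistent with the JMX URLs in the collector log:
print(guessed_http_port(12400, 0))  # 12401
print(guessed_http_port(12400, 1))  # 12411
print(guessed_http_port(12400, 2))  # 12421
```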
My owl installation and startup went fine, but when I open the monitoring page nothing is detected: no HDFS metrics or any other monitoring indicators. What else do I need to configure? The contents of my /data/hadoop/z.zeng/minos-master/config/owl/collector.cfg are as follows:
```ini
# collector config
[collector]
services=hdfs hbase yarn impala
# Period to fetch/report metrics, in seconds.
period=10

[hdfs]
clusters=dptst-example
jobs=journalnode namenode datanode
# The jmx output of each bean is as following:
# {
#   "name" : "hadoop:service=RegionServer,name=RegionServerDynamicStatistics",
#   "modelerType" : "org.apache.hadoop.hbase.regionserver.metrics.RegionServerDynamicStatistics",
#   "tbl.YCSBTest.cf.test.blockCacheNumCached" : 0,
#   "tbl.YCSBTest.cf.test.compactionBlockReadCacheHitCnt" : 0,
#   ...
# Some metrics/values are from hadoop/hbase and some are from java runtime
# environment, we specify a filter on jmx url to get hadoop/hbase metrics.
metric_url=/jmx?qry=Hadoop:*
metric_url=http://sx-master:50070/jmx?qry=Hadoop:*

[hbase]
clusters=dptst-example
jobs=master regionserver
metric_url=/jmx?qry=hadoop:*

[yarn]
clusters=dptst-example
jobs=resourcemanager nodemanager historyserver proxyserver
metric_url=/jmx?qry=Hadoop:*

[impala]
clusters=dptst-example
jobs=statestored impalad
metric_url=/
need_analyze=false
```
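One thing worth checking in the [hdfs] section above: it contains two `metric_url` lines, and INI-style parsers keep only one value per option. In Python's ConfigParser (which a Python collector of this era would plausibly use; I have not verified how Minos parses this file), the last occurrence wins, so the relative `/jmx?qry=Hadoop:*` line is silently overridden by the absolute sx-master URL. Since the collector log shows it joining each task's host:port with a relative metric_url, the absolute line is probably the one to remove. A quick demonstration of the parser behavior:

```python
import configparser

cfg_text = """
[hdfs]
metric_url=/jmx?qry=Hadoop:*
metric_url=http://sx-master:50070/jmx?qry=Hadoop:*
"""

# strict=False mimics Python 2's ConfigParser, which silently accepts
# duplicate options and keeps the last value it reads.
parser = configparser.ConfigParser(strict=False)
parser.read_string(cfg_text)
print(parser.get("hdfs", "metric_url"))
# -> http://sx-master:50070/jmx?qry=Hadoop:*
```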