Closed. ellenzhu closed this issue 6 years ago.
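The log itself looks normal. Is it stuck at this point and not moving?
@ellenzhu It would help if you posted the code, so the problem is easier to reproduce.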
It does feel like it is simply stuck. Here is the code:

import json
import dpark
import os

path = "attack_logs/"
files = os.listdir(path)
list = []

def select_matches(dict):
    if 'matches' in dict.keys():
        return dict['matches']
    else:
        return dict['time']

def analysis_id():
    for file in files:
        with open("attack_logs/" + file) as f:
            line = f.readlines()
            for i in line:
                list.append(json.loads(i))
    d1 = dpark.makeRDD(list).map(select_matches).map(lambda x: (x,1)).reduceByKey(lambda x,y : x+y)
    print d1.collect()

if __name__ == "__main__":
    analysis_id()
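One way to separate a data problem from a dpark problem is to run the same count in plain Python, without dpark. The following is a minimal sketch under stated assumptions: `count_keys` is a hypothetical helper (not part of dpark), the input is one JSON object per line as in the posted code, and the selected values are hashable. Skipping blank lines is an addition that the posted code does not make.

```python
import json
from collections import Counter


def select_matches(record):
    # Same selection rule as the posted code: prefer 'matches', else 'time'.
    if 'matches' in record:
        return record['matches']
    return record['time']


def count_keys(lines):
    # Hypothetical helper: counts the selected value of each JSON line,
    # mirroring map(select_matches).map(lambda x: (x, 1)).reduceByKey(add).
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank lines are skipped instead of raising an error.
            continue
        counts[select_matches(json.loads(line))] += 1
    return dict(counts)
```

If this finishes quickly on the affected machine with the same files, the input data is unlikely to be the cause of the hang.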
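@ellenzhu I could not reproduce the problem with the code you provided.
@ariesdevil Could it be a problem with the machine...? It also runs fine on another machine.
@ellenzhu It may be a Python-related problem on that machine.
I installed it with Python 3.6.1, but the runtime environment is Python 2.7.5.
@ellenzhu This is no longer related to the project, so please track down the cause yourself.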
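Since the package was installed with one interpreter and is run with another, it may help to record which interpreter is actually running and where the module is loaded from. This is a small sketch, not a dpark feature: `describe_env` is a hypothetical helper that uses only the standard library.

```python
import importlib
import sys


def describe_env(module_name):
    # Hypothetical helper: reports the running interpreter and the file that
    # `module_name` is imported from (None if the import fails).
    info = {
        'python_version': '.'.join(str(p) for p in sys.version_info[:3]),
        'executable': sys.executable,
        'module_file': None,
        'import_error': None,
    }
    try:
        module = importlib.import_module(module_name)
        info['module_file'] = getattr(module, '__file__', None)
    except ImportError as exc:
        info['import_error'] = str(exc)
    return info
```

Running `describe_env('dpark')` on both machines and comparing the results would show whether they use the same Python version and the same installed copy of the package.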
The same program runs normally on one machine, but on another machine the log shows the following:

[@fffffff /data]# python analyze.py -v
2018-07-31 10:38:34,835 [INFO] [dpark.context] start listening on Web UI with port: 50555
2018-07-31 10:38:34,860 [DEBUG] [dpark.env] start env in 20685: True {'is_local': True}
2018-07-31 10:38:34,861 [DEBUG] [dpark.tracker] TrackerServer started at tcp://forrest16-71-142:36528
2018-07-31 10:38:34,993 [DEBUG] [dpark.shuffle] shuffle dir: ['/dev/shm/forrest16-71-142-a0a13019-6ad3-437a-a939-cf2463135a91', '/tmp/dpark/forrest16-71-142-a0a13019-6ad3-437a-a939-cf2463135a91']
2018-07-31 10:38:34,993 [DEBUG] [dpark.shuffle] MapOutputTracker started
2018-07-31 10:38:34,994 [DEBUG] [dpark.broadcast] guide start at tcp://forrest16-71-142:42991
2018-07-31 10:38:34,995 [DEBUG] [dpark.broadcast] broadcast started: tcp://forrest16-71-142:42991
2018-07-31 10:38:34,995 [DEBUG] [dpark.env] env started
2018-07-31 10:38:34,995 [DEBUG] [dpark.schedule] new stage: <Stage(1) for <MappedRDD <MappedRDD <ParallelCollection 291>>>>
2018-07-31 10:38:34,995 [DEBUG] [dpark.schedule] new stage: <Stage(2) for <ShuffledRDD <MappedRDD <MappedRDD <ParallelCollection 291>>>>>
2018-07-31 10:38:34,996 [DEBUG] [dpark.schedule] Final stage: <Stage(2) for <ShuffledRDD <MappedRDD <MappedRDD <ParallelCollection 291>>>>>, 2
2018-07-31 10:38:34,996 [DEBUG] [dpark.schedule] Parents of final stage: [<dpark.schedule.Stage instance at 0x189c4d0>]
2018-07-31 10:38:34,996 [DEBUG] [dpark.schedule] Missing parents: [<dpark.schedule.Stage instance at 0x189c4d0>]
2018-07-31 10:38:34,996 [DEBUG] [dpark.schedule] submit stage <Stage(2) for <ShuffledRDD <MappedRDD <MappedRDD <ParallelCollection 291>>>>>
2018-07-31 10:38:34,996 [DEBUG] [dpark.schedule] submit stage <Stage(1) for <MappedRDD <MappedRDD <ParallelCollection 291>>>>
2018-07-31 10:38:34,996 [DEBUG] [dpark.schedule] add to pending 2 tasks
2018-07-31 10:38:34,996 [DEBUG] [dpark.schedule] submit tasks [<ShuffleTask(1, 0) of <MappedRDD <MappedRDD <ParallelCollection 291>>>>, <ShuffleTask(1, 1) of <MappedRDD <MappedRDD <ParallelCollection 291>>>>] in LocalScheduler
2018-07-31 10:38:34,998 [DEBUG] [dpark.schedule] Running task <ShuffleTask(1, 0) of <MappedRDD <MappedRDD <ParallelCollection 291>>>>
2018-07-31 10:38:34,999 [DEBUG] [dpark.task] shuffling 0 of <MappedRDD <MappedRDD <ParallelCollection 291>>>
2018-07-31 10:38:35,008 [DEBUG] [dpark.schedule] Running task <ShuffleTask(1, 1) of <MappedRDD <MappedRDD <ParallelCollection 291>>>>
2018-07-31 10:38:35,008 [DEBUG] [dpark.task] shuffling 1 of <MappedRDD <MappedRDD <ParallelCollection 291>>>
2018-07-31 10:38:35,013 [DEBUG] [dpark.schedule] remove from pending <ShuffleTask(1, 0) of <MappedRDD <MappedRDD <ParallelCollection 291>>>> from <Stage(1) for <MappedRDD <MappedRDD <ParallelCollection 291>>>>
2018-07-31 10:38:35,013 [DEBUG] [dpark.schedule] remove from pending <ShuffleTask(1, 1) of <MappedRDD <MappedRDD <ParallelCollection 291>>>> from <Stage(1) for <MappedRDD <MappedRDD <ParallelCollection 291>>>>
2018-07-31 10:38:35,013 [DEBUG] [dpark.schedule] <Stage(1) for <MappedRDD <MappedRDD <ParallelCollection 291>>>> finished; looking for newly runnable stages
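The log stops right after Stage(1) finishes, so a stack dump of the stuck process could show where it is waiting. Below is a hedged sketch that uses only the standard library. `dump_all_stacks` and `install_dump_handler` are hypothetical helpers, not part of dpark, and `SIGUSR1` is available on Unix only. The dump covers only the process that installed the handler, and Python runs signal handlers in the main thread, so the output may be delayed if that thread is blocked inside a C call.

```python
import sys
import threading
import traceback


def dump_all_stacks():
    # Returns a text dump of the current Python stack of every thread
    # in this process.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append('Thread %s (%s):\n'
                      % (thread_id, names.get(thread_id, 'unknown')))
        chunks.append(''.join(traceback.format_stack(frame)))
    return ''.join(chunks)


def install_dump_handler(signum=None):
    # Writes all thread stacks to stderr when the signal arrives.
    # Assumption: a Unix system, where signal.SIGUSR1 exists.
    import signal
    if signum is None:
        signum = signal.SIGUSR1
    signal.signal(
        signum,
        lambda sig, frame: sys.stderr.write(dump_all_stacks()))
```

A possible use is to call `install_dump_handler()` at the top of `analyze.py`, wait for the hang, and then send `kill -USR1 <pid>` from another shell to see which call each thread is blocked in.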