dccspeed / fractal

Apache License 2.0

Very low CPU utilization when running with steps >= 3 #25

Closed: zzylol closed this issue 2 years ago

zzylol commented 2 years ago

Dear authors,

Thanks for the great work!

I found that when I try to mine 4-cliques or other patterns with more than 2 exploration steps, the CPUs on each machine are not fully utilized; in fact, they are almost 100% idle.

My Spark setup is the standalone version. I am not sure what's wrong. Should I just wait for the experiment to finish despite the low CPU utilization? Or perhaps my Spark 2.2.0 setup is wrong?

We are using 4 physical machines, each with 20 cores, 40 threads.

Could you give me some suggestions? Thanks a lot!

viniciusvdias commented 2 years ago

Hi, thank you for your interest. Most likely this is some configuration issue. Can you share the submission command and logs?

zzylol commented 2 years ago

Hi, thanks for your reply by email. I ran the command below:

steps=2 num_workers=4 worker_cores=40 inputgraph=$FRACTAL_HOME/data/citeseer-single-label.graph app=cliques ./bin/fractal.sh

and part of the logs:

21/11/22 08:33:48 INFO graph.BasicMainGraph: Initializing graph, id=51 name=/users/zz_y/fractal/data/citeseer-single-label.graph isEdgeLabelled=false isMultiGraph=false class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph
21/11/22 08:33:48 INFO graph.BasicMainGraph: Done initializing graph, id=51 name=/users/zz_y/fractal/data/citeseer-single-label.graph isEdgeLabelled=false isMultiGraph=false class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph elapsed=0 ms
21/11/22 08:33:48 INFO SparkConfiguration: Graph created, configId=11 graph=Graph(id=0, name=/users/zz_y/fractal/data/citeseer-single-label.graph, isEdgeLabelled=false, isMultiGraph=false, class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph)
21/11/22 08:33:48 INFO SparkConfiguration: MainGraph is empty, gonna try reading it
21/11/22 08:33:48 INFO graph.BasicMainGraph: Reading graph properties
21/11/22 08:33:48 WARN conf.Configuration: Graph properties file not found: /users/zz_y/fractal/data/citeseer-single-label.graph.prop
21/11/22 08:33:48 INFO graph.BasicMainGraph: Reading graph, id=0 name=/users/zz_y/fractal/data/citeseer-single-label.graph path=/users/zz_y/fractal/data/citeseer-single-label.graph isEdgeLabelled=false isMultiGraph=false class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph
21/11/22 08:33:48 INFO executor.Executor: Executor interrupted and killed task 13.0 in stage 0.0 (TID 13), reason: stage cancelled
21/11/22 08:33:48 INFO SparkConfiguration: Initializing config, id=11 config=SparkConfiguration(Map(computation_container ->  CC[20][first_computation](bypass=false,_,_,_,_,pc,_):: CC[18][18](bypass=true,_,_,_,f,pc,_):: CC[17][17](bypass=false,_,_,_,_,pc,_):: CC[16][16](bypass=true,_,_,_,f,pc,_):: CC[15][15](bypass=false,_,_,_,_,pc,_):: CC[14][14](bypass=true,_,_,_,f,pc,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.local -> false, fractal.optimizations -> br.ufmg.cs.systems.fractal.optimization.CliqueOptimization, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, num_partitions -> 160, fractal.log.level -> info, hadoop_conf -> br.ufmg.cs.systems.fractal.util.SerializableConfiguration@47a7c93e, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0, fractal.comm.strategy -> scratch)) mainGraph=Graph(id=0, name=/users/zz_y/fractal/data/citeseer-single-label.graph, isEdgeLabelled=false, isMultiGraph=false, class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph) isMainGraphRead=false isMaster=false activeConfigs={0=SparkConfiguration(Map(fractal.master.hostname -> 128.105.144.251, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.graph.local -> false, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0)), 1=SparkConfiguration(Map(computation_container ->  CC[0][0](bypass=false,_,_,_,_,_,_), fractal.master.hostname -> 128.105.144.251, fractal.graph.local -> false, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.log.level -> info, 
fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0)), 2=SparkConfiguration(Map(computation_container ->  CC[3][3](bypass=false,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.graph.local -> false, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0)), 3=SparkConfiguration(Map(computation_container ->  CC[3][3](bypass=false,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.local -> false, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0)), 4=SparkConfiguration(Map(computation_container ->  CC[4][4](bypass=true,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.graph.local -> false, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0)), 5=SparkConfiguration(Map(computation_container ->  CC[5][5](bypass=true,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.local -> false, 
fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0)), 6=SparkConfiguration(Map(computation_container ->  CC[6][6](bypass=false,_,_,_,_,_,_):: CC[5][5](bypass=true,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.graph.local -> false, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0)), 7=SparkConfiguration(Map(computation_container ->  CC[6][6](bypass=false,_,_,_,_,_,_):: CC[5][5](bypass=true,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.local -> false, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0, fractal.comm.strategy -> scratch)), 8=SparkConfiguration(Map(computation_container ->  CC[6][6](bypass=false,_,_,_,_,_,_):: CC[5][5](bypass=true,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.graph.local -> false, num_partitions -> 160, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> 
/tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0, fractal.comm.strategy -> scratch)), 9=SparkConfiguration(Map(computation_container ->  CC[6][6](bypass=false,_,_,_,_,_,_):: CC[5][5](bypass=true,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.optimizations -> br.ufmg.cs.systems.fractal.optimization.CliqueOptimization, fractal.graph.local -> false, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, num_partitions -> 160, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0, fractal.comm.strategy -> scratch)), 10=SparkConfiguration(Map(computation_container ->  CC[8][8](bypass=false,_,_,_,_,_,_):: CC[7][7](bypass=true,_,_,_,f,_,p):: CC[6][6](bypass=false,_,_,_,_,_,_):: CC[5][5](bypass=true,_,_,_,f,_,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, fractal.graph.local -> false, fractal.optimizations -> br.ufmg.cs.systems.fractal.optimization.CliqueOptimization, num_partitions -> 160, fractal.log.level -> info, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0, fractal.comm.strategy -> scratch)), 11=SparkConfiguration(Map(computation_container ->  CC[20][first_computation](bypass=false,_,_,_,_,pc,_):: CC[18][18](bypass=true,_,_,_,f,pc,_):: CC[17][17](bypass=false,_,_,_,_,pc,_):: CC[16][16](bypass=true,_,_,_,f,pc,_):: CC[15][15](bypass=false,_,_,_,_,pc,_):: CC[14][14](bypass=true,_,_,_,f,pc,p), fractal.master.hostname -> 128.105.144.251, fractal.graph.local -> false, 
fractal.optimizations -> br.ufmg.cs.systems.fractal.optimization.CliqueOptimization, fractal.graph.class -> br.ufmg.cs.systems.fractal.graph.BasicMainGraph, num_partitions -> 160, fractal.log.level -> info, hadoop_conf -> br.ufmg.cs.systems.fractal.util.SerializableConfiguration@47a7c93e, fractal.aggregation.incremental -> true, fractal.graph.location -> /users/zz_y/fractal/data/citeseer-single-label.graph, fractal.output.path -> /tmp/fractal-e13bf202-7078-4c04-8cba-38855f0c4f75/graph-15b337cc-21e8-4551-bab1-d076fac45fad/vertex-computation-0, fractal.comm.strategy -> scratch))}
21/11/22 08:33:48 WARN scheduler.TaskSetManager: Lost task 13.0 in stage 0.0 (TID 13, localhost, executor driver): TaskKilled (stage cancelled)
21/11/22 08:33:48 INFO graph.BasicMainGraph: Initializing graph, id=52 name=/users/zz_y/fractal/data/citeseer-single-label.graph isEdgeLabelled=false isMultiGraph=false class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph
21/11/22 08:33:48 INFO graph.BasicMainGraph: Done initializing graph, id=52 name=/users/zz_y/fractal/data/citeseer-single-label.graph isEdgeLabelled=false isMultiGraph=false class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph elapsed=0 ms
21/11/22 08:33:48 INFO SparkConfiguration: Graph created, configId=11 graph=Graph(id=0, name=/users/zz_y/fractal/data/citeseer-single-label.graph, isEdgeLabelled=false, isMultiGraph=false, class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph)
21/11/22 08:33:48 INFO SparkConfiguration: MainGraph is empty, gonna try reading it
21/11/22 08:33:48 INFO graph.BasicMainGraph: Reading graph properties
21/11/22 08:33:48 WARN conf.Configuration: Graph properties file not found: /users/zz_y/fractal/data/citeseer-single-label.graph.prop
21/11/22 08:33:48 INFO graph.BasicMainGraph: Reading graph, id=0 name=/users/zz_y/fractal/data/citeseer-single-label.graph path=/users/zz_y/fractal/data/citeseer-single-label.graph isEdgeLabelled=false isMultiGraph=false class=class br.ufmg.cs.systems.fractal.graph.BasicMainGraph
21/11/22 08:33:48 INFO executor.Executor: Executor interrupted and killed task 40.0 in stage 0.0 (TID 40), reason: stage cancelled
21/11/22 08:33:48 WARN scheduler.TaskSetManager: Lost task 40.0 in stage 0.0 (TID 40, localhost, executor driver): TaskKilled (stage cancelled)
21/11/22 08:33:48 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
^C21/11/22 08:33:58 INFO spark.SparkContext: Invoking stop() from shutdown hook
21/11/22 08:33:58 INFO server.AbstractConnector: Stopped Spark@f713686{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
21/11/22 08:33:58 INFO ui.SparkUI: Stopped Spark web UI at http://128.105.144.251:4040
21/11/22 08:33:58 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/11/22 08:33:58 INFO memory.MemoryStore: MemoryStore cleared
21/11/22 08:33:58 INFO storage.BlockManager: BlockManager stopped
21/11/22 08:33:58 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/11/22 08:33:58 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/11/22 08:33:58 INFO spark.SparkContext: Successfully stopped SparkContext
21/11/22 08:33:58 INFO util.ShutdownHookManager: Shutdown hook called
21/11/22 08:33:58 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ceb89dbd-be72-47f2-95e6-c1926a0c8241

I made sure that the citeseer graph is at the right location in each machine's local file system. It seems the code reads the graph from Hadoop. But when I put citeseer-single-label.graph into HDFS, the log shows:

21/11/22 08:46:00 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#988143833] knows 33 slaves.
21/11/22 08:46:00 INFO MasterActor: StatsReport{step=0,partitionId=24,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.8125,totalMemory=1.8125,freeMemory=1.6478383392095566,usedMemory=0.16466166079044342}
21/11/22 08:46:00 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#988143833] knows 34 slaves.
21/11/22 08:46:00 INFO MasterActor: StatsReport{step=0,partitionId=0,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.8125,totalMemory=1.8125,freeMemory=1.6478383392095566,usedMemory=0.16466166079044342}
21/11/22 08:46:00 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#988143833] knows 35 slaves.
21/11/22 08:46:00 INFO MasterActor: StatsReport{step=0,partitionId=17,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.8125,totalMemory=1.8125,freeMemory=1.6478383392095566,usedMemory=0.16466166079044342}
21/11/22 08:46:00 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#988143833] knows 36 slaves.
21/11/22 08:46:00 INFO MasterActor: StatsReport{step=0,partitionId=31,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.8125,totalMemory=1.8125,freeMemory=1.6478383392095566,usedMemory=0.16466166079044342}
21/11/22 08:46:00 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#988143833] knows 37 slaves.
21/11/22 08:46:00 INFO MasterActor: StatsReport{step=0,partitionId=33,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.8125,totalMemory=1.8125,freeMemory=1.6478383392095566,usedMemory=0.16466166079044342}
21/11/22 08:46:00 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#988143833] knows 38 slaves.
21/11/22 08:46:00 INFO MasterActor: StatsReport{step=0,partitionId=38,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.8125,totalMemory=1.8125,freeMemory=1.6478383392095566,usedMemory=0.16466166079044342}
21/11/22 08:46:00 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#988143833] knows 39 slaves.
21/11/22 08:46:00 INFO MasterActor: StatsReport{step=0,partitionId=19,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.8125,totalMemory=1.8125,freeMemory=1.6459424197673798,usedMemory=0.16655758023262024}
21/11/22 08:46:00 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#988143833] knows 40 slaves.
21/11/22 08:46:00 INFO MasterActor: StatsReport{step=0,partitionId=35,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.8125,totalMemory=1.8125,freeMemory=1.6459424197673798,usedMemory=0.16655758023262024}

I am not sure what is going wrong.

BTW, what does the server deploy_mode in the README mean?

I installed hadoop-2.6.0. It seems Spark cannot run unless Hadoop is started. I am a little confused.

viniciusvdias commented 2 years ago

If citeseer is in HDFS, you should set inputgraph in the command to an hdfs:// path.
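A minimal sketch of that change, assuming the graph is first copied with the standard hdfs dfs shell. NAMENODE, HDFS_DIR, DRY_RUN and the helper functions run, graph_uri and upload_and_run are hypothetical names for illustration; the real namenode host:port should come from fs.defaultFS in your Hadoop configuration, and the script assumes it is run from $FRACTAL_HOME like the original command.

```shell
#!/usr/bin/env bash
# Sketch: copy the input graph into HDFS and pass an hdfs:// URI as inputgraph.
# NAMENODE and HDFS_DIR are hypothetical placeholders: take the real host:port
# from fs.defaultFS in your Hadoop core-site.xml.
# Set DRY_RUN=1 to print the commands instead of executing them.

NAMENODE="${NAMENODE:-namenode-host:9000}"
HDFS_DIR="${HDFS_DIR:-/user/${USER:-fractal}/fractal/data}"
GRAPH_FILE="citeseer-single-label.graph"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Full HDFS URI of the graph file.
graph_uri() {
  printf 'hdfs://%s%s/%s\n' "$NAMENODE" "$HDFS_DIR" "$GRAPH_FILE"
}

upload_and_run() {
  run hdfs dfs -mkdir -p "$HDFS_DIR"
  run hdfs dfs -put -f "${FRACTAL_HOME:-.}/data/$GRAPH_FILE" "$HDFS_DIR/"
  # Same parameters as the original command; only inputgraph changes.
  # Assumes the current directory is $FRACTAL_HOME.
  run env steps=2 num_workers=4 worker_cores=40 \
    inputgraph="$(graph_uri)" app=cliques ./bin/fractal.sh
}
```

Running DRY_RUN=1 upload_and_run only prints the three commands, which is a cheap way to check the generated URI before touching HDFS or starting a job.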

The master log is not helpful for this purpose. It is best to inspect the worker logs: https://spark.apache.org/docs/latest/spark-standalone.html#monitoring-and-logging
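As a rough aid for that, here is a sketch that finds and scans the executor logs on one worker. It assumes Spark standalone's default layout, where each application gets a directory named app-<timestamp>-<seq> under the worker's work directory ($SPARK_WORKER_DIR, else $SPARK_HOME/work) with one stdout and one stderr file per executor. The function names and the grep pattern are hypothetical choices, not part of Fractal or Spark.

```shell
#!/usr/bin/env bash
# Sketch: locate and scan the executor logs of the most recent application
# on one worker. Assumes Spark standalone's default layout
#   <work dir>/<app-id>/<executor-id>/{stdout,stderr}
# where <work dir> is $SPARK_WORKER_DIR if set, else $SPARK_HOME/work.
# Run it on every worker machine.

# Print the stderr paths of the newest application's executors.
latest_executor_logs() {
  local work_dir="${1:-${SPARK_WORKER_DIR:-${SPARK_HOME:-.}/work}}"
  local app
  # App IDs look like app-<yyyyMMddHHmmss>-<seq>, so the newest sorts last.
  app="$(ls -1d "$work_dir"/app-* 2>/dev/null | sort | tail -n 1)"
  if [ -z "$app" ]; then
    echo "no applications under $work_dir" >&2
    return 1
  fi
  ls -1 "$app"/*/stderr
}

# Show the last error-looking lines of each executor's stderr.
scan_latest_logs() {
  latest_executor_logs "$@" | while IFS= read -r f; do
    echo "== $f"
    grep -E 'ERROR|Exception|OutOfMemoryError' "$f" | tail -n 20
  done
}
```

Calling scan_latest_logs with no argument uses the default work directory; passing a path overrides it, which helps if the workers were started with a custom SPARK_WORKER_DIR.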

deploy_mode is meant to be used by the spark-submit script when submitting an application through YARN.
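If deploy_mode is indeed handed to spark-submit, only the two values that spark-submit's --deploy-mode accepts make sense: client (the default, driver on the submitting machine) or cluster (driver launched on a worker). The small guard below is a hypothetical helper, not part of fractal.sh, and it assumes fractal.sh forwards the value unchanged.

```shell
#!/usr/bin/env bash
# Sketch: validate deploy_mode before launching fractal.sh.
# Assumption: fractal.sh forwards deploy_mode to spark-submit --deploy-mode,
# which accepts only:
#   client  - driver runs on the machine that submits the job (Spark's default)
#   cluster - driver is launched on one of the cluster's worker machines
check_deploy_mode() {
  case "$1" in
    client|cluster) return 0 ;;
    *)
      echo "unsupported deploy_mode '$1' (expected client or cluster)" >&2
      return 1
      ;;
  esac
}
```

For example, check_deploy_mode cluster succeeds, while any other value prints a message and returns a non-zero status before a job is submitted.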

zzylol commented 2 years ago

Thanks for your reply! I figured out that I should use deploy_mode=cluster to run on 4 machines. I have another question: if each of my machines has 186 GB of memory, how much memory should I give the driver and how much the workers to get the best performance from Fractal? Thanks!

viniciusvdias commented 2 years ago

This really depends on how large your aggregations are, the size of the input graph, how many workers you have, etc. Most of the memory should go to the workers, because they must hold the whole input graph and the partial aggregations. I would check the logs and JVM garbage collection: if you observe no long pauses and no crashes, I think you are fine. Tuning distributed/parallel executions is very tricky, though.
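One possible starting point that follows this advice is sketched below. The 10% OS reserve, the 8 GB driver heap and the 10% off-heap headroom are hypothetical numbers, not values from this thread; spark.driver.memory, spark.executor.memory and spark.executor.extraJavaOptions are standard Spark properties, and the GC flags are the Java 8 options suggested in Spark's tuning guide, whose output lands in each executor's stdout on the workers. How fractal.sh passes Spark properties is not shown here, so the lines are printed in spark-defaults.conf format.

```shell
#!/usr/bin/env bash
# Sketch: derive spark-defaults.conf memory lines from a machine's RAM,
# giving most of it to the executors (workers). The 10% OS reserve, the
# 8 GB driver heap and the 10% off-heap headroom are hypothetical starting
# points, not values recommended in the thread.
# Usage: split_memory TOTAL_GB
split_memory() {
  local total_gb="$1"
  local reserve_gb=$(( total_gb / 10 ))     # OS, Spark and HDFS daemons
  local driver_gb="${DRIVER_GB:-8}"         # driver kept small; workers hold the graph
  # Subtract the driver on every machine: in cluster mode it may land on any worker.
  local left_gb=$(( total_gb - reserve_gb - driver_gb ))
  local executor_gb=$(( left_gb * 9 / 10 )) # headroom for JVM off-heap memory
  printf 'spark.driver.memory %sg\n' "$driver_gb"
  printf 'spark.executor.memory %sg\n' "$executor_gb"
  # Java 8 GC logging flags, as suggested in Spark's tuning guide.
  printf 'spark.executor.extraJavaOptions %s\n' \
    '-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps'
}
```

For 186 GB machines, split_memory 186 prints spark.driver.memory 8g and spark.executor.memory 144g plus the GC options; the GC log then shows whether the long pauses mentioned above occur, and the numbers can be adjusted from there.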