dccspeed / fractal

Apache License 2.0

Single Machine Multicore #3

13k75 closed 5 years ago

13k75 commented 5 years ago

Hi Vinicius,

We corresponded briefly via email, but GitHub seems more convenient.

I am having difficulty running Fractal in a single-machine multicore environment. Specifying worker_cores > 1 leads to a deadlock.

From examining the logs, it seems that Fractal is not starting multiple actors. One slave is created, but then I'm guessing the Master is waiting to find another? This is the tail end of the logs when the deadlock happens. My CPU usage drops to 0 and nothing happens afterwards. Any help would be appreciated!

2019-09-12 15:56:55,337 INFO ActorMessageSystem$: Started akka-sys: akka://fractal-msgsys - executor - waiting for messages
2019-09-12 15:56:55,338 INFO SlaveActor: Actor Actor[akka://fractal-msgsys/user/slave-actor-11-0-0-0#740124414] started
2019-09-12 15:56:55,340 INFO SlaveActor: Actor[akka://fractal-msgsys/user/slave-actor-11-0-0-0#740124414] sending identification to master
2019-09-12 15:56:55,454 INFO SlaveActor: Actor[akka://fractal-msgsys/user/slave-actor-11-0-0-0#740124414] knows master: Actor[akka.tcp://fractal-msgsys@127.0.1.1:2552/user/master-actor-11-0#1302576852]
[WARN] [SECURITY][09/12/2019 15:56:55.455] [fractal-msgsys-akka.actor.default-dispatcher-3] [akka.serialization.Serialization(akka://fractal-msgsys)] Using the default Java serializer for class [br.ufmg.cs.systems.fractal.computation.HelloMaster] which is not recommended because of performance implications. Use another serializer or disable this warning using the setting 'akka.actor.warn-about-java-serializer-usage'
[WARN] [SECURITY][09/12/2019 15:56:55.460] [fractal-msgsys-akka.actor.default-dispatcher-3] [akka.serialization.Serialization(akka://fractal-msgsys)] Using the default Java serializer for class [br.ufmg.cs.systems.fractal.computation.Log] which is not recommended because of performance implications. Use another serializer or disable this warning using the setting 'akka.actor.warn-about-java-serializer-usage'
2019-09-12 15:56:55,464 INFO MasterActor: Actor[akka://fractal-msgsys/user/master-actor-11-0#1302576852] knows 1 slaves.
2019-09-12 15:56:55,464 INFO MasterActor: StatsReport{step=0,partitionId=0,canonical_subgraphs_1:0,neighborhood_lookups_0:0,valid_subgraphs_1:0,subgraphs_output:0,canonical_subgraphs_4:0,valid_subgraphs_0:0,valid_subgraphs_3:0,canonical_subgraphs_3:0,neighborhood_lookups_2:0,neighborhood_lookups_5:0,canonical_subgraphs_0:0,valid_subgraphs_5:0,valid_subgraphs_2:0,neighborhood_lookups_1:0,canonical_subgraphs_2:0,neighborhood_lookups_4:0,canonical_subgraphs_5:0,neighborhood_lookups_3:0,valid_subgraphs_4:0,maxMemory=1.77783203125,totalMemory=0.43798828125,freeMemory=0.3339014947414398,usedMemory=0.10408678650856018}

viniciusvdias commented 5 years ago

Hi! Can you provide the command/parameters/workload used in this execution, so I can try and reproduce the issue?

13k75 commented 5 years ago

Sure thing, I ran worker_cores=2 steps=2 inputgraph=citeseer-single-label.graph alg=motifs ./bin/fractal.sh

viniciusvdias commented 5 years ago

Got it. There is indeed a bug in the submission script fractal.sh. worker_cores must also be used to set the Spark master configuration; otherwise Fractal thinks there are 2 cores available while the submission script asks for only 1 core, which leads to this hang. Thanks!
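
To illustrate the mismatch, here is a minimal sketch of how a submission script could derive the Spark master setting from worker_cores. It is not the actual content of fractal.sh or of the pull request: the helper name spark_master_for is hypothetical, and it assumes single-machine mode uses Spark's standard local[N] master URL, where N is the number of worker threads.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real fractal.sh: keep the Spark master
# setting in sync with worker_cores so both sides agree on the core count.

# Build a single-machine Spark master URL from a core count.
# Assumption: local mode is used; local[N] is standard Spark syntax
# for running locally with N worker threads.
spark_master_for() {
  local cores="${1:-1}"
  # Reject anything that is not a positive integer.
  if ! [[ "$cores" =~ ^[1-9][0-9]*$ ]]; then
    echo "worker_cores must be a positive integer, got: $cores" >&2
    return 1
  fi
  echo "local[${cores}]"
}

# Default to 1 core when worker_cores is not set in the environment.
worker_cores="${worker_cores:-1}"
spark_master="$(spark_master_for "$worker_cores")"

# Only print the option here; a real script would pass it to spark-submit.
echo "would submit with: --master ${spark_master}"
```

With worker_cores=2 this yields a master of local[2], so the number of cores Spark provides matches the number the application expects. A master fixed at one core would reproduce the situation described above.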

Also, the citeseer dataset is missing from data/, my bad.

Please check out pull request #6 and let me know if it fixes the problem, so I can close this issue.

13k75 commented 5 years ago

That seems to fix it! Thanks a lot :)