atlarge-research / graphalytics-platforms-graphx

Apache License 2.0

Why are containers not being fully utilized? #19

Open MBtech opened 5 years ago

MBtech commented 5 years ago

I am experimenting with the LDBC benchmark and using GraphX as a reference. I have set the following Spark job configurations in platform.properties:

# Number of Spark executor instances
platform.graphx.job.executor.instances = 2
# Memory available per Spark executor
platform.graphx.job.executor.memory = 8g
# Cores available per Spark executor
platform.graphx.job.executor.cores = 8
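
For reference, these three properties correspond to Spark's standard executor settings. A hypothetical spark-submit invocation with equivalent values (flag names are from Spark's own CLI; the application JAR and class are placeholders) would look roughly like:

```shell
# Sketch only: 2 executors, each with 8 GB of memory and 8 cores,
# deployed on YARN. Application class and JAR are placeholders.
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-memory 8g \
  --executor-cores 8 \
  --class <benchmark-main-class> \
  <benchmark-jar>
```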

In the YARN UI, I would have expected to see 2 containers running for the benchmark, each using 8 cores and 8 GB of memory. But instead I see this: Screenshot 2019-08-05 at 10 04 53 PM

There are 3 containers. What is the extra container for, and why is each container utilizing only 1 VCore?

What am I missing here?

thegeman commented 5 years ago

Hi @MBtech,

The extra container you see is Spark's Application Master. It is part of any Spark deployment on YARN; its only task is to request and launch the Spark executors (the other two containers).

The VCore count in the YARN UI can be misleading: the number of VCores assigned to a container does not limit the number of cores the container can actually use. Even though the YARN UI reports that each container has only 1 VCore assigned, the executors will use the number of cores set by the platform.graphx.job.executor.cores property.
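
To add a bit of context (assuming a default CapacityScheduler setup, which is common but not confirmed from the screenshot): YARN's DefaultResourceCalculator allocates containers based on memory only, which is why every container is reported with 1 VCore regardless of the requested core count. If you want YARN to account for CPU as well, a hedged sketch of the relevant capacity-scheduler.xml setting is:

```xml
<!-- In capacity-scheduler.xml: switch the CapacityScheduler from the
     memory-only DefaultResourceCalculator to the DominantResourceCalculator,
     which also accounts for VCores. Only needed if you want the reported
     VCore numbers to reflect actual requests. -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

Note that this changes how YARN schedules containers, not how many cores the Spark executors actually use; that remains governed by the executor cores setting.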

Hope this answers your question. Let us know if you have any other questions about running Graphalytics.