Closed jpchauhan closed 6 years ago
@jpchauhan Thanks for the feedback! We are currently investigating and will get back to you shortly.
@jpchauhan - It depends on the available resources. YARN uses a global ResourceManager (RM), per-worker-node NodeManagers (NMs), and per-application ApplicationMasters (AMs). The per-application AM negotiates with the RM for the resources (CPU, memory, disk, network) needed to run your application. The RM works with the NMs to grant these resources in the form of containers.
@jpchauhan One other point: when you create an HDInsight cluster, the virtual machine (VM) size is one of the choices you make in the Azure portal. The amount of memory and the number of CPUs available in the cluster depend on which VM size you pick and how many worker nodes you pick. The price varies with the VM size and the number of VMs. More details: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-capacity-planning
For example, in East US I see these choices:
When the YARN services accept incoming work, they monitor the resources available on the worker nodes and assign containers on those nodes to host the work, dividing the cluster resources among the running applications.
There are a number of knobs, depending on the workload, for fine-tuning performance. These influence container sizes and the number of concurrent jobs that can run. https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-changing-configs-via-ambari
That's a lot to digest, so start with a small cluster for testing and check whether performance is suitable before fine-tuning and increasing the cluster size in additional tests.
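To make the container limit more concrete, here is a rough sketch of how an upper bound could be estimated from the YARN settings. It assumes each worker node offers `yarn.nodemanager.resource.memory-mb` of memory and `yarn.nodemanager.resource.cpu-vcores` vcores to YARN, and that memory requests are rounded up to a multiple of `yarn.scheduler.minimum-allocation-mb` (the exact rounding depends on the scheduler in use). The function name, its parameters, and the numbers in the example are hypothetical and not taken from any real cluster. Whether vcores limit the count depends on the configured resource calculator, so that is exposed as a flag.

```python
import math


def estimate_max_containers(worker_nodes, nm_memory_mb, nm_vcores,
                            request_mb, request_vcores=1,
                            min_alloc_mb=1024, count_vcores=False):
    """Rough upper bound on concurrently running containers.

    Hypothetical helper for illustration only; real clusters also reserve
    resources for ApplicationMasters, queues, and other overhead.
    """
    values = (worker_nodes, nm_memory_mb, nm_vcores,
              request_mb, request_vcores, min_alloc_mb)
    if min(values) <= 0:
        raise ValueError("all inputs must be positive")

    # Assumption: the scheduler rounds each memory request up to a
    # multiple of the minimum allocation.
    granted_mb = math.ceil(request_mb / min_alloc_mb) * min_alloc_mb

    # Containers that fit on one node by memory alone.
    per_node = nm_memory_mb // granted_mb

    # Optionally let vcores be the limiting factor as well.
    if count_vcores:
        per_node = min(per_node, nm_vcores // request_vcores)

    return worker_nodes * per_node


# Example with made-up numbers: 4 worker nodes, each offering
# 25600 MB and 8 vcores to YARN, with 1500 MB container requests.
memory_only = estimate_max_containers(4, 25600, 8, request_mb=1500)
with_vcores = estimate_max_containers(4, 25600, 8, request_mb=1500,
                                      count_vcores=True)
print(memory_only, with_vcores)
```

In this example the 1500 MB request is rounded up to 2048 MB, so memory alone allows 12 containers per node, while counting vcores caps it at 8 per node. The real values for your cluster can be read from the YARN configuration in Ambari.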
Thanks! Jason
When we say "the cluster has a fixed limit on the number of containers available", how do we determine that limit?