jpchauhan commented 6 years ago

The article says "the cluster has a fixed limit on the number of containers available". How do we determine that limit?

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: 40fdbf85-03fc-7ba1-dca8-4c46ad35500f
Version Independent ID: 5d635132-623b-ba6a-27d7-1b801fe74fdf
Content: Hadoop architecture - Azure HDInsight
Content Source: articles/hdinsight/hdinsight-hadoop-architecture.md
Service: hdinsight
GitHub Login: @ashishthaps
Microsoft Alias: ashishth

Alberto-Vega commented 6 years ago

@jpchauhan Thanks for the feedback! We are currently investigating and will get back to you shortly.

mamccrea commented 6 years ago

@jpchauhan - It depends on the available resources. YARN uses a global ResourceManager (RM), per-worker-node NodeManagers (NMs), and per-application ApplicationMasters (AMs). The per-application AM negotiates with the RM for the resources (CPU, memory, disk, network) needed to run your application. The RM works with the NMs to grant these resources, which are granted as containers.
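
To see the totals the RM is working with at a given moment, one option is the ResourceManager REST API's cluster metrics resource (`/ws/v1/cluster/metrics`). The sketch below is only an illustration: it takes a response body that has already been fetched, and the helper name `container_headroom`, the container size, and the sample numbers are all made up for the example. How you reach the RM endpoint depends on your cluster setup.

```python
import json


def container_headroom(metrics_json, container_memory_mb, container_vcores=1):
    """Estimate how many more containers of a given size could be granted.

    metrics_json is the body returned by the RM's /ws/v1/cluster/metrics
    resource. The container size is an assumption supplied by the caller.
    """
    if container_memory_mb <= 0 or container_vcores <= 0:
        raise ValueError("container size must be positive")
    metrics = json.loads(metrics_json)["clusterMetrics"]
    # Whichever resource runs out first bounds the number of extra containers.
    by_memory = metrics["availableMB"] // container_memory_mb
    by_vcores = metrics["availableVirtualCores"] // container_vcores
    return {
        "running": metrics["containersAllocated"],
        "additional": min(by_memory, by_vcores),
    }


if __name__ == "__main__":
    # Made-up sample values, not taken from a real cluster.
    sample = json.dumps({
        "clusterMetrics": {
            "totalMB": 92160,
            "availableMB": 61440,
            "totalVirtualCores": 30,
            "availableVirtualCores": 20,
            "containersAllocated": 10,
        }
    })
    print(container_headroom(sample, container_memory_mb=3072))
```

This is a rough snapshot only: the scheduler configuration and queue limits can reduce what a single application is actually granted.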

JasonWHowell commented 6 years ago

@jpchauhan One other point: when you create an HDInsight cluster, the virtual machine size is one of the choices you make in the Azure portal. The amount of memory and the number of CPUs available in the cluster depend on which VM size you pick and how many worker nodes you pick. The price varies with the VM size and the number of VMs. More details: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-capacity-planning

For example, in East US I see these choices:

When the various YARN services accept incoming work, they monitor the resources available on the worker nodes and assign containers to host that work, dividing the cluster resources among the containers.
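
As a rough back-of-the-envelope sketch of that division (not how HDInsight itself reports it), the container limit can be estimated from the memory and vcores each worker node offers to YARN and the size of each container. The node values below are hypothetical, not HDInsight defaults; the comments point at the standard YARN properties they would correspond to.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """Resources one NodeManager offers to YARN (hypothetical values)."""
    yarn_memory_mb: int  # cf. yarn.nodemanager.resource.memory-mb
    yarn_vcores: int     # cf. yarn.nodemanager.resource.cpu-vcores


def containers_per_node(node, container_memory_mb, container_vcores=1):
    """Containers of the given size that fit on one node."""
    if container_memory_mb <= 0 or container_vcores <= 0:
        raise ValueError("container size must be positive")
    by_memory = node.yarn_memory_mb // container_memory_mb
    by_vcores = node.yarn_vcores // container_vcores
    # The scarcer resource sets the limit for this node.
    return min(by_memory, by_vcores)


def cluster_container_limit(nodes, container_memory_mb, container_vcores=1):
    """Sum the per-node limits across all worker nodes."""
    return sum(
        containers_per_node(n, container_memory_mb, container_vcores)
        for n in nodes
    )


if __name__ == "__main__":
    # Assumed cluster: 4 worker nodes, each offering 25600 MB and 8 vcores.
    nodes = [WorkerNode(yarn_memory_mb=25600, yarn_vcores=8) for _ in range(4)]
    print(cluster_container_limit(nodes, container_memory_mb=3072))  # 32
    print(cluster_container_limit(nodes, container_memory_mb=4096))  # 24
```

The estimate is an upper bound under these assumptions. Depending on the scheduler's resource calculator, only memory may be taken into account, and each application's AM also occupies a container.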

Depending on the workload, there are a number of knobs for fine-tuning performance, and they influence container sizes and the number of concurrent jobs that can run. See https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-changing-configs-via-ambari

That's a lot to digest, but start with a small cluster to test with, and see whether performance is suitable before fine-tuning and increasing the cluster size in additional tests.

Thanks! Jason

MicrosoftDocs / azure-docs

How to determine limits of the "Number of Containers"? #16074

please-close