Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

join pooled clusters based on yarn cluster metrics #2191

Closed coyotemarin closed 4 years ago

coyotemarin commented 4 years ago

Currently, pooling checks whether a cluster is "big enough" in terms of memory, CPU, and a few other aspects (e.g. EBS volume size).

We could instead check memory and CPU needs against what the cluster actually has available, by SSHing to the cluster's YARN resource manager and querying its metrics API for availableMB and availableVirtualCores.

Not only would this give more useful information about a cluster that can run multiple jobs simultaneously, it would also let us skip querying the cluster's instances via ListInstanceGroups/ListInstanceFleets, saving an API call.
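
A minimal sketch of what that query could look like, assuming plain HTTP access to the ResourceManager's REST endpoint on the master node (in practice mrjob would tunnel the request over SSH); the hostname below is just a placeholder:

```python
import json
from urllib.request import urlopen

def get_yarn_metrics(rm_host, rm_port=8088, timeout=10):
    """Fetch clusterMetrics from the YARN ResourceManager REST API
    (GET /ws/v1/cluster/metrics)."""
    url = 'http://%s:%d/ws/v1/cluster/metrics' % (rm_host, rm_port)
    with urlopen(url, timeout=timeout) as resp:
        data = json.loads(resp.read().decode('utf-8'))
    return data['clusterMetrics']

# e.g. against an EMR master node (placeholder hostname):
metrics = get_yarn_metrics('ec2-203-0-113-1.compute-1.amazonaws.com')
print(metrics['availableMB'], metrics['availableVirtualCores'])
```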

coyotemarin commented 4 years ago

Probably should call these options min_available_mb and min_available_virtual_cores. If either is set, we can bypass checking the cluster's instance information.
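
A rough sketch of how that check might look (option names as proposed above, everything else hypothetical); if neither option is set we'd fall back to the existing instance-based check:

```python
def cluster_meets_yarn_requirements(cluster_metrics,
                                    min_available_mb=None,
                                    min_available_virtual_cores=None):
    """True if the cluster's YARN metrics satisfy whichever options are set.

    *cluster_metrics* is the clusterMetrics dict returned by the
    ResourceManager's /ws/v1/cluster/metrics endpoint.
    """
    if (min_available_mb is not None and
            cluster_metrics.get('availableMB', 0) < min_available_mb):
        return False
    if (min_available_virtual_cores is not None and
            cluster_metrics.get('availableVirtualCores', 0) <
            min_available_virtual_cores):
        return False
    return True
```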

coyotemarin commented 4 years ago

core_instance_type, num_core_instances, etc. will still be relevant when there is no pooled cluster available and we need to start our own.
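
For example, an mrjob.conf could end up looking something like this, where the min_available_* options are the ones proposed in this issue (hypothetical until implemented) and the values are just illustrative:

```yaml
runners:
  emr:
    pool_clusters: true
    # proposed in this issue; when set, skip the instance-based
    # "big enough" check and use YARN metrics instead
    min_available_mb: 12288
    min_available_virtual_cores: 8
    # still used when no pooled cluster qualifies and we launch our own
    core_instance_type: m5.xlarge
    num_core_instances: 4
```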