Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

join pooled clusters based on yarn cluster metrics #2191

Closed coyotemarin closed 4 years ago

coyotemarin commented 4 years ago

Currently, pooling checks whether a cluster is "big enough" in terms of memory, CPU, and a few other aspects (e.g. EBS volume size).

We could instead check memory and CPU needs against what the cluster actually has available, by SSHing to the cluster's YARN resource manager and querying its metrics API for availableMB and availableVirtualCores.

Not only would this give more useful information about a cluster that can run multiple jobs simultaneously, it would also let us skip querying the cluster's instances via ListInstanceGroups/ListInstanceFleets, saving an API call.
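
A minimal sketch of what that query could look like, assuming plain HTTP access to the ResourceManager's REST endpoint on the master node (in practice mrjob would tunnel the request over SSH); the hostname below is just a placeholder:

```python
import json
from urllib.request import urlopen

def get_yarn_metrics(rm_host, rm_port=8088, timeout=10):
    """Fetch clusterMetrics from the YARN ResourceManager REST API
    (GET /ws/v1/cluster/metrics)."""
    url = 'http://%s:%d/ws/v1/cluster/metrics' % (rm_host, rm_port)
    with urlopen(url, timeout=timeout) as resp:
        data = json.loads(resp.read().decode('utf-8'))
    return data['clusterMetrics']

# e.g. against an EMR master node (placeholder hostname):
metrics = get_yarn_metrics('ec2-203-0-113-1.compute-1.amazonaws.com')
print(metrics['availableMB'], metrics['availableVirtualCores'])
```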

coyotemarin commented 4 years ago

Probably should call these options min_available_mb and min_available_virtual_cores. If either is set, we can bypass checking the cluster's instance information.
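
A rough sketch of how that check might look (option names as proposed above, everything else hypothetical); if neither option is set we'd fall back to the existing instance-based check:

```python
def cluster_meets_yarn_requirements(cluster_metrics,
                                    min_available_mb=None,
                                    min_available_virtual_cores=None):
    """True if the cluster's YARN metrics satisfy whichever options are set.

    *cluster_metrics* is the clusterMetrics dict returned by the
    ResourceManager's /ws/v1/cluster/metrics endpoint.
    """
    if (min_available_mb is not None and
            cluster_metrics.get('availableMB', 0) < min_available_mb):
        return False
    if (min_available_virtual_cores is not None and
            cluster_metrics.get('availableVirtualCores', 0) <
            min_available_virtual_cores):
        return False
    return True
```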

coyotemarin commented 4 years ago

core_instance_type, num_core_instances, etc. will still be relevant when there is no pooled cluster available and we need to start our own.
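
For example, an mrjob.conf could end up looking something like this, where the min_available_* options are the ones proposed in this issue (hypothetical until implemented) and the values are just illustrative:

```yaml
runners:
  emr:
    pool_clusters: true
    # proposed in this issue; when set, skip the instance-based
    # "big enough" check and use YARN metrics instead
    min_available_mb: 12288
    min_available_virtual_cores: 8
    # still used when no pooled cluster qualifies and we launch our own
    core_instance_type: m5.xlarge
    num_core_instances: 4
```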