coiled / feedback

If getting less than 1/2 the max bandwidth with my cluster, should I be choosing a different instance type? #286

Open rsignell opened 1 month ago

rsignell commented 1 month ago

I ran a workflow that simply extracts data values from many files in object storage (a time series from a large collection of global simulation NetCDF files on AWS S3).

I have a cluster of 50 workers (200 threads), and I'm getting less than 1/2 the max bandwidth of the cluster.

Does this mean I should choose a different instance type and perhaps lower my costs?

fjetter commented 1 month ago

Changing instance types has very little impact on the network. Memory should likely be the primary decision factor for the instance type, followed by CPUs.

Unrelated to the instance types, you may want to increase the number of worker threads, since this is a primarily network-bound problem. The network throughput is likely limited by S3, which throttles at about 50 MiB/s per connection. Your cluster has 50 workers with 4 threads each, so the ceiling is roughly 50 MiB/s * 50 * 4 ~ 10 GiB/s.
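As a rough sketch of that arithmetic (assuming one S3 connection per worker thread, which is an assumption rather than something measured here, and using a hypothetical helper name), the ceiling can be computed like this:

```python
MIB_PER_GIB = 1024


def s3_throughput_ceiling_gib(n_workers, threads_per_worker, per_connection_mib_s=50.0):
    """Estimate aggregate S3 throughput in GiB/s.

    Assumes each worker thread holds exactly one S3 connection and that
    every connection is throttled at `per_connection_mib_s` (about 50 MiB/s,
    the figure quoted above).
    """
    connections = n_workers * threads_per_worker
    return connections * per_connection_mib_s / MIB_PER_GIB


current = s3_throughput_ceiling_gib(50, 4)  # 50 workers, 4 threads each
doubled = s3_throughput_ceiling_gib(50, 8)  # same workers, 2x threads

print(f"current ceiling: {current:.2f} GiB/s")
print(f"with 8 threads:  {doubled:.2f} GiB/s")
```

Under these assumptions, the ceiling scales linearly with the total thread count, which is why adding threads (or workers) can help even when CPU is not the bottleneck.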

You might get better performance if you doubled the number of threads...

```python
import coiled

cluster = coiled.Cluster(
    worker_vm_types=["m7i.xlarge"],  # pick whatever you like, of course (or use the default, but check the number of CPUs)
    worker_options={
        # Keep this aligned with the instance type: here it is 2x the number of CPUs
        "nthreads": 8,
    },
)
client = cluster.get_client()
```

Just be careful: every worker now also processes twice as many partitions at the same time, so memory use could blow up!
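A back-of-the-envelope sketch of that memory concern follows. The partition size (256 MiB) and the helper name are purely hypothetical, and the estimate ignores everything a worker holds besides the partitions it is actively processing:

```python
def inflight_partition_memory_gib(nthreads, partition_mib, expansion=1.0):
    """Rough memory held by partitions being processed on one worker.

    Each thread is assumed to work on one partition at a time.
    `expansion` is a hypothetical factor for in-memory blow-up relative
    to the nominal partition size (1.0 means no blow-up).
    """
    return nthreads * partition_mib * expansion / 1024


# Hypothetical 256 MiB partitions, before and after doubling the threads.
before = inflight_partition_memory_gib(nthreads=4, partition_mib=256)
after = inflight_partition_memory_gib(nthreads=8, partition_mib=256)

print(f"4 threads: {before:.1f} GiB in flight per worker")
print(f"8 threads: {after:.1f} GiB in flight per worker")
```

Comparing the second number against the worker's memory (16 GiB on an m7i.xlarge) gives a quick sense of whether doubling the threads is safe for a given partition size.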

ntabris commented 1 month ago

Hi, @rsignell.

Florian and I just had a quick chat, and it probably also makes sense to try a larger number of smaller workers, e.g., 100 m7g.large workers (instead of 50 m7g.xlarge).

Depending on how much tuning you want to do, you could try both smaller workers and some oversubscription of threads (maybe 1.5x or maybe 2x; I wouldn't go higher than that).
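A minimal sketch of how those two knobs could be combined is shown below. The `worker_layout` helper is hypothetical; it only builds keyword arguments (`n_workers`, `worker_vm_types`, `worker_options`) for `coiled.Cluster`, and the vCPU counts are supplied by hand (2 for m7g.large, 4 for m7g.xlarge):

```python
import math


def worker_layout(n_workers, vm_type, vcpus_per_worker, oversubscription=1.0):
    """Build keyword arguments for coiled.Cluster (hypothetical helper).

    `oversubscription` is the ratio of worker threads to vCPUs; it is
    capped at 2x here, following the advice above.
    """
    if not 1.0 <= oversubscription <= 2.0:
        raise ValueError("oversubscription should stay between 1x and 2x")
    nthreads = math.ceil(vcpus_per_worker * oversubscription)
    return {
        "n_workers": n_workers,
        "worker_vm_types": [vm_type],
        "worker_options": {"nthreads": nthreads},
    }


# The original layout: 50 workers, one thread per vCPU.
baseline = worker_layout(50, "m7g.xlarge", vcpus_per_worker=4)

# More, smaller workers with 1.5x thread oversubscription.
smaller = worker_layout(100, "m7g.large", vcpus_per_worker=2, oversubscription=1.5)

print(baseline)
print(smaller)

# To actually launch (requires a Coiled account):
# import coiled
# cluster = coiled.Cluster(**smaller)
```

Keeping the layout in a plain dictionary makes it easy to time the same workflow against several candidate configurations before settling on one.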

rsignell commented 1 month ago

Bingo @ntabris!

I was a little confused by the initial response because I was already using all 4 threads on the 50 m7g.xlarge instances Coiled picked for me. I did try using all 8 threads on 25 m6g.2xlarge instances, but that took much longer -- over twice as long.

I then noticed, while perusing the characteristics of the different AWS ARM instance types, that there is a free trial running until Dec 31, 2024 on the t4g.small instances.

And when I fired off 100 of these 2-CPU t4g.small machines, I got the same performance as the default m7g.xlarge instances, but for free! (And if I use more than 650 hours per month, it will still be only 25% of the cost of the m7g.xlarge instances.)

Amazing. Goes to show you it really pays to check what instances are appropriate for your type of workflow. For the same performance with the same workflow, I can pay $4/hour, $1/hour, or FREE (while the promotion lasts).

And that's only made possible by the Cloud and Coiled! So cool! 😎