GoogleCloudPlatform / batch-samples


Networking between parallel instances? #33

Closed (vsoch closed 1 year ago)

vsoch commented 1 year ago

Hiya! We want to use a tool that assesses networking between instances. Is this possible? Or, when we set parallelism, does it just run the same thing N times in parallel, with no connection between the nodes? Thanks!

vsoch commented 1 year ago

We figured this out. The confusion came from the term "task." In HPC, a task typically refers to an MPI process, whereas the Batch API uses "task" to describe a scoped piece of work. Sometimes we want a 1:1 correspondence: one Compute Engine machine with 2 vCPU and 1 core, running one task. Other times we want a larger machine with, say, 8 vCPU and 4 cores, which is still one Google Batch task but is actually 4 tasks for our tool (MPI tasks). We were able to create separate variables to describe the two, specifically:
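The variables from the original comment did not survive in this copy of the thread. As a rough illustration only, the distinction could be sketched like this in Python. Every name here (`Layout`, `batch_tasks`, `mpi_tasks_per_batch_task`, and so on) is hypothetical and not taken from the tool or from the Batch API, and the sketch assumes one Batch task per VM and 2 vCPUs per physical core, matching the "2 vCPU and 1 core" and "8 vCPU and 4 cores" examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical helper that keeps the two meanings of "task" apart."""

    batch_tasks: int           # Batch API tasks (assumed: one per VM)
    vcpus_per_vm: int          # vCPUs on each VM
    threads_per_core: int = 2  # assumed: 2 vCPUs per physical core

    def __post_init__(self) -> None:
        # Reject shapes that do not map to a whole number of cores.
        if self.vcpus_per_vm % self.threads_per_core != 0:
            raise ValueError("vcpus_per_vm must be a multiple of threads_per_core")

    @property
    def mpi_tasks_per_batch_task(self) -> int:
        # One MPI task (process) per physical core on the VM.
        return self.vcpus_per_vm // self.threads_per_core

    @property
    def total_mpi_tasks(self) -> int:
        # MPI tasks summed over all Batch tasks.
        return self.batch_tasks * self.mpi_tasks_per_batch_task


small = Layout(batch_tasks=1, vcpus_per_vm=2)  # the 1:1 case
large = Layout(batch_tasks=1, vcpus_per_vm=8)  # one Batch task, several MPI tasks
print(small.mpi_tasks_per_batch_task, large.mpi_tasks_per_batch_task)
```

Keeping the two counts as separate fields means the Batch-side number and the MPI-side number can never be confused for one another, which was the source of the original mix-up.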

A suggestion: if you are expecting HPC users to use Google Batch, I would come up with a different name and not use "task." It means something very different for a lot of us. Thanks!