Open andll opened 1 year ago
That is quite strange indeed. We spawn one task per instance, but somehow the SSH commands still run sequentially. I suspect this is an issue with the underlying SSH library we use, and it should go away once we replace it with the SSH binary on our local machines.
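For illustration only, here is a rough sketch of what "one task per instance, each driving the local `ssh` binary" could look like. It is written in Python with `asyncio` rather than in the project's own code, and the names `ssh_argv`, `run_one`, and `run_on_all` are made up for this example. The `make_argv` parameter is also an assumption, added so the fan-out can be tried without real hosts.

```python
import asyncio
import sys


def ssh_argv(host, command):
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a misconfigured host cannot block the whole run.
    return ["ssh", "-o", "BatchMode=yes", host, command]


async def run_one(argv):
    # Start one child process and collect its exit code and output.
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = await proc.communicate()
    return proc.returncode, out.decode(), err.decode()


async def run_on_all(hosts, command, make_argv=ssh_argv):
    # One coroutine per host; gather runs them concurrently and
    # returns the results in the same order as `hosts`.
    return await asyncio.gather(
        *(run_one(make_argv(host, command)) for host in hosts)
    )
```

Something like `asyncio.run(run_on_all(["host-a", "host-b"], "uptime"))` would then start both `ssh` processes at once instead of waiting for the first to finish.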
To clarify, I am mostly referring to the calls to EC2. For example, when stopping instances we issue one query per region, sequentially. I think there are other places like this as well.
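To make the per-region case concrete, a minimal sketch (Python, not the project's code) of issuing the stop requests for all regions at once might look like this. `stop_in_region` is a hypothetical placeholder that only sleeps; a real version would send the actual EC2 stop request for that region.

```python
import asyncio
from collections import defaultdict


async def stop_in_region(region, instance_ids):
    # Hypothetical placeholder for one "stop instances" request to a region.
    await asyncio.sleep(0.05)
    return region, list(instance_ids)


async def stop_all(instances):
    # `instances` is an iterable of (region, instance_id) pairs.
    by_region = defaultdict(list)
    for region, instance_id in instances:
        by_region[region].append(instance_id)

    # One request per region, all in flight at the same time.
    results = await asyncio.gather(
        *(stop_in_region(region, ids) for region, ids in by_region.items())
    )
    return dict(results)
```

With this shape the total wait is roughly that of the slowest region rather than the sum over all regions.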
We seem to issue requests to AWS sequentially, and it actually takes a while even on a 10-node cluster; it will probably be unusable with 100 nodes. We should instead parallelize as much as possible: whenever we have an array of instances, we should issue the AWS requests for those instances in parallel.
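As a rough illustration of the "array of instances" idea when the underlying client call is blocking, the requests can be fanned out over a thread pool. This is a Python sketch under assumptions: `query_instance` is a made-up stand-in that sleeps for 0.1 s instead of talking to AWS, and the `max_workers` default is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_instance(instance_id):
    # Hypothetical stand-in for one blocking AWS request.
    time.sleep(0.1)
    return instance_id, "running"


def query_all(instance_ids, max_workers=32):
    instance_ids = list(instance_ids)
    if not instance_ids:
        return {}
    # Never start more threads than there are instances to query.
    workers = min(max_workers, len(instance_ids))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order and runs the calls concurrently.
        return dict(pool.map(query_instance, instance_ids))
```

With ten instances, the sequential version of this toy would wait about one second in total, while the pooled version waits roughly as long as a single request.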