devitocodes / Azure-devito

Setup for MPI strong scaling and general runs on Azure
MIT License

Resize timeouts for 64 VMs at H-series #5

Open georgebisbas opened 4 years ago

georgebisbas commented 4 years ago

The waiting time for more than 32 H-series VMs is very long. Can we speed things up?

FabioLuporini commented 4 years ago

Yeah... in particular, can we get access to them more quickly?

JonShelley commented 4 years ago

How are you creating these VMs? What are the actual wait times?

georgebisbas commented 4 years ago

Hi @JonShelley, these VMs are created using the configuration specified in https://github.com/devitocodes/Azure-devito/blob/master/shipyard-config/pool.yaml. We were always able to get 32 VMs fired up in less than 15 minutes. However, when scaling to 64 VMs, we could not acquire the resources needed. I think the longest we waited was around 1 hour and 10 minutes, using the --wait option to avoid resize timeouts.

georgebisbas commented 4 years ago

Hi @JonShelley, is there any news concerning this issue? Thank you in advance.

FabioLuporini commented 4 years ago

any updates on this?

JonShelley commented 4 years ago

None at this time. I asked our capacity planner and he indicated that we had plenty of resources. Are you still seeing this problem? If so, which region are you seeing this issue in? Also, I’ll bring this up on Thursday when I talk to the batch team.

Jon

On Mon, Feb 24, 2020 at 11:14 PM Fabio Luporini wrote:

any updates on this?


georgebisbas commented 4 years ago

Hi Jon,

thanks for taking care of that. Over the last 2 weeks we did not really scale up to 64 VMs, as our efforts were focused on compilation and runtime requirements for a POC that could use fewer VMs (4 or 8). As soon as we ensure that our requirements are satisfied and our POC is robust, we will post an update here about scaling up to a larger number of VMs.