alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io
BSD 3-Clause "New" or "Revised" License

September 2019 DSG Compute VM deployment log #456

Closed: jemrobinson closed this issue 4 years ago

jemrobinson commented 5 years ago

This issue logs deployment records for compute VMs deployed to the September 2019 DSGs.

:warning: All deployments should be made from the DSG-Turing-SEP2019 branch.

Initially, compute VMs deployed for the September 2019 DSG will use version 0.1.2019071900 of the Ubuntu VM image. Any later changes to the compute VM image are listed below:

Version | Commit | Pull request | Differences wrt. previous
-- | -- | -- | --
0.1.2019071900 | d443faf | - | Added LibreOffice and some LwM Python packages

DSG environments

Subscription | Tier | Name
-- | -- | --
DSG22 | ? | Telenor
DSG23 | ? | Turkcell
DSG24 | ? | Telus
DSG25 | ? | STC

Compute VM deployment instructions

These instructions are adapted from section 7 of the DSG build instructions (also known as the "runbook"), with the deployment branch changed to DSG-Turing-SEP2019.

1. Connect to a deployment VM.

2. Pull the latest September 2019 DSG code and deploy a compute VM.

   The VM size defaults to Standard_DS2_v2, and VMs deployed before the DSG should use this size. On the Monday of the DSG, however, all compute VMs should be resized to Standard_D32s_v3, and this size should also be used for any additional general-purpose shared VMs deployed during the DSG (e.g. to install software that is not available via the Python-PyPI and R-CRAN package mirrors).

3. Record the deployment log in this issue.

4. Log out of Azure in both PowerShell and the Azure CLI.
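The Monday resize to Standard_D32s_v3 can be sanity-checked against the 350-vCPU per-DSG limit quoted for the Dsv3 series in the sizing tables with a few lines of arithmetic. The Python sketch below is illustrative only: it assumes that limit applies to Dsv3 VMs alone, takes the 32-vCPU figure for Standard_D32s_v3 from the sizing table, and uses hypothetical helper names. It does not query Azure or call any project script.

```python
"""Rough quota arithmetic for resizing compute VMs to Standard_D32s_v3.

Assumption: the 350-vCPU per-DSG limit quoted for the Dsv3 series applies
to Dsv3 VMs only. Nothing here talks to Azure; helper names are hypothetical.
"""

DSV3_VCPU_LIMIT = 350  # per-DSG limit quoted in the Dsv3-series table
D32S_V3_VCPUS = 32     # vCPUs for Standard_D32s_v3, from the same table


def max_resizable_vms(limit: int = DSV3_VCPU_LIMIT,
                      vcpus_per_vm: int = D32S_V3_VCPUS) -> int:
    """Return how many VMs of the given size fit inside the vCPU limit."""
    if vcpus_per_vm <= 0:
        raise ValueError("vcpus_per_vm must be positive")
    return limit // vcpus_per_vm


def spare_vcpus(n_vms: int,
                limit: int = DSV3_VCPU_LIMIT,
                vcpus_per_vm: int = D32S_V3_VCPUS) -> int:
    """Return vCPUs left after running n_vms VMs; negative means over the limit."""
    return limit - n_vms * vcpus_per_vm


if __name__ == "__main__":
    n = max_resizable_vms()
    print(f"{n} x Standard_D32s_v3 fit in {DSV3_VCPU_LIMIT} vCPUs, "
          f"leaving {spare_vcpus(n)} spare")
```

With these figures, at most 10 VMs can run as Standard_D32s_v3 (320 vCPUs), and an 11th would exceed the limit by 2 vCPUs, so any additional shared VMs would need to fit in the remaining 30 vCPUs or use a smaller size.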

Copying DSG challenge data

Because the compute VMs do not have direct access to the DSG network file share, the DSG challenge data and any other shared data or code must be copied to each new VM after it is deployed. To do this:

Useful size options

### Dsv3-series (General compute - CPU only)

Limit is 350 vCPUs total per DSG.

ACU: 160-190 | Premium Storage: Supported | Premium Storage Caching: Supported

DSv3-series sizes are based on the 2.4 GHz Intel Xeon E5-2673 v3 (Haswell) processor or the latest 2.3 GHz Intel Xeon E5-2673 v4 (Broadwell) processor, which can reach 3.5 GHz with Intel Turbo Boost Technology 2.0, and use premium storage. The Dsv3-series sizes offer a combination of vCPU, memory, and temporary storage for most production workloads.

Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | Max data disks | Max cached and temp storage throughput: IOPS / MBps (cache size in GiB) | Max uncached disk throughput: IOPS / MBps | Max NICs / Expected network bandwidth (Mbps)
-- | -- | -- | -- | -- | -- | -- | --
Standard_D2s_v3 | 2 | 8 | 16 | 4 | 4,000 / 32 (50) | 3,200 / 48 | 2 / 1,000
Standard_D4s_v3 | 4 | 16 | 32 | 8 | 8,000 / 64 (100) | 6,400 / 96 | 2 / 2,000
Standard_D8s_v3 | 8 | 32 | 64 | 16 | 16,000 / 128 (200) | 12,800 / 192 | 4 / 4,000
Standard_D16s_v3 | 16 | 64 | 128 | 32 | 32,000 / 256 (400) | 25,600 / 384 | 8 / 8,000
Standard_D32s_v3 | 32 | 128 | 256 | 32 | 64,000 / 512 (800) | 51,200 / 768 | 8 / 16,000
Standard_D64s_v3 | 64 | 256 | 512 | 32 | 128,000 / 1024 (1600) | 80,000 / 1200 |

### NC-series (Tesla K80 GPUs)

Limit is 48 vCPUs total per DSG (i.e. 8 GPUs, max 4 per VM).

Premium Storage: Not Supported | Premium Storage Caching: Not Supported

NC-series VMs are powered by the NVIDIA Tesla K80 card. Users can crunch through data faster by leveraging CUDA for energy exploration applications, crash simulations, ray-traced rendering, deep learning, and more. The NC24r configuration provides a low-latency, high-throughput network interface optimized for tightly coupled parallel computing workloads.

Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | GPU | GPU memory: GiB | Max data disks | Max NICs
-- | -- | -- | -- | -- | -- | -- | --
Standard_NC6 | 6 | 56 | 340 | 1 | 8 | 24 | 1
Standard_NC12 | 12 | 112 | 680 | 2 | 16 | 48 | 2
Standard_NC24 | 24 | 224 | 1440 | 4 | 32 | 64 | 4
Standard_NC24r | 24 | 224 | 1440 | 4 | 32 | 64 | 4

### Esv3-series (Memory optimised - CPU only)

Limit is 350 vCPUs total per DSG.

ACU: 160-190 | Premium Storage: Supported | Premium Storage Caching: Supported

Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | Max data disks | Max cached and temp storage throughput: IOPS / MBps (cache size in GiB) | Max uncached disk throughput: IOPS / MBps | Max NICs / Expected network bandwidth (Mbps)
-- | -- | -- | -- | -- | -- | -- | --
Standard_E2s_v3 | 2 | 16 | 32 | 4 | 4,000 / 32 (50) | 3,200 / 48 | 2 / 1,000
Standard_E4s_v3 | 4 | 32 | 64 | 8 | 8,000 / 64 (100) | 6,400 / 96 | 2 / 2,000
Standard_E8s_v3 | 8 | 64 | 128 | 16 | 16,000 / 128 (200) | 12,800 / 192 | 4 / 4,000
Standard_E16s_v3 | 16 | 128 | 256 | 32 | 32,000 / 256 (400) | 25,600 / 384 | 8 / 8,000
Standard_E20s_v3 | 20 | 160 | 320 | 32 | 40,000 / 320 (400) | 32,000 / 480 | 8 / 10,000
Standard_E32s_v3 | 32 | 256 | 512 | 32 | 64,000 / 512 (800) | 51,200 / 768 | 8 / 16,000
Standard_E64s_v3 | 64 | 432 | 864 | 32 | 128,000 / 1024 (1600) | 80,000 / 1200 |

### Fsv2-series (Compute optimised - CPU only)

Limit is 350 vCPUs total per DSG.

ACU: 195 - 210 | Premium Storage: Supported | Premium Storage Caching: Supported

Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | Max data disks | Max cached and temp storage throughput: IOPS / MBps (cache size in GiB) | Max uncached disk throughput: IOPS / MBps | Max NICs / Expected network bandwidth (Mbps)
-- | -- | -- | -- | -- | -- | -- | --
Standard_F2s_v2 | 2 | 4 | 16 | 4 | 4000 / 31 (32) | 3200 / 47 | 2 / 875
Standard_F4s_v2 | 4 | 8 | 32 | 8 | 8000 / 63 (64) | 6400 / 95 | 2 / 1,750
Standard_F8s_v2 | 8 | 16 | 64 | 16 | 16000 / 127 (128) | 12800 / 190 | 4 / 3,500
Standard_F16s_v2 | 16 | 32 | 128 | 32 | 32000 / 255 (256) | 25600 / 380 | 4 / 7,000
Standard_F32s_v2 | 32 | 64 | 256 | 32 | 64000 / 512 (512) | 51200 / 750 | 8 / 14,000
Standard_F64s_v2 | 64 | 128 | 512 | 32 | 128000 / 1024 (1024) | 80000 / 1100 | 8 / 28,000
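Before deploying a mix of sizes, a planned set of VMs can be checked against the per-series limits in the tables above with a short script. This Python sketch copies a subset of the vCPU and GPU counts from those tables and treats each series limit as an independent per-DSG budget, which is an assumption. The names SIZES, SERIES_LIMITS, vcpu_usage, total_gpus and over_limit are hypothetical and are not part of the data-safe-haven code.

```python
"""Check a planned set of DSG compute VMs against per-series vCPU limits.

Assumption: each series limit quoted in the sizing tables is a separate
per-DSG budget. Sizes are a hand-copied subset; nothing is read from Azure.
"""
from __future__ import annotations

from collections import Counter

# size -> (series, vCPUs, GPUs), copied from the sizing tables
SIZES = {
    "Standard_D8s_v3": ("Dsv3", 8, 0),
    "Standard_D16s_v3": ("Dsv3", 16, 0),
    "Standard_D32s_v3": ("Dsv3", 32, 0),
    "Standard_E16s_v3": ("Esv3", 16, 0),
    "Standard_F16s_v2": ("Fsv2", 16, 0),
    "Standard_NC6": ("NC", 6, 1),
    "Standard_NC12": ("NC", 12, 2),
    "Standard_NC24": ("NC", 24, 4),
}

# Per-DSG vCPU limits quoted for each series
SERIES_LIMITS = {"Dsv3": 350, "Esv3": 350, "Fsv2": 350, "NC": 48}


def vcpu_usage(plan: dict[str, int]) -> Counter:
    """Sum vCPUs per series for a plan mapping VM size -> number of VMs."""
    usage: Counter = Counter()
    for size, count in plan.items():
        series, vcpus, _gpus = SIZES[size]
        usage[series] += vcpus * count
    return usage


def total_gpus(plan: dict[str, int]) -> int:
    """Count the GPUs requested by a plan."""
    return sum(SIZES[size][2] * count for size, count in plan.items())


def over_limit(plan: dict[str, int]) -> dict[str, int]:
    """Return {series: vCPUs over the limit} for each series the plan exceeds."""
    return {
        series: used - SERIES_LIMITS[series]
        for series, used in vcpu_usage(plan).items()
        if used > SERIES_LIMITS[series]
    }


if __name__ == "__main__":
    plan = {"Standard_D32s_v3": 10, "Standard_NC24": 2}
    print(dict(vcpu_usage(plan)), total_gpus(plan), over_limit(plan))
```

For example, ten Standard_D32s_v3 VMs plus two Standard_NC24 VMs use 320 Dsv3 vCPUs and 48 NC vCPUs (8 GPUs), which is within both limits; adding a single Standard_NC6 would take the NC series 6 vCPUs over its limit.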
martintoreilly commented 4 years ago

Closing as the September DSG is over.