Azure / azure-powershell

Microsoft Azure PowerShell

[Feature]: [Az.Migrate] Adding new cmdlet for migration execution monitoring details #21775

Closed singhabh27 closed 1 year ago

singhabh27 commented 1 year ago

Description of the new feature

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String>

This command lists all server migration operations in the project, with the information shown below. A migrate project can have multiple primary appliances for the VMware agentless scenario. This cmdlet lists the details of all servers across all the VMware agentless appliances.

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String> -Appliance <String>

This command lists all operations on the specified primary appliance and its associated scale-out appliances.

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String> -MachineName <String>

This command lists all operations for the specified server.

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String> -MachineName <String> -Health

This command lists all operations for the specified server, along with its health issues.

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String> -MachineName <String> -Expedite

This command shows the appliance operating parameters and lists steps customers can take to prioritize the migration of the specified server and reduce the time remaining.
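The five forms above suggest that -Health and -Expedite only apply together with -MachineName. A minimal Python sketch of that rule follows. It is not part of the proposal: the function name and mode labels are hypothetical, and two of the checks (that -Appliance and -MachineName are mutually exclusive, and that -Health and -Expedite cannot be combined) are assumptions, since the issue never shows those combinations.

```python
from typing import Optional


def resolve_mode(appliance: Optional[str] = None,
                 machine_name: Optional[str] = None,
                 health: bool = False,
                 expedite: bool = False) -> str:
    """Return which of the five proposed views a parameter combination selects."""
    # Assumption: the issue never combines -Appliance with -MachineName.
    if appliance and machine_name:
        raise ValueError("-Appliance and -MachineName are assumed mutually exclusive")
    # The issue only shows -Health and -Expedite alongside -MachineName.
    if (health or expedite) and not machine_name:
        raise ValueError("-Health and -Expedite require -MachineName")
    # Assumption: the issue never combines -Health with -Expedite.
    if health and expedite:
        raise ValueError("-Health and -Expedite are assumed mutually exclusive")
    if machine_name:
        if health:
            return "machine-health"
        if expedite:
            return "machine-expedite"
        return "machine"
    if appliance:
        return "appliance"
    return "project"
```

In PowerShell itself, rules like these would normally be expressed as parameter sets rather than explicit checks.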

Proposed implementation details (optional)

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String>

| Primary Appliance | Server | Operation | Progress | Time elapsed | Estimated Time remaining | Current Upload Speed | Health | ESXi Host/Compute Provider | Datastore/Storage Provider |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A2 | S3 | IR | 5% | 5 minutes | 10 minutes | 5 Mbps | Warning | H1 | D1, D2 |
| A1 | S4 | Shutdown |  | 5 minutes | Not known | Not applicable | Healthy | H2 | D3 |
| A2 | S5 | Final DR Queued | Not applicable | 10 minutes | Not known | Not applicable | Error | H1 | D1, D2 |
| A2 | S2 | Final DR | 20% | 15 minutes | 45 minutes | 8 Mbps | Healthy | H1 | D2, D3 |

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String> -Appliance <String>

| Server | Operation | Progress | Time elapsed | Estimated Time remaining | Upload Speed | Health | ESXi Host/Compute Provider | Datastore/Storage Provider |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S3 | IR | 5% | 5 minutes | 10 minutes | 5 Mbps | Warning | H1 | D1, D2 |
| S5 | Final DR Queued | Not applicable | 10 minutes | Not known | Not applicable | Error | H1 | D1, D2 |
| S2 | Final DR | 20% | 15 minutes | 45 minutes | 8 Mbps | Healthy | H1 | D2, D3 |
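In the sample data, the -Appliance output is the project-level output restricted to one primary appliance, with the appliance column dropped. A minimal Python sketch of that relationship, using rows copied from the project-level sample; the field names and the `rows_for_appliance` helper are hypothetical and only illustrate the filtering, not the cmdlet's implementation:

```python
# Sample rows copied from the project-level table in the issue (subset of columns).
PROJECT_ROWS = [
    {"appliance": "A2", "server": "S3", "operation": "IR", "health": "Warning"},
    {"appliance": "A1", "server": "S4", "operation": "Shutdown", "health": "Healthy"},
    {"appliance": "A2", "server": "S5", "operation": "Final DR Queued", "health": "Error"},
    {"appliance": "A2", "server": "S2", "operation": "Final DR", "health": "Healthy"},
]


def rows_for_appliance(rows, appliance):
    """Keep rows for one primary appliance and drop the redundant appliance column."""
    return [
        {key: value for key, value in row.items() if key != "appliance"}
        for row in rows
        if row["appliance"] == appliance
    ]


servers = [row["server"] for row in rows_for_appliance(PROJECT_ROWS, "A2")]
print(servers)  # ['S3', 'S5', 'S2']
```

Filtering on A2 yields servers S3, S5, and S2, the same three rows shown in the -Appliance sample output.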

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String> -MachineName <String>

Server Information:

| Appliance | Server | Operation | Progress | Time elapsed | Time remaining | Upload Speed | ESXi Host/Compute Provider | Datastore/Storage Provider |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A2 | S3 | IR | 80% | 2 hours | 45 mins | 13 Mbps | H1 | D1, D2 |

Disk Level Operation Status:

| Operation | Progress | Time remaining | Upload Speed | Datastore |
| --- | --- | --- | --- | --- |
| S3_Disk_1 IR | 80% | 45 mins | 10 Mbps | D1 |
| S3_Disk_2 IR | 92% | 10 mins | 20 Mbps | D2 |
| S3_Disk_3 CR | 90% | 5 mins | 25 Mbps | D2 |
| S3_Disk_4 DR | 100% | 0 minutes | 0 Mbps | D1 |
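The issue does not say how the server-level row is derived from the disk-level rows. One reading that matches the sample numbers is that the server is only as far along as its slowest disk: progress is the minimum across disks, and time remaining is the maximum. The Python sketch below shows that assumption only. The field names and `summarize` helper are hypothetical, and upload speed is left out because neither the sum nor the mean of the per-disk speeds gives exactly the 13 Mbps shown for the server.

```python
# Disk-level sample values copied from the issue (progress in %, time in minutes).
DISKS = [
    {"disk": "S3_Disk_1", "operation": "IR", "progress": 80, "minutes_remaining": 45},
    {"disk": "S3_Disk_2", "operation": "IR", "progress": 92, "minutes_remaining": 10},
    {"disk": "S3_Disk_3", "operation": "CR", "progress": 90, "minutes_remaining": 5},
    {"disk": "S3_Disk_4", "operation": "DR", "progress": 100, "minutes_remaining": 0},
]


def summarize(disks):
    """Assumed roll-up: the slowest disk determines the server-level values."""
    return {
        "progress": min(d["progress"] for d in disks),
        "minutes_remaining": max(d["minutes_remaining"] for d in disks),
    }


summary = summarize(DISKS)
print(summary)  # {'progress': 80, 'minutes_remaining': 45}
```

With the sample disks this gives 80% and 45 minutes, which agrees with the server-level row for S3.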

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String> -MachineName <String> -Health

Server Information:

| Appliance | Server | Operation | Progress | Time elapsed | Time remaining | Upload Speed | ESXi Host/Compute Provider | Datastore/Storage Provider |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A2 | S3 | IR | 80% | 2 hours | 45 mins | 13 Mbps | H1 | D1, D2 |

Disk Level Operation Status:

| Operation | Progress | Time remaining | Upload Speed | Datastore |
| --- | --- | --- | --- | --- |
| S3_Disk_1 IR | 80% | 45 mins | 10 Mbps | D1 |
| S3_Disk_2 IR | 92% | 10 mins | 20 Mbps | D2 |
| S3_Disk_3 CR | 90% | 5 mins | 25 Mbps | D2 |
| S3_Disk_4 DR | 100% | 0 minutes | 0 Mbps | D1 |

List of warning or critical errors for this server:

Encountered timeout event 'DisposeArtefactsTimeout' in the state '['Gateway.Service.StateMachine.SnapshotReplication.SnapshotReplicationEngine+WaitingForArtefactsDisposalPreCycle' ('WaitingForArtefactsDisposalPreCycle)]'

Get-AzMigrateServerMigrationStatus -ProjectName <String> -ResourceGroupName <String> -MachineName <String> -Expedite

Server Information:

| Appliance | Server | Operation | Progress | Time elapsed | Time remaining | Upload Speed | ESXi Host/Compute Provider | Datastore/Storage Provider |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A2 | S3 | IR | 80% | 2 hours | 45 mins | 13 Mbps | H1 | D1, D2 |

Disk Level Operation Status:

| Operation | Progress | Time remaining | Datastore/Storage Provider | Upload Speed | VMware Read Throughput |
| --- | --- | --- | --- | --- | --- |
| S3_Disk_1 IR | 80% | 45 mins | D1 | 10 Mbps | 10 Mbps |
| S3_Disk_2 IR | 92% | 10 mins | D2 | 20 Mbps | 20 Mbps |
| S3_Disk_3 CR | 90% | 5 mins | D2 | 25 Mbps | 25 Mbps |
| S3_Disk_4 DR | 100% | 0 minutes | D1 | 0 Mbps | 0 Mbps |

Resource utilization information for migration operations:

| Resource | Capacity | Utilization for server migrations | Total Utilization | Status |
| --- | --- | --- | --- | --- |
| Appliance RAM (Aggregation across all appliances for a primary appliance) | Total capacity | Perfmon for process | Perfmon for host | At capacity/ Underutilized/ Throttled |
| Appliance CPU (Aggregation across all appliances for a primary appliance) | Total cores | % CPU utilization for process | % CPU utilization for host | At capacity/ Underutilized/ Throttled |
| Network bandwidth (Aggregation across all appliances for a primary appliance) | Total bandwidth (As observed by AzCopy) | (Perfmon or aggregation of upload status across VMs using this primary appliance) | Perfmon | At capacity/ Underutilized/ Throttled |
| ESXi host NFC buffer | Total assumed capacity | Used capacity | NA | At capacity/ Underutilized/ Throttled |
| Parallel Disks Replicated (Aggregation across all appliances for a primary appliance) | Total workers across all appliances for the primary appliance | Used workers across all appliances for the primary appliance | NA | At capacity/ Underutilized/ Throttled |
| Datastore Snapshot Count (for each datastore corresponding to the server’s disks) | Total assumed capacity | Used capacity | NA | At capacity/ Underutilized/ Throttled |

List of actions (for servers with ongoing replication cycles) (with documentation links):

  1. If RAM/CPU is throttled: reduce cycles for primary appliance A2, increase RAM/CPU for the appliance, configure scale-out appliances, or reduce replication workers. (Migration can currently be cancelled only through jobs; we may want to add it to the VM context menu.)
  2. If network bandwidth is throttled: increase the network bandwidth available to the appliances so that upload speeds can increase, reduce cycles for primary appliance A2, or reduce replication workers.
  3. If there are SnapshotRead errors: reduce cycles on the ESXi host, increase the ESXi host NFC buffer, or do a compute vMotion (if the customer is ready to fail the current cycle of the VM being moved).
  4. If there are SnapshotCreation errors: increase the snapshot count or reduce cycles for the datastore.
  5. If there are SnapshotRead errors with access issues: turn off other backup software.
  6. If there are managed disk throttling errors: increase the IOPS limit by resizing the disk.
  7. If the appliance heartbeat is missing for any registered appliance: turn on the appliance or fix authentication/network issues.
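
The list above is effectively a lookup from a detected condition to a set of suggested actions. A minimal Python sketch of that lookup follows. The condition keys and the `suggest_actions` helper are hypothetical names chosen for illustration, the action text is paraphrased from the list, and how the cmdlet would detect each condition is not covered.

```python
# Hypothetical condition keys mapped to the actions listed in the issue.
ACTIONS = {
    "ram_cpu_throttled": [
        "Reduce cycles for the primary appliance",
        "Increase RAM/CPU for the appliance",
        "Configure scale-out appliances",
        "Reduce replication workers",
    ],
    "network_throttled": [
        "Increase network bandwidth available to the appliances",
        "Reduce cycles for the primary appliance",
        "Reduce replication workers",
    ],
    "snapshot_read_error": [
        "Reduce cycles on the ESXi host",
        "Increase the ESXi host NFC buffer",
        "Compute vMotion (fails the VM's current cycle)",
    ],
    "snapshot_creation_error": [
        "Increase the snapshot count",
        "Reduce cycles for the datastore",
    ],
    "snapshot_read_access_error": ["Turn off other backup software"],
    "managed_disk_throttled": ["Increase the IOPS limit by resizing the disk"],
    "appliance_heartbeat_missing": [
        "Turn on the appliance",
        "Fix authentication or network issues",
    ],
}


def suggest_actions(conditions):
    """Return the actions for the given conditions, de-duplicated, in first-seen order."""
    suggested = []
    for condition in conditions:
        for action in ACTIONS.get(condition, []):
            if action not in suggested:
                suggested.append(action)
    return suggested


both_throttled = suggest_actions(["ram_cpu_throttled", "network_throttled"])
```

When both RAM/CPU and network bandwidth are throttled, the shared actions (cycle reduction and fewer replication workers) appear once, so `both_throttled` holds five distinct suggestions.
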
ghost commented 1 year ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @prsadhu-ms-idc.

Issue Details
Author: singhabh27
Assignees: -
Labels: `feature-request`, `Migrate`, `Service Attention`
Milestone: -
vidai-msft commented 1 year ago

Design review is not necessary now.