Closed mwsutherland closed 3 years ago
Since I was still in a state where things were failing, I made a change to the code to subtract an hour from the last LastRecoveryPoint (the log backups are every 4 hours) and it worked. Trying again with the unmodified timestamp failed. Trying a 3rd time with the timestamp reduced by 5 minutes also worked.
So, I think I've found something here. Maybe the consistent failure of the 1st of a pair of restores was just a coincidence My sample size was only about 3 attempts on different nights.
I just looked at the LastRecoveryPoint of my failed run. It was an even second. If point in time is measured in fractions of a second and that is being rounded to the nearest second for the LastRecoveryPoint field, half of the time that value is going to be invalid.
I let my two-database restore script run tonight with a 5 minute reduction applied to LastRecoveryPoint. The first succeeded and the 2nd failed, which is the opposite of the previous attempts. So, I guess that kills any thought that the order matters and puts a significant hurt on my theory that the problem is rounding of LastRecoveryPoint.
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @pvrk, @adityabalaji-msft.
Author: | mwsutherland |
---|---|
Assignees: | - |
Labels: | `Recovery Services Backup`, `Service Attention`, `question`, `customer-reported`, `needs-triage` |
Milestone: | - |
Tonight, I'm going to try a different work-around. I don't really need the latest log backup. The latest full will do. So, instead of using $BackupItem.LatestRecoveryPoint
as the point in time, I'm going to use (Get-AzRecoveryServicesBackupRecoveryPoint -Item $BackupItem)[0].RecoveryPointTime
.
@mwsutherland : First of all that apologies that you have to go through all this effort to figure out the error. We should do a better job at telling you the exact error message other than just "InvalidInput". By any chance, do you have any debug of the failure where you reduced 5 mins from the latest recovery point?
Also, when you are trying to restore distinct recovery points, the parameter in the config generation should be "recoveryPoint" and NOT "pointInTime". Hence, please use (Get-AzRecoveryServicesBackupRecoveryPoint -Item $BackupItem)[0]
as the input to -RecoveryPoint
and don't use point-in-time
@mwsutherland: Also I noticed that the JSON files for the final requests are identical for both outcomes. I mean the request which was successful which was made at 18:00 GMT is also the exact request made at 19:42 GMT which failed. I am surprised since as time passes, the last recovery point should be updated. Also if the exact same request is repeated at a later time, it should succeed.
Bottom line: Can you please provide debug of the latest failures after 5 min reduction to cross verify once more?
@pvrk Last night's restore run turned out the same as the night before. Afterwards, I realized that it wasn't a test of what I thought because the full backup was also the most recent backup. I was thinking that a log backup would have happened in between, but I was forgetting how things were scheduled.
The script running at night to do "real" restores doesn't have debugging information turned on, so I don't have the same output for that as when I do my manual test restores of a much smaller database. I'll add the line to output that and see what I get. If no better options present themselves before the end of the say, I'll use 5 minutes before last recovery point.
I would like to do a -RecoveryPoint
restore, but, unfortunately, that seems to be broken. I have a separate issue filed for that.
Unless I'm looking at the wrong JSON, I see two different point-in-times specified in the two requests. 2021-09-27T15:32:10Z (1632756730000)
for the first and 2021-09-27T19:34:08Z (1632771248000)
for the second.
@pvrk It turns out I had $DebugPreference='Continue'
set in the nightly job, but that part of the output wasn't being piped into the log file. Hopefully *>
will take care of that and I'll have more output for those restore attempts tomorrow, If you have any ideas of things I can try with my test database, I can do that during the day.
I didn't try any more experiments today, so all I have to add is the result of tonight's job run. Tonight, we got something new -- BOTH failed! This was using $PointInTime = ($BackupItem.LatestRecoveryPoint).AddMinutes(-5)
. I don't know if it is useful with both failing, but here is the debugging around the calls to Restore-AzRecoveryServicesBackupItem
:
============================ HTTP REQUEST ============================
HTTP Method:
POST
Absolute Uri:
https://management.azure.com/subscriptions/13fc5eb3-f9be-4fda-84db-d051c04f9d50/resourceGroups/New-LRS-Recovery-Vault/p
roviders/Microsoft.RecoveryServices/vaults/New-LRS-Recovery-Vault/backupFabrics/Azure/protectionContainers/SQLAGWorkLoa
dContainer;8ecb2090-546c-4a8b-8f0a-c35f8b6f4339/protectedItems/SQLDataBase;dwag;edw/recoveryPoints/DefaultRangeRecovery
Point/restore?api-version=2021-03-01
Headers:
x-ms-client-request-id : 73ffd149-9868-45a1-bf67-bb3b4a47ecf6
accept-language : en-US
Body:
{
"properties": {
"objectType": "AzureWorkloadSQLPointInTimeRestoreRequest",
"pointInTime": "2021-09-29T20:59:38Z",
"shouldUseAlternateTargetLocation": true,
"isNonRecoverable": false,
"alternateDirectoryPaths": [
{
"mappingType": "Data",
"sourceLogicalName": "EDW",
"sourcePath": "R:\\EDW_Data\\EDW.mdf",
"targetPath": "E:\\SQLData\\EDW\\EDW.mdf"
},
{
"mappingType": "Log",
"sourceLogicalName": "EDW_log",
"sourcePath": "Z:\\EDW_Logs\\EDW_log.ldf",
"targetPath": "E:\\SQLData\\EDW\\EDW_log.ldf"
},
{
"mappingType": "Log",
"sourceLogicalName": "EDW2_Log",
"sourcePath": "K:\\SSISDB\\EDW2_Log.ldf",
"targetPath": "E:\\SQLData\\EDW\\EDW2_Log.ldf"
},
{
"mappingType": "Data",
"sourceLogicalName": "FG_Daily",
"sourcePath": "R:\\Data\\FG_Daily.ndf",
"targetPath": "E:\\SQLData\\EDW\\FG_Daily.ndf"
},
{
"mappingType": "Data",
"sourceLogicalName": "FG_Month",
"sourcePath": "R:\\Data\\FG_Month.ndf",
"targetPath": "E:\\SQLData\\EDW\\FG_Month.ndf"
},
{
"mappingType": "Data",
"sourceLogicalName": "FG_Yearly",
"sourcePath": "R:\\Data\\FG_Yearly.ndf",
"targetPath": "E:\\SQLData\\EDW\\FG_Yearly.ndf"
}
],
"recoveryType": "AlternateLocation",
"targetInfo": {
"overwriteOption": "Overwrite",
"containerId": "/subscriptions/13fc5eb3-f9be-4fda-84db-d051c04f9d50/resourceGroups/New-LRS-Recovery-Vault/provide
rs/Microsoft.RecoveryServices/vaults/New-LRS-Recovery-Vault/backupFabrics/Azure/protectionContainers/vmappcontainer;com
pute;dwqa;dwqa",
"databaseName": "MSSQLSERVER/EDW"
},
"targetVirtualMachineId": "/subscriptions/13fc5eb3-f9be-4fda-84db-d051c04f9d50/resourceGroups/DWQA/providers/Micros
oft.Compute/virtualMachines/DWQA"
}
}
============================ HTTP RESPONSE ============================
Status Code:
BadRequest
Headers:
Pragma : no-cache
X-Content-Type-Options : nosniff
x-ms-request-id : 31d69319-3681-4b43-b554-30fc2fb20dc6
x-ms-client-request-id : 73ffd149-9868-45a1-bf67-bb3b4a47ecf6,73ffd149-9868-45a1-bf67-bb3b4a47ecf6
Strict-Transport-Security : max-age=31536000; includeSubDomains
x-ms-ratelimit-remaining-subscription-writes: 1199
x-ms-correlation-request-id : 31d69319-3681-4b43-b554-30fc2fb20dc6
x-ms-routing-request-id : CENTRALUS:20210929T223029Z:31d69319-3681-4b43-b554-30fc2fb20dc6
Cache-Control : no-cache
Date : Wed, 29 Sep 2021 22:30:29 GMT
Server : Microsoft-IIS/10.0
X-Powered-By : ASP.NET
Body:
{
"error": {
"code": "BMSUserErrorInvalidInput",
"message": "Input provided for the call is invalid. Please check the required inputs",
"target": null,
"details": null,
"innerError": null
}
}
Caught exception, type: System.AggregateException
Handling aggregate exception
Received CloudException, ErrorCode: BMSUserErrorInvalidInput, Message: Input provided for the call is invalid. Please
check the required inputs
Restore-AzRecoveryServicesBackupItem : Input provided for the call is invalid. Please check the required inputs
At C:\scripts\RestoreDBFromVault.ps1:93 char:5
+ Restore-AzRecoveryServicesBackupItem -WLRecoveryConfig $AnotherIn ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Restore-AzRecoveryServicesBackupItem], CloudException
+ FullyQualifiedErrorId : BMSUserErrorInvalidInput,Microsoft.Azure.Commands.RecoveryServices.Backup.Cmdlets.Restor
eAzureRmRecoveryServicesBackupItem
AzureQoSEvent: Module: Az.RecoveryServices:4.4.0; CommandName: Restore-AzRecoveryServicesBackupItem; PSVersion:
5.1.14409.1005; IsSuccess: False; Duration: 00:00:01.1330865; Exception: Input provided for the call is invalid.
Please check the required inputs;
Finish sending metric.
6:30:30 PM - RestoreAzureRmRecoveryServicesBackupItem end processing.
and
============================ HTTP REQUEST ============================
HTTP Method:
POST
Absolute Uri:
https://management.azure.com/subscriptions/13fc5eb3-f9be-4fda-84db-d051c04f9d50/resourceGroups/New-LRS-Recovery-Vault/p
roviders/Microsoft.RecoveryServices/vaults/New-LRS-Recovery-Vault/backupFabrics/Azure/protectionContainers/SQLAGWorkLoa
dContainer;8ecb2090-546c-4a8b-8f0a-c35f8b6f4339/protectedItems/SQLDataBase;dwag;gsdwstg/recoveryPoints/DefaultRangeReco
veryPoint/restore?api-version=2021-03-01
Headers:
x-ms-client-request-id : 9f05ec6a-07ea-4f2a-a938-384683b9dfe4
accept-language : en-US
Body:
{
"properties": {
"objectType": "AzureWorkloadSQLPointInTimeRestoreRequest",
"pointInTime": "2021-09-29T21:35:08Z",
"shouldUseAlternateTargetLocation": true,
"isNonRecoverable": false,
"alternateDirectoryPaths": [
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_MDATA_FILE",
"sourcePath": "R:\\detached_files\\Data\\DwStgMDatFil.mdf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DwStgMDatFil.mdf"
},
{
"mappingType": "Log",
"sourceLogicalName": "DWSTG_LOG_FILE_01",
"sourcePath": "Z:\\Log\\GSDWSTG_log.ldf",
"targetPath": "E:\\SQLData\\GSDWSTG\\GSDWSTG_log.ldf"
},
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_DATA_FILE_01",
"sourcePath": "R:\\detached_files\\Data\\DwStgDatFil01.ndf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DwStgDatFil01.ndf"
},
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_DATA_FILE_02",
"sourcePath": "R:\\detached_files\\Data\\DwStgDatFil02.ndf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DwStgDatFil02.ndf"
},
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_DATA_FILE_03",
"sourcePath": "R:\\detached_files\\Data\\DwStgDatFil03.ndf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DwStgDatFil03.ndf"
},
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_IDX_FILE_01",
"sourcePath": "R:\\detached_files\\Data\\DwStgIdxFil01.ndf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DwStgIdxFil01.ndf"
},
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_IDX_FILE_02",
"sourcePath": "R:\\detached_files\\Data\\DwStgIdxFil02.ndf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DwStgIdxFil02.ndf"
},
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_IDX_FILE_03",
"sourcePath": "R:\\detached_files\\Data\\DwStgIdxFil03.ndf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DwStgIdxFil03.ndf"
},
{
"mappingType": "Log",
"sourceLogicalName": "DWSTG_LOG_FILE_02",
"sourcePath": "Z:\\Log\\DWSTG_LOG_FILE_02.ldf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DWSTG_LOG_FILE_02.ldf"
},
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_LOG_FILE_04",
"sourcePath": "R:\\detached_files\\Data\\DwStgDatFil04.ndf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DwStgDatFil04.ndf"
},
{
"mappingType": "Data",
"sourceLogicalName": "DWSTG_MDATA_FILE2",
"sourcePath": "Z:\\Log\\DWSTG_MDATA_FILE2.ndf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DWSTG_MDATA_FILE2.ndf"
},
{
"mappingType": "Log",
"sourceLogicalName": "DWSTG_LOG_FILE_03",
"sourcePath": "K:\\GSDWSTG_log\\DWSTG_LOG_FILE_03.ldf",
"targetPath": "E:\\SQLData\\GSDWSTG\\DWSTG_LOG_FILE_03.ldf"
}
],
"recoveryType": "AlternateLocation",
"targetInfo": {
"overwriteOption": "Overwrite",
"containerId": "/subscriptions/13fc5eb3-f9be-4fda-84db-d051c04f9d50/resourceGroups/New-LRS-Recovery-Vault/provide
rs/Microsoft.RecoveryServices/vaults/New-LRS-Recovery-Vault/backupFabrics/Azure/protectionContainers/vmappcontainer;com
pute;dwqa;dwqa",
"databaseName": "MSSQLSERVER/GSDWSTG"
},
"targetVirtualMachineId": "/subscriptions/13fc5eb3-f9be-4fda-84db-d051c04f9d50/resourceGroups/DWQA/providers/Micros
oft.Compute/virtualMachines/DWQA"
}
}
============================ HTTP RESPONSE ============================
Status Code:
BadRequest
Headers:
Pragma : no-cache
X-Content-Type-Options : nosniff
x-ms-request-id : abaa36b3-059c-4aaf-8bfa-7b29b88cc8d3
x-ms-client-request-id : 9f05ec6a-07ea-4f2a-a938-384683b9dfe4,9f05ec6a-07ea-4f2a-a938-384683b9dfe4
Strict-Transport-Security : max-age=31536000; includeSubDomains
x-ms-ratelimit-remaining-subscription-writes: 1199
x-ms-correlation-request-id : abaa36b3-059c-4aaf-8bfa-7b29b88cc8d3
x-ms-routing-request-id : CENTRALUS:20210929T223038Z:abaa36b3-059c-4aaf-8bfa-7b29b88cc8d3
Cache-Control : no-cache
Date : Wed, 29 Sep 2021 22:30:38 GMT
Server : Microsoft-IIS/10.0
X-Powered-By : ASP.NET
Body:
{
"error": {
"code": "BMSUserErrorInvalidInput",
"message": "Input provided for the call is invalid. Please check the required inputs",
"target": null,
"details": null,
"innerError": null
}
}
Caught exception, type: System.AggregateException
Handling aggregate exception
Received CloudException, ErrorCode: BMSUserErrorInvalidInput, Message: Input provided for the call is invalid. Please
check the required inputs
Restore-AzRecoveryServicesBackupItem : Input provided for the call is invalid. Please check the required inputs
At C:\scripts\RestoreDBFromVault.ps1:93 char:5
+ Restore-AzRecoveryServicesBackupItem -WLRecoveryConfig $AnotherIn ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Restore-AzRecoveryServicesBackupItem], CloudException
+ FullyQualifiedErrorId : BMSUserErrorInvalidInput,Microsoft.Azure.Commands.RecoveryServices.Backup.Cmdlets.Restor
eAzureRmRecoveryServicesBackupItem
AzureQoSEvent: Module: Az.RecoveryServices:4.4.0; CommandName: Restore-AzRecoveryServicesBackupItem; PSVersion:
5.1.14409.1005; IsSuccess: False; Duration: 00:00:00.7282681; Exception: Input provided for the call is invalid.
Please check the required inputs;
Finish sending metric.
6:30:39 PM - RestoreAzureRmRecoveryServicesBackupItem end processing.
@mwsutherland : Thanks for the debug information. We will start an internal investigation with this. Also will tackle both issues separately and provide clarity on entire restore process for SQL using Powershell. Thanks.
@mwsutherland : Meanwhile can you check with Get-AzRecoveryServicesBackupRecoveryLogChain
command and see if $BackupItem.LatestRecoveryPoint is within the range of the valid log chain?
Very interesting. At the moment, the RecoveryPointTime of the last recovery point is Friday, October 1, 2021 7:34:07 PM the backup Item LatestRecoveryPoint is also Friday, October 1, 2021 7:34:07 PM but the Log chain End time is Friday, October 1, 2021 7:32:11 PM
At this point, I'm sure you will not be surprised that trying to restore at the PIT of 7:34:07 fails while restoring to a PIT of 7:32:11 works.
Does this mean that the last log backup went until 7:32:11 and the following full backup happened at 7:34:07? And that the restore with point in time is ignoring the new full backup?
The solution has been found to my RecoveryPoint restore issue (I was including an unnecessary parameter, causing things not to work), so now I am able to restore to a recovery point. It works in my test case at least at the moment. I'll update my evening job and see how they like it.
Both jobs started tonight!
@mwsutherland : Please keep us posted and let us know if you could restore to both 1) Recovery Point 2) Log Point-In-Time
When I found the differing times times above (recovery points being after log chain end time), the point-in-time restore to the recovery times failed while the restore to the end of long chain worked. My recovery point restores are working now that the problem with creating a proper config is resolved.
Adding a mention about how to get the valid range of times to the documentation about the -PointInTime parameter of Get-AzRecoveryServicesBackupWorkloadRecoveryConfig could be very useful to future users. An error message indicating that the pointInTime value is out of range would be awesome.
@mwsutherland : Apologies for the delay in response and thanks for the feedback. We have 2 actions items from this investigation: 1) Clearly mention in the documentation about how to get log point-in-time restores and the required combinations - This is taken up and will be reflected by Oct 19th. In the long run, we will try to see if we can throw an error (by using parameter sets or just throw an error) 2) Throw a relevant error message (that given point-in-time is beyond log chain) and this is added to the backlog. I don't have definite timelines yet. I will update this bug when I have them.
Closing this issue. Please re-open if these actions are not enough.
Description
Sometimes Restore-AzRecoveryServicesBackupItem fails for no obvious reason. I've had the same database sometimes fail and sometimes succeed. My intended use case has me trying to restore two databases. The first fails and the second succeeds. If I swap the order, the new first fails and the new second succeeds. In testing with a single database, I find that it will consistently either fail or succeed for quite a while. I started working on this ticket on a Friday. Everything was failing as I put together my demonstration script. The work day was over and when I ran it on Monday to gather output, it worked. The good news is that, after waiting for a few hours, it failed again, so now I have output for both cases. Comparing the output, the only thing I notice different before the error, other than time stamps and correlation IDs, is that another log backup had happened between the two attempts, so they aren't actually targeting the same point-in-time. I don't know why that should matter unless the LatestRecoveryPoint field of the backup I'm trying to restore isn't always valid point-in-time to use. That doesn't seem to explain my scenario where I swap the order of restores, but I'll investigate this angle further and add any new findings in the comments.
Steps to reproduce
Environment data
Module versions
Debug output
Output when it works from the point where I output the contents of the $Confg variable:
Output when it fails:
Error output