dsccommunity / SqlServerDsc

This module contains DSC resources for deployment and configuration of Microsoft SQL Server.
MIT License
361 stars 225 forks source link

SqlAlwaysOnService/SqlAGReplica: Failed to connect to SQL instance #1220

Open michaelsync opened 6 years ago

michaelsync commented 6 years ago

Details of the scenario you tried and the problem that is occurring

I am getting this error when I enable the AlwaysOn using this DSC 'SqlAlwaysOnService'. I can manually enable it from the configuration without any issue.

Updated: I noticed that I got the same issue with AddReplica. (but SqlAG is working. )

Verbose logs showing the problem

"Message":  "PowerShell DSC resource MSFT_SqlAlwaysOnService  failed to execute Test-TargetResource functionality with error message: System.InvalidOperationException: Failed to connect to SQL instance \u0027MY-SQL-01\u0027. ",
 "SerializedRemoteException":  "System.Exception: System.InvalidOperationException: Failed to connect to SQL instance \u0027MY-SQL-01\u0027.",
 "SerializedRemoteInvocationInfo":  "System.Management.Automation.InvocationInfo",
 "ErrorRecord":  "System.InvalidOperationException: Failed to connect to SQL instance \u0027MY-SQL-01\u0027.",
 "WasThrownFromThrowStatement":  false,
 "Message":  "System.InvalidOperationException: Failed to connect to SQL instance \u0027MY-SQL-01\u0027.",
 "Data":  "System.Collections.ListDictionaryInternal",
 "InnerException":  null,
 "TargetSite":  "Void EndInvoke()",
 "StackTrace":  "   at System.Management.Automation.Runspaces.AsyncResult.EndInvoke()\r\n   at System.Management.Automation.PowerShell.CoreInvokeRemoteHelper[TInput,TOutput](PSDataCollection`1 input, PSDataCollection`1 output, PSInvocationSettings settings)\r\n   at System.Management.Automation.PowerShell.CoreInvoke[TInput,TOutput](PSDataCollection`1 input, PSDataCollection`1 output, PSInvocationSettings settings)\r\n   at System.Management.Automation.PowerShell.Invoke(IEnumerable input, PSInvocationSettings settings)\r\n   at Microsoft.PowerShell.DesiredStateConfiguration.Internal.ResourceProviderAdapter.ExecuteCommand(PowerShell powerShell, ResourceModuleInfo resInfo, String operationCmd, List`1 acceptedProperties, CimInstance nonResourcePropeties, CimInstance resourceConfiguration, LCMDebugMode debugMode, PSInvocationSettings pSInvocationSettings, UInt32\u0026 resultStatusHandle, Collection`1\u0026 result, ErrorRecord\u0026 errorRecord, PSModuleInfo localRunSpaceModuleInfo)",

Suggested solution to the issue

No idea. but I tested enabling the 'AlwaysOn' manually and it works so something wrong with my code or this DSC.

SQL Server edition and version the target node is running

Microsoft SQL Server 2016 (SP1-CU8) (KB4077064) - 13.0.4474.0 (X64) Feb 24 2018 13:53:17 Copyright (c) Microsoft Corporation Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2016 Datacenter 10.0 (Build 14393: ) (Hypervisor)

SQL Server PowerShell modules present on the target node

Name Version Path


SqlServer 21.0.17279 C:\Program Files\WindowsPowerShell\Modules\SqlServer\21.0.17279\SqlServer.psd1 SQLPS 1.0 C:\Program Files (x86)\Microsoft SQL Server\130\Tools\PowerShell\Modules\SQLPS\SQLPS.psd1

The DSC configuration that is used to reproduce the issue (as detailed as possible)

SqlAlwaysOnService EnableHADR
        {
            Ensure               = 'Present'
            InstanceName         = $Node.InstanceName
            ServerName           = $Node.NodeName
            DependsOn            = '[SqlServerEndpoint]SQLConfigureEndpoint-Instance'
        }
SqlAGReplica AddReplica
            {
                Ensure                     = 'Present'
                Name                       = $Node.NodeName
                AvailabilityGroupName      = $Node.AvailabilityGroupName
                ServerName                 = $Node.NodeName
                InstanceName               = "$($Node.InstanceName):1433"
                PrimaryReplicaServerName   = ( $AllNodes | Where-Object { $_.Role -eq 'MS SQL Primay Replica' } ).NodeName
                PrimaryReplicaInstanceName = ( $AllNodes | Where-Object { $_.Role -eq 'MS SQL Primay Replica' } ).InstanceName
                DependsOn                  = '[SqlAlwaysOnService]EnableHADR'
                ProcessOnlyOnActiveNode    = $Node.ProcessOnlyOnActiveNode
            }

The operating system the target node is running

OsName : Microsoft Windows Server 2016 Datacenter OsOperatingSystemSKU : DatacenterServerEdition OsArchitecture : 64-bit WindowsBuildLabEx : 14393.2248.amd64fre.rs1_release.180427-1804 OsLanguage : en-US OsMuiLanguages : {en-US}

Version and build of PowerShell the target node is running

Name Value


PSVersion 5.1.14393.2248 PSEdition Desktop PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...} BuildVersion 10.0.14393.2248 CLRVersion 4.0.30319.42000 WSManStackVersion 3.0 PSRemotingProtocolVersion 2.3 SerializationVersion 1.1.0.1

Version of the DSC module that was used ('dev' if using current dev branch)

johlju commented 6 years ago

Can you provide the configuration for the SqlAlwaysOnService resource? Can you provide the verbose logs that also shows when you get this error for the SqlAlwaysOnService? Let's focus on that resource for now.

I see you have "$($Node.InstanceName):1433", since you are using port 1433 I'm assuming you using the default instance 'MSSQLSERVER'? Does it work if you change it to $Node.InstanceName, without the port?

michaelsync commented 6 years ago

Can you provide the configuration for the SqlAlwaysOnService resource?

Sorry. I pasted it wrong. I have updated my post.

Here is the code.

SqlAlwaysOnService EnableHADR
        {
            Ensure               = 'Present'
            InstanceName         = $Node.InstanceName
            ServerName           = $Node.NodeName
            DependsOn            = '[SqlServerEndpoint]SQLConfigureEndpoint-Instance'
        }

I wasn't able to enable the AlwaysOn using this config but when I run it again after enabling it manually, SqlAlwaysOnService is working now.

But I am still having issue with SqlAGReplica. I tried it with port and without port. Both of them didn't work.

michaelsync commented 6 years ago

This log is all have I in Azure Automation Account.

johlju commented 6 years ago

Not sure it possible to see the verbose output from the resources in Azure Automation. Can you please run the configuration manually on the target node using Start-DscConfiguration -Path <path to mof> -Wait -Verbose?

johlju commented 6 years ago

There is a 'View raw report' in the report blade, on a node in Azure Automation (where you see the list with status of each run). CLick on a report in the list, and you should see 'View raw report'. I'm told it should contain the verbose logs.

michaelsync commented 6 years ago

Thanks @johlju

SQL Primary Server - SqlAlwaysOnService

Here is the log for SqlAlwaysOnService - SQL (Note: I have changed the node name, IP, config file name, account id. ) As I mentioned in my previous comment, it works now after enabling it manually. But i guess it will fail if I disable it again and enable it from DSC.

SQL Primary - SqlAlwaysOnService - 2018-09-13T03_35_38.914Z.txt

I still have the problem creating AG and adding the replicas. Let me know if you need the log for them as well.

johlju commented 6 years ago

That did not contain the verbose logs :/ Then you need to run the configuration manually as mentioned in one of my previous comment. It’s a lot easier to see what the resource does or try to do in the verbose logs. It will shows any errors that are encounter since the SqlAlwaysOnService is working (unless you want to disable it again when running the configuration so it fails) then we can focus on one resource that does not work. Although it would be awesome if I could get a similar environment as you have (see my comment in the xFailOverCluster issue)

johlju commented 6 years ago

Sorry, closed the issue by mistake, reopened now.

Are you getting an error with SqlAG as well? Love to see the verbose logs. It will shows everything much more clearly. 😊

michaelsync commented 6 years ago

@johlju Thanks. Yes. I am still having SqlAG . I will see if I can run it manually.

The reason that I didn't run is that I don't know some parameters (password or user account. ) that we passed via VSTS. We are also using the credential from azure automation account. I don't know how to tell the Start-DscConfiguration to connect the automation account.

Can you give me an example or link how to tell the PowerShell to use the automation account when we run DSC?

johlju commented 6 years ago

Don't think it is possible to get the automation account credential object locally. Anyone, correct me if I'm wrong.

You probably need to take the configuration locally, and compile it using temporary accounts (that you know the user and password for) with the same permission. Then run Start-DscConfiguration -Path <path to mof> -Wait -Verbose, where path is the folder where the compiled mof file is located.

goforebroke commented 6 years ago

@johlju

I have the exact same issue.. virtually the same code... The configuration fails when trying to add the secondary replica to the AG....

{"time": "2018-10-02T15:20:30.270-7:00", "type": "error", "message": "PowerShell DSC resource MSFT_SqlAGReplica failed to execute Set-TargetResource functionality with error message: System.InvalidOperationException: Failed to connect to SQL instance 'SQLBOXA'. "}, {"time": "2018-10-02T15:20:30.323-7:00", "type": "verbose", "message": "[SQLBOXB]: LCM: [ End Set ] "}

info attached

dsc_verbose.txt

johlju commented 6 years ago

@goforebroke would help a lot to get the verbose output from the Start-DscConfiguration -Path <path to mof> -Wait -Verbose.

goforebroke commented 6 years ago

@johlju

attached is the verbose output dsc.txt

johlju commented 6 years ago

@goforebroke interesting that the error output is during restart. Did not help what call that actually fails. Is it possible for you to narrow down what call in Set-TargetResource actually generating the connection failed problem?

goforebroke commented 6 years ago

@johlju

After some testing and debugging.. It appears the errors occurs when attempting to connect back to the primary replica from the secondary replica in the "Set-TargetResource" function in the file "MSFT_SqlAGReplica.psm1"

# Connect to the instance that is supposed to house the primary replica $primaryReplicaServerObject = Connect-SQL -SQLServer $PrimaryReplicaServerName -SQLInstanceName $PrimaryReplicaInstanceName

From the verbose logs I can tell the primary replica is already online and restarted after AlwaysON has been enabled and the service bounced. I should be able to connect from the secondary replica to the primary replica.

I then verified that SPN's were in place to support multi-hop authentication since I am pushing this config from my desktop.

I looked in the SQL Server error log on SQLBOXA and noticed a failed login...

Login failed for 'MyDOMAIN\SQLBOXB$' Reason: Could not find a login matching the name provided. [CLIENT: SQLBOXB IP ADDRESS]

Why is the machine account used to authenticate back to the primary vs the credential that I supplied?

goforebroke commented 6 years ago

@johlju

Update the error has been resolved by adding the parameter PsDscRunAsCredential to "AddReplica"

if ( $Node.Role -eq 'SecondaryReplica' )
        {
            # Add the availability group replica to the availability group
            SqlAGReplica AddReplica
            {
                Ensure                     = 'Present'
                Name                       = $Node.NodeName
                AvailabilityGroupName      = $Node.AvailabilityGroupName
                ServerName                 = $Node.NodeName
                InstanceName               = $Node.InstanceName
                PrimaryReplicaServerName   = ($AllNodes | Where-Object { $_.Role -eq 'PrimaryReplica' }).NodeName
                PrimaryReplicaInstanceName = ($AllNodes | Where-Object { $_.Role -eq 'PrimaryReplica' }).InstanceName
                DependsOn                  = '[SqlAlwaysOnService]EnableHADR'
                ProcessOnlyOnActiveNode    = $Node.ProcessOnlyOnActiveNode
                PsDscRunAsCredential       = $Node.InstallerAccount
            }
        }

But I am now getting another error...

`PowerShell DSC resource MSFT_SqlAGReplica failed to execute Set-TargetResource functionality with error message: Unable to locate the availability group 'MYAVG' on the instance 'MSSQLSERVER'.

johlju commented 6 years ago

There are apparent issues connecting when connecting from an (new) replica. Labelling this as a bug so someone in the community can run with this. There are also a separate issue discussing to merge both the resources SqlAg and SqlAgReplica together.

CorvaxTC commented 5 years ago

Hello, I'm working on setting up AAG with DSC and had issues getting the replica up and running on named instances. after some testing, I was able to get it to work with the following modification to the replica section.

if ( $Node.Role -eq 'SecondaryReplica' ) {

Add the availability group replica to the availability group

        SqlAGReplica AddReplica
        {
            Ensure                     = 'Present'
            Name                       = **"$($Node.NodeName)\\$($Node.InstanceName)"**
            AvailabilityGroupName      = $Node.AvailabilityGroupName
            ServerName                 = $Node.NodeName
            InstanceName               = $Node.InstanceName
            PrimaryReplicaServerName   = ( $AllNodes | Where-Object { $_.Role -eq 'PrimaryReplica' } ).NodeName
            PrimaryReplicaInstanceName = ( $AllNodes | Where-Object { $_.Role -eq 'PrimaryReplica' } ).InstanceName
            DependsOn                  = '[SqlAlwaysOnService]EnableHADR'
            ProcessOnlyOnActiveNode    = $Node.ProcessOnlyOnActiveNode

PsDscRunAsCredential = $SqlAdministratorCredential } }