dsccommunity / SqlServerDsc

This module contains DSC resources for deployment and configuration of Microsoft SQL Server.
MIT License
360 stars 225 forks source link

Connect-SQL : Failed to connect to SQL instance #2041

Open SalutAToi opened 3 months ago

SalutAToi commented 3 months ago

Problem description

When running a SQL configuration (in my case, the SQLLogin DSC resource), I'm met the below error message saying it failed to connect.

Note I was initially using Ansible with ansible.windows.win_dsc, but the below error is returned from recreating the configuration with DSC locally on the computer where I was running the ansible configuration. It still fails for the same reason. The server name is a FQDN with a record in Microsoft AD.

While that error happened using SQLLogin, I've traced it to the Connect-SQL function https://github.com/dsccommunity/SqlServerDsc/blob/23d76847a1c99324eae6fe4a0faf8a7ac24e41fb/source/Modules/SqlServerDsc.Common/SqlServerDsc.Common.psm1#L569 so I imagine the issue could be due to the upstream lib Microsoft.SqlServer.Management.Smo.Server, but I'm not sure. It does not even attempt to open a connection

I've tried specifying dbatools as SMODefaultModuleName, no success (with or without). Here' s the configuration :

$password = 'redacted' | ConvertTo-SecureString -asPlainText -Force
$username = "ad\admin"
[PSCredential] $credential = New-Object System.Management.Automation.PSCredential($username,$password)

Configuration test
{

    Import-DscResource -ModuleName 'SqlServerDsc'
    node localhost
    {
        SqlLogin 'testlogin'
        {
            Ensure               = 'Present'
            Name                 = 'ad\SQLLOGIN'
            LoginType            = 'WindowsGroup'
            ServerName           = 'server'
            InstanceName         = 'instance'
                PsDscRunAsCredential = $SqlAdministratorCredential

        }
    }
}
$cd = @{
    AllNodes = @(
        @{
            NodeName = 'localhost'
            PSDscAllowPlainTextPassword = $true
                PsDscAllowDomainUser = $true
SMODefaultModuleName = 'dbatools'
        }
    )
}
test -ConfigurationData $cd

Verbose logs

VERBOSE: Perform operation 'Invoke CimMethod' with following parameters, ''methodName' = SendConfigurationApply,'className' = MSFT_DSCLocalConfigurationManager,'namespaceName' = root/Microsoft/Windows/DesiredStateConfiguration'.
VERBOSE: An LCM method call arrived from computer BL-WEB-1 with user sid S-1-5-21-1483537166-2846010545-766114601-1115.
VERBOSE: [BL-WEB-1]: LCM:  [ Start  Set      ]
VERBOSE: [BL-WEB-1]: LCM:  [ Start  Resource ]  [[SqlLogin]testlogin]
VERBOSE: [BL-WEB-1]: LCM:  [ Start  Test     ]  [[SqlLogin]testlogin]
VERBOSE: [BL-WEB-1]:                            [[SqlLogin]testlogin] Determines if the login 'ad\user' at the instance 'server\instance' has the correct state.
VERBOSE: [BL-WEB-1]:                            [[SqlLogin]testlogin] Getting the login 'ad\user' from the instance 'server\instance'.
VERBOSE: [BL-WEB-1]:                            [[SqlLogin]testlogin] Connecting as current user 'NT AUTHORITY\SYSTEM' using integrated security. (SQLCOMMON0054)
VERBOSE: [BL-WEB-1]: LCM:  [ End    Test     ]  [[SqlLogin]testlogin]  in 1.9140 seconds.
PowerShell DSC resource DSC_SqlLogin  failed to execute Test-TargetResource functionality with error message: The running command stopped because the preference variable "ErrorActionPreference" or common parameter is set to Stop: Failed to connect to SQL instance
'server\instance'. (SQLCOMMON0019)
    + CategoryInfo          : InvalidOperation: (:) [], CimException
    + FullyQualifiedErrorId : ProviderOperationExecutionFailure
    + PSComputerName        : localhost

VERBOSE: [BL-WEB-1]: LCM:  [ End    Set      ]
The SendConfigurationApply function did not succeed.
    + CategoryInfo          : NotSpecified: (root/Microsoft/...gurationManager:String) [], CimException
    + FullyQualifiedErrorId : MI RESULT 1
    + PSComputerName        : localhost

VERBOSE: Operation 'Invoke CimMethod' complete.
VERBOSE: Time taken for configuration job to complete is 3.447 seconds

How to reproduce

Context is a windows VM running in GCP Compute Engine (google provided image) joined to a managed Microsoft AD, connecting to a Cloud SQL (MSSQL) instance on the same VPC (all network requirements verified, SSMS works fine on the same computer)

Expected behavior

Successful connection to the database instance

Current behavior

Fails to connect.

Does not even attempt a DNS request to resolve the server FQDN, even though it is properly returned by the verbose output. Does not initiate a TCP connection with the server (verified with wireshark during the request)

SSMS works fine from the same computer, same creds

Suggested solution

Nope

Operating system the target node is running

OsName               : Microsoft Windows Server 2022 Datacenter
OsOperatingSystemSKU : DatacenterServerEdition
OsArchitecture       : 64-bit
WindowsVersion       : 2009
WindowsBuildLabEx    : 20348.1.amd64fre.fe_release.210507-1500
OsLanguage           : en-US
OsMuiLanguages       : {en-US}

PowerShell version and build the target node is running

Name                           Value
----                           -----
PSVersion                      5.1.20348.2582
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.20348.2582
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

Module version used

Name                  Version
----                  -------
ActiveDirectoryDsc    6.5.0
ComputerManagementDsc 9.1.0
dbatools              2.1.22
dbatools.library      2024.4.12
NetworkingDsc         9.0.0
PackageManagement     1.4.8.1
PowerShellGet         2.2.5
SqlServer             22.3.0
SqlServerDsc          16.6.0
johlju commented 3 months ago

I wonder if there is an inner exception that happens that are not shown in the output 🤔 Suggest looking at the code that outputs the error message SQLCOMMON0019.

It might be that $_.Exception contain .InnerException that might provide a better error message. If it indeed have an inner exception with the actual error, then we need a better way to throw errors here so it also surfaces. 🤔

SalutAToi commented 3 months ago

I've found the source of the issue, it's upstream, not with SqlServerDSC :

System.ArgumentException: **The value's length for key 'Data Source' exceeds its limit of '128'**.
   at Microsoft.Data.SqlClient.SqlConnectionString..ctor(String connectionString)
   at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnectionOptions(String connectionString, DbConnectionOptions previous)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.GetConnectionPoolGroup(DbConnectionPoolKey key, DbConnectionPoolGroupOptions poolOptions, DbConnectionOptions& userConnectionOptions)
   at Microsoft.Data.SqlClient.SqlConnection.ConnectionString_Set(DbConnectionPoolKey key)
   at Microsoft.Data.SqlClient.SqlConnection.set_ConnectionString(String value)
   at Microsoft.SqlServer.Management.Common.ConnectionManager.get_SqlConnectionObject()
   at Microsoft.SqlServer.Management.Common.ConnectionManager.InternalConnectImpl()
   at Microsoft.SqlServer.Management.Common.ConnectionManager.InternalConnect()
   at Microsoft.SqlServer.Management.Common.ConnectionManager.Connect()
   --- End of inner exception stack trace ---
   at Microsoft.SqlServer.Management.Common.ConnectionManager.Connect()
   at CallSite.Target(Closure , CallSite , Object )
   --- End of inner exception stack trace ---
   at System.Management.Automation.ExceptionHandlingOps.CheckActionPreference(FunctionContext funcContext, Exception exception)
   at System.Management.Automation.Interpreter.ActionCallInstruction2.Run(InterpretedFrame frame)
   at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
   at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)`

I'm trying to connect to a CloudSQL instance, and the SMO object will not accept FQDN longer than 128 characters for the Connect() method. Not sure why, but it's an upstream issue.

This will be an issue for anyone trying to manage a SQL instance in Cloud SQL, and I assume it would be a likely issue in Azure, AWS or other cloud providers : you do not manage the DNS record for the MSSQL server, it is automatically generated from some environment parameters (in the case of GCP, product, zone, project, instance), resulting in a long FQDN. If you wanted to take that into account to throw a specific error message, you would indeed find the detail of the above exception in $_.Exception

A workaround for that problem that should still allow for Kerberos authentication is to set a shorter CNAME record pointing to the full A/AAAA record of the MSSQL server

Going through that, I've found another obstacle related to GCP Cloud SQL (potentially other cloud providers) that could be better managed by SQLServerDSC (tell me if you'd like me to open a new Issue for that) : Cloud SQL is configured to listen to TCP/1433, but the usual procedure from the client for named instances is apparently to initiate a connection to UDP/1434 to determine a TCP port to use (if I read the docs correctly). Unfortunately, Cloud SQL does not listen to UDP/1434, resulting in timeout. As far as my research went, it cannot be configured to do so

However, the client behaviour can be changed by forcing connection on a specific TCP port. This is achieved by using the following address : tcp:server-name\instance-name,1433. This works for Cloud SQL. That being said, the way to make that work with DSC is quite hacky : you have to specify the network protocol in the ServerName argument (tcp:server-name) and the port in the InstanceName argument (instance-name,1433) when the port is in fact a second argument following net protocol, server and instance. This may be improved by allowing users to specify connection parameters via TCP and on a specific port.

johlju commented 3 months ago

Sorry to hear about the length issue. Did not know about that. Maybe you can open an issue in SqlServer repo or sqlmanagementobjects repo.

...the client behaviour can be changed by forcing connection on a specific TCP port... That being said, the way to make that work with DSC is quite hacky : you have to specify the network protocol in the ServerName argument (tcp:server-name) and the port in the InstanceName argument (instance-name,1433)

It would have been easy to add if there were all class-based resource, then we could have added two new properties (Port and Protocol) in SqlResourceBase. We could do that, but also add it to every MOF-based resource so that Connect-Sql helper function can combine the correct address. But we could also add a environment variable SQL_CLIENT_PROTOCOL or SQL_CLIENT_PORT that we can make Connect-Sql to recognize. 🤔