dataplat / dbatools

🚀 SQL Server automation and instance migrations have never been safer, faster or freer
https://dbatools.io
MIT License
2.47k stars 802 forks source link

Get-DbaDatabase significantly slower on SQL2022 vs. SQL2016 when executing user is sysadmin #9282

Open maxim-vde opened 7 months ago

maxim-vde commented 7 months ago

Verified issue does not already exist?

I have searched and found no existing issue

What error did you receive?

Get-DbaDatabase consumes around 4 minutes on SQL2022, while it only takes 12 seconds on SQL2016

Steps to Reproduce

$server = Connect-DbaInstance -SqlInstance 'SQL2022INSTANCE' -SqlCredential $sysadmincredential # Issue only occurs when executing user has 'View server state' grant.
$server.Databases | Where-Object Name -ceq 'DBNAME' # This takes 4 minutes.

Or just:

Get-DbaDatabase -SqlInstance 'SQL2022INSTANCE' -Database 'DBNAME' -SqlCredential $sysadmincredential 

Please confirm that you are running the most recent version of dbatools

2.1.8

Other details or mentions

I'll try to list everything we've discovered so far here:

When listing databases, Microsoft.SqlServer.Smo.dll executes following query on the server instance:

exec sp_executesql N'
        create table #tempbackup (database_name nvarchar(128), [type] char(1), backup_finish_date datetime)
        insert into #tempbackup select database_name, [type], max(backup_finish_date) from msdb..backupset where [type] = ''D'' or [type] = ''L'' or [type]=''I'' group by database_name, [type]

        declare @HasViewPermission int
        select @HasViewPermission = HAS_PERMS_BY_NAME(null, null, ''VIEW SERVER STATE'')

        create table #tmp_db_hadr_dbrs (group_database_id uniqueidentifier, synchronization_state tinyint, is_local bit)

        if (@HasViewPermission = 1)
        begin
        insert into #tmp_db_hadr_dbrs select group_database_id, synchronization_state, is_local from master.sys.dm_hadr_database_replica_states
        end

        create table #tmp_db_ars (replica_id uniqueidentifier, group_id uniqueidentifier)
        create table #tmp_db_ags (group_id uniqueidentifier, name sysname)

        if (@HasViewPermission = 1)
        begin
        insert into #tmp_db_ars select replica_id, group_id  from  master.sys.availability_replicas
        insert into #tmp_db_ags select group_id, name from master.sys.availability_groups            
        end

SELECT
dtb.name AS [Name],
dtb.collation_name AS [Collation],
dtb.compatibility_level AS [CompatibilityLevel],
dtb.create_date AS [CreateDate],
dtb.database_id AS [ID],
CAST(has_dbaccess(dtb.name) AS bit) AS [IsAccessible],
dtb.is_fulltext_enabled AS [IsFullTextEnabled],
CAST(case when dtb.name in (''master'',''model'',''msdb'',''tempdb'') then 1 else dtb.is_distributor end AS bit) AS [IsSystemObject],
CAST(( case LOWER(convert( nvarchar(128), DATABASEPROPERTYEX(dtb.name, ''Updateability''))) when ''read_write'' then 1 else 0 end) AS bit) AS [IsUpdateable],
(select backup_finish_date from #tempbackup where type = @_msparam_0 and db_id(database_name) = dtb.database_id) AS [LastBackupDate],
(select backup_finish_date from #tempbackup where type = @_msparam_1 and db_id(database_name) = dtb.database_id) AS [LastDifferentialBackupDate],
(select backup_finish_date from #tempbackup where type = @_msparam_2 and db_id(database_name) = dtb.database_id) AS [LastLogBackupDate],
ISNULL(suser_sname(dtb.owner_sid),'''') AS [Owner],
dtb.is_read_only AS [ReadOnly],
dtb.recovery_model AS [RecoveryModel],
(dtb.is_published*1+dtb.is_subscribed*2+dtb.is_merge_published*4) AS [ReplicationOptions],

        case
        when dtb.collation_name is null then 0x200
        else 0
        end |
        case
        when 1 = dtb.is_in_standby then 0x40
        else 0
        end |
        case dtb.state
        when 1 then 0x2
        when 2 then 0x8
        when 3 then 0x4
        when 4 then 0x10
        when 5 then 0x100
        when 6 then 0x20
        else 1
        end
       AS [Status],
CAST(DATABASEPROPERTYEX(dtb.name, ''Version'') AS int) AS [Version],
dtb.is_broker_enabled AS [BrokerEnabled],
ISNULL(DB_NAME(dtb.source_database_id), N'''') AS [DatabaseSnapshotBaseName],
CAST(case when dmi.mirroring_partner_name is null then 0 else 1 end AS bit) AS [IsMirroringEnabled],
dtb.is_trustworthy_on AS [Trustworthy],
(select count(*) from master.dbo.sysprocesses p where dtb.database_id=p.dbid) AS [ActiveConnections],
dbrs.synchronization_state AS [AvailabilityDatabaseSynchronizationState],
ISNULL(ags.name, '''') AS [AvailabilityGroupName],
dtb.containment AS [ContainmentType],
CAST(dtb.is_encrypted AS bit) AS [EncryptionEnabled],
dtb.name AS [DatabaseName2]
FROM
master.sys.databases AS dtb
LEFT OUTER JOIN sys.database_mirroring AS dmi ON dmi.database_id = dtb.database_id
LEFT OUTER JOIN #tmp_db_hadr_dbrs AS dbrs ON dtb.group_database_id = dbrs.group_database_id and dbrs.is_local = 1
LEFT OUTER JOIN #tmp_db_ars AS ars ON dtb.replica_id = ars.replica_id
LEFT OUTER JOIN #tmp_db_ags AS ags ON ars.group_id = ags.group_id
ORDER BY
[Name] ASC

        drop table #tempbackup

        drop table #tmp_db_hadr_dbrs

        drop table #tmp_db_ars
        drop table #tmp_db_ags        

',N'@_msparam_0 nvarchar(4000),@_msparam_1 nvarchar(4000),@_msparam_2 nvarchar(4000)',@_msparam_0=N'D',@_msparam_1=N'I',@_msparam_2=N'L'

This query takes up the majority of the 4 minute total wait time. When analyzing the execution plan, we discovered most of the slowness came from following select statement within said query:

(select count(*) from master.dbo.sysprocesses p where dtb.database_id=p.dbid) AS [ActiveConnections],

So basically, querying the active connections for each database is very slow on SQL 2022 while it wasn't on SQL2016. This also means it gets slower for each database we add to the server, we currently have around 190 databases on the SQL2022 instance.

An example, executing select count(*) from master.dbo.sysprocesses on both SQL2016 and SQL2022:

SQL2016: image

SQL2022: image

We are able to reproduce this issue on 2 separate SQL2022 instances with the exact same amount of DBs (Our Production and Test environment).

I have tried replacing the SMO DLLs with a newer version '17.100.23.0' available on Nuget, but the issue persists.

As a temporary workaround, we're running our dbatools commands from a non-sysadmin SQL account, that has all privileges except for 'View server state'. Which means this user is not allowed to query all connections on the instance and thus is much, much faster when listing databases.

Again, this slowness only becomes apparent if you have many databases running on a single SQL instance, as it queries the connections for each active DB on the server.

What PowerShell host was used when producing this error

PowerShell Core (pwsh.exe)

PowerShell Host Version

PSVersion 7.4.1 PSEdition Core GitCommitId 7.4.1 OS Microsoft Windows 10.0.22621 Platform Win32NT PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…} PSRemotingProtocolVersion 2.3 SerializationVersion 1.1.0.1 WSManStackVersion 3.0

SQL Server Edition and Build number

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64) Oct 8 2022 05:58:25 Copyright (C) 2022 Microsoft Corporation Standard Edition (64-bit) on Windows Server 2022 Datacenter 10.0 (Build 20348: ) (Hypervisor)

.NET Framework Version

.NET 8.0.1

niphlod commented 7 months ago

hi @maxim-vde, great analysis! Isn't this a problem on sql2022's side rather than dbatools ? Did you try on a SQL2022 which isn't RTM, i.e. maybe a CU fixed the performance ?

maxim-vde commented 7 months ago

Thanks for the quick response, @niphlod!

I have only tested on RTM builds of SQL2022. I'll upgrade our Test environment to the latest CU tonight, and will report back once I've re-ran the tests on that environment.

maxim-vde commented 7 months ago

Upgrading to the latest CU fixed our performance issue.

Many thanks to @niphlod for bringing this to my attention!

niphlod commented 7 months ago

lol, @maxim-vde , thanks a lot nonetheless, it was a reaaaally good description of the issue. Glad that a CU fixed it, thanks to Microsoft in this case

maxim-vde commented 7 months ago

Unfortunately it seems that after a while, the issue has returned and it's back to taking around 4 minutes... Any other ideas or suggestions?

General performance of the SQL server is good, CPU utilization is hovering around 60%.

niphlod commented 7 months ago

point is, IMHO, it's still a performance problem on SQL's side that dbatools can do nothing about ?!

maxim-vde commented 7 months ago

@niphlod That's a tricky one imo, according to this MS article, the system table view appears to be deprecated and MS recommends using "the current SQL Server system views" instead.

Oddly enough, even the latest version of SqlServer.SMO.dll seems to rely on this deprecated view rather than directly utilizing the new system views.

Dbatools in its turn seems to be fetching all databases on the server, and then applying a filter, rather than only fetching the requested one. The performance impact of a single db sysprocesses lookup is negligible on its own, I could live with that.

niphlod commented 7 months ago

uhm, not convinced by this, 'cause 2016 (and every other except 2022) is "speedy". So either they fix the issue at the root (i.e. the DMV, even if deprecated, is "fast"), or they fix SMO, where they find a valuable substitute for 2022 which is as fast as 2016. We set up SMO to fetch that data in one sweep rather than waiting for that data to be available at usually slow SMO rates (which is "db by db") here https://github.com/dataplat/dbatools/blob/development/public/Connect-DbaInstance.ps1#L404

So, we MAY say "if this is 2022, don't fetch activeconnections (till MS fixes it)" but it's such a niche problem that I don't know if the pros outweight the cons.

Failing that, dbatools should revert something we introduced a bit ago... Get-DbaDatabase ... -Database 'name' prefilters the list of databases from https://github.com/dataplat/dbatools/blob/development/public/Get-DbaDatabase.ps1#L258

there was a time where we then did $inputObject += $server.Databases[$dt.name] (so, hopefully, enumerating sessions only in THAT db is still "fast") but then changed to $inputObject += $server.Databases | Where-Object Name -ceq $dt.name where I'd expect the "prefilter" doesn't help your specific usecase. Maybe it was for case-sensitive instances where you can have the same db with different casing, but I reaally don't remember.

This may only fix the slowness using Get-DbaDatabase ... -Database 'name'. If you do instead

$server = Connect-DbaInstance -SqlInstance 'SQL2022INSTANCE' -SqlCredential $sysadmincredential # Issue only occurs when executing user has 'View server state' grant.
$server.Databases | Where-Object Name -ceq 'DBNAME' # This takes 4 minutes.

there's really nothing to do here.

So, calling in the usual suspects ... @wsmelton , @potatoqualitee , @ClaudioESSilva , @andreasjordan ... any ideas here ? My vote: instead of burning the logic introduced in Get-DbaDatabase for case-sensitive instances, I'd be more inclined to fix it at the source level modifying Connect-DbaInstance where on L258 and L1082 we accomodate for the 2022 weirdness.

maxim-vde commented 7 months ago

Just an FYI, in our troubleshooting we also tested $inputObject += $server.Databases[$dt.name] instead of the now used $inputObject += $server.Databases | Where-Object Name -ceq $dt.name, but that doesn't seem to improve performance. It still wants to load all databases at once.

I agree that removing activeconnections just for 2022 instances is probably a bad idea, unless it's unused in the rest of the dbatools codebase?

Edit: I also want to add, we only tested SQL2016 & 2022, I don't know if this issue occurs on 2017 or 2019.

niphlod commented 7 months ago

yep, it tracks. Point is, those "initializations" on Connect-DbaInstance are what makes dbatools usable. Also, we can't "remove a property": databases in dbatools expose the same properties SMO exposes. But, we can choose WHEN that property is queried.

Dunno if you tried with plain SMO browsing a server, but it's slow as hell because of it's parent/child structure, where till you browse the "child" object, some (a lot of) properties are not fetched. Most of them (including ActiveConnections, except on 2022 ;-) ) are readily available in a single sweep IF you set those defaults. This means that when you walk the "parent/child", you don't wait at each child to fetch the n+1 child property, because just accessing the parent did in a previous step. Those properties are a careful hand-picked selection, where each time one got added, it had performance benefits for everyone. People had the same issue as you, time-wise: the root cause was the exact opposite. SMO fetched properties for ALL dbs taking 200ms, and it was faster than multiple fetches for each db, even if the single fetch took 50ms, because ... well, people have more than 4 dbs. Or, waiting 200ms instead of 50ms is not noticeable. So, walking something (e.g. databases) "stopped and waited" too long at each step, and adding that property to the initialization step reduced significantly the duration of every operation. You have a time-wise issue, but the reason here is that the single fetch takes 5 minutes, and we're hoping that the single fetch for a single db will take a few seconds. Depending wildly on latency, a "full scan", e.g. doing $server.Databases on 190 dbs may take more than 5 minutes.

Think listing 30 dbs in an instance, where at EACH and EVERY db, SMO "stops and waits" for a query to retrieve specifically the "Collation" property. Initializing SMO the way Connect-DbaInstance does results in SMO "stopping and waiting" ONCE, when you access $server, it issues ONE query to fetch the "Collation" property for ALL databases, and then you walk the $server.Databases without any other "wait". This is more evident on remote servers, where with plain SMO the protocol is "chatty" and each query + latency adds up quite a bit (e.g. 30 x n queries, serially, where you can't walk the next db if the previous query is not finished), while dbatools is faster, with 30 x n LESS queries to wait for.

So, a "balanced" choice is the way: adding fields to the SMO initialization incurs in less queries: the pros are huge, the cons are:

In this case it seems that the "yet another property" doesn't add up a BIT of time: it adds a TON. And, you may NOT want to see/use ActiveConnections for all databases. So, down the line, what could happen, if my analysis is correct, is:

So, if we go down the route of de-init(ting) ActiveConnections just on 2022, assuming it takes 5 minutes at this moment to walk 100 databases on 2022 and 30 seconds on any other version, what should happen is:

Pros ? PRO users like you can at least circumvent the issue using $server.Databases['SomeDB'] if you want to speed up "single db operations". Cons? ANY user targetting 2022 can do little to nothing to improve performances if they need to operate on every db you have, and it'll be statistically worse than now, time-wise.

maxim-vde commented 7 months ago

@niphlod Thank you for this very detailed explanation! It makes perfect sense!

We ended up deploying a "custom build" of dbatools in which ActiveConnections aren't fetched in advance for all dbs but rather on a per-db basis. This in combination with $server.Databases['SomeDB'] in Get-DbaDatabase fixes the performance issues in our particular scenario. We're back to around 8 seconds, yay :)

As this does seem to be a very niche thing, I'm not sure if I should keep this issue open?

niphlod commented 7 months ago

@maxim-vde : glad that the "diagnosis" was right. I'd keep this open for another bit, some of the usual suspects may have a "contact" with the SMO team to see if this can be somehow fixed at the root. That being said as soon as 2022 adoption gets a bump I feel you won't be the only one with the problem, so the "niche thing" may very well blow up, and at least we can fix up a roadmap to tackle the issue.

Just for the sake of discussion, with the "custom build" not init(ting) ActiveConnections are your times worse than before when doing $server.Databases | Where Name -ceq 'SomeDB' ?

maxim-vde commented 7 months ago

When I tried it a few days ago, I recall it being slower when piped to a 'Where-Object' (top of my head, 16 seconds instead of 8).

Just tried it again on 2 different DBs, these are the results for the $server.Databases['SomeDB'] approach:

Database A: image

Database B: image

And using the piped Where-Object approach:

Database A: image

Database B: image

So now it's pretty close, but still a tad slower. Not sure why this is, maybe powershell iterates the list of databases differently when using Where-Object? But that's just a wild guess.

niphlod commented 7 months ago

hum, I don't recall if they are enumed by database_id or listed alphabetically. Try with the very top and the very bottom db (by database_id and name). Ideally, Where Name -ceq something should take the roughly the same amount of time to filter... the "relevant" time is it'd take a bit to "output" the object, as it "must" fetch activeconnections (AFTER?) the filtering, just to land the object on the output. The same goes for $server.Databases['something'] (granted you don't "pay" the Where- enumeration) ... it's expected to be a tad slower with hundreds of items to loop, but a 2 second gap on a 10 second period I think is acceptable.

But, and I may digress from the diagnosis a bit, at this point it seems the big performance penalty is not on the Where vs ['name'], which, if confirmed, means we may not need to alter Get-DbaDatabase at all, just the initialization of Connect-DbaInstance. Doing a recap, it seems that not init(ting) ends up fetching Activeconnection "one by one" AFTER the filter rather than BEFORE (which may be confirmed by just a little time difference expected between Where vs ['name']). If Activeconnection was fetched BEFORE the filter, you'd see Where "pay" *number_of_databases median_duration_for_a_single_db_query** more time.

maxim-vde commented 7 months ago

In my previous results I had already tested alphabetically.

Database A started with 'b....' Database B started with 'z....'

Checked again with the first database_id vs the last and the results were basically the same for both lookups.

I think it's safe to assume activeconnections is fetched after the filter.

niphlod commented 7 months ago

Perfect. Once again, @maxim-vde thanks a lot for your collaboration. Let us mumble on this a bit and we'll get back to you

ben-thul commented 7 months ago

Another possible workaround is to use SetDefaultInitFields so that SMO doesn't eagerly fetch that property. Which is ironic as the typical use case that I've seen for that is the opposite. That is, to eagerly fetch some property (say, IsSystem) that isn't part of the default set of fields which causes an N+1 problem from SMO.

niphlod commented 7 months ago

@ben-thul : that's actually what dbatools Connect-DbaInstance does. Point here is that is faster for everything EXCEPT 2022.

ben-thul commented 7 months ago

If that's what it does, why is it still fetching the field? Is SMO ignoring the directive to not get that attribute?

ben-thul commented 7 months ago

Related, I just took a look at what the default fields are for a database out of the box and ActiveConnections is still in there.

C:\Users\ben.thul
> $c = Connect-DbaInstance -SqlInstance .
C:\Users\ben.thul
> $db = Get-DbaDatabase -SqlInstance . -Database master
C:\Users\ben.thul
> $c.GetDefaultInitFields($db.GetType())
Collation
CompatibilityLevel
CreateDate
ID
IsAccessible
IsFullTextEnabled
IsSystemObject
IsUpdateable
LastBackupDate
LastDifferentialBackupDate
LastLogBackupDate
Owner
ReadOnly
RecoveryModel
ReplicationOptions
Status
Version
BrokerEnabled
DatabaseSnapshotBaseName
IsMirroringEnabled
Trustworthy
ActiveConnections
AvailabilityDatabaseSynchronizationState
AvailabilityGroupName
ContainmentType
EncryptionEnabled
ben-thul commented 7 months ago

I have neither a SQL 2016 nor a 2022 instance handy to test, but this should work to validate the approach:

$c = Connect-DbaInstance -SqlInstance .;
$f = $c.GetDefaultInitFields([Microsoft.SqlServer.Management.Smo.Database]);
$f.Remove('ActiveConnections');
$c.SetDefaultInitFields([Microsoft.SqlServer.Management.Smo.Database], $f);
$dbs = Get-DbaDatabase -SqlInstance $c;

Noting both the timing and also tracing in the same way to ensure that the sysprocesses count isn't present in the query.

niphlod commented 7 months ago

lemme refrase... connect-dbainstance sets activeconnection as a default field, which improves performances for everything except 2022. setting it makes SMO (when the user has enough permissions) fetch data for all dbs with one query instead of querying database per database (which is usually slower). 2022 seems to have an order of magnitude worse performance fetching data in one query, so the only viable options are:

this doesn't fix timing for enumeration of the whole set of databases, but it'll improve the "feel" when the target of operations is a small percentage of dbs of the instance.

ben-thul commented 7 months ago

this doesn't fix timing for enumeration of the whole set of databases

I don't understand. Why wouldn't it? Doesn't the presence of ActiveConnections in the default properties list control whether the long-running query runs or not? Honest question - if there's something I'm missing, I'm happy to learn.

niphlod commented 7 months ago

Well, let's make examples:

Then, another set of examples:

Let's say prefetching activeconnections on 2022 is the culprit and accounts for 4 minutes. This doesn't mean that "post-fetching" 100 activeconnections (for each db, let's assume 3 seconds per db) is faster than a single prefetch.

So, for the "enumeration of the whole set of databases" examples, we may go from 4 minutes (240 seconds) "prefetch" to 3*100 seconds "post-fetch".

But, at least, when targeting a specific database, you won't wait 4 minutes, but just 3 seconds (which is STILL bad when compared to, say, 2016, but not "brutal")

ben-thul commented 7 months ago

Ah... okay. Your explanation makes sense.

@maxim-vde - have you had a chance to test whether removing ActiveConnections from the default init fields gives you better performance? Doing so will (I think) inform a potential way forward.

maxim-vde commented 7 months ago

@ben-thul Unless I'm misinterpreting your question, yes I have. This turned out to be the solution to our performance issues as mentioned a few comments above:

@niphlod Thank you for this very detailed explanation! It makes perfect sense!

We ended up deploying a "custom build" of dbatools in which ActiveConnections aren't fetched in advance for all dbs but rather on a per-db basis. This in combination with $server.Databases['SomeDB'] in Get-DbaDatabase fixes the performance issues in our particular scenario. We're back to around 8 seconds, yay :)

As this does seem to be a very niche thing, I'm not sure if I should keep this issue open?

wsmelton commented 6 months ago

Have we tested on latest release of SMO?

potatoqualitee commented 6 months ago

Thanks for bringing this back up. It was a bummer that ActiveConnections are in the default init. Once the slowness has been confirmed, I can create an issue in the SMO repo to have them address it.

niphlod commented 1 month ago

@potatoqualitee : news on this ? if not I think we need to bring this to a closure, one way or the other