Open maxim-vde opened 7 months ago
hi @maxim-vde, great analysis! Isn't this a problem on sql2022's side rather than dbatools ? Did you try on a SQL2022 which isn't RTM, i.e. maybe a CU fixed the performance ?
Thanks for the quick response, @niphlod!
I have only tested on RTM builds of SQL2022. I'll upgrade our Test environment to the latest CU tonight, and will report back once I've re-ran the tests on that environment.
Upgrading to the latest CU fixed our performance issue.
Many thanks to @niphlod for bringing this to my attention!
lol, @maxim-vde , thanks a lot nonetheless, it was a reaaaally good description of the issue. Glad that a CU fixed it, thanks to Microsoft in this case
Unfortunately it seems that after a while, the issue has returned and it's back to taking around 4 minutes... Any other ideas or suggestions?
General performance of the SQL server is good, CPU utilization is hovering around 60%.
point is, IMHO, it's still a performance problem on SQL's side that dbatools can do nothing about ?!
@niphlod That's a tricky one imo, according to this MS article, the system table view appears to be deprecated and MS recommends using "the current SQL Server system views" instead.
Oddly enough, even the latest version of SqlServer.SMO.dll seems to rely on this deprecated view rather than directly utilizing the new system views.
Dbatools in its turn seems to be fetching all databases on the server, and then applying a filter, rather than only fetching the requested one. The performance impact of a single db sysprocesses lookup is negligible on its own, I could live with that.
uhm, not convinced by this, 'cause 2016 (and every other except 2022) is "speedy". So either they fix the issue at the root (i.e. the DMV, even if deprecated, is "fast"), or they fix SMO, where they find a valuable substitute for 2022 which is as fast as 2016. We set up SMO to fetch that data in one sweep rather than waiting for that data to be available at usually slow SMO rates (which is "db by db") here https://github.com/dataplat/dbatools/blob/development/public/Connect-DbaInstance.ps1#L404
So, we MAY say "if this is 2022, don't fetch activeconnections (till MS fixes it)" but it's such a niche problem that I don't know if the pros outweight the cons.
Failing that, dbatools should revert something we introduced a bit ago...
Get-DbaDatabase ... -Database 'name'
prefilters the list of databases from https://github.com/dataplat/dbatools/blob/development/public/Get-DbaDatabase.ps1#L258
there was a time where we then did $inputObject += $server.Databases[$dt.name]
(so, hopefully, enumerating sessions only in THAT db is still "fast") but then changed to $inputObject += $server.Databases | Where-Object Name -ceq $dt.name
where I'd expect the "prefilter" doesn't help your specific usecase. Maybe it was for case-sensitive instances where you can have the same db with different casing, but I reaally don't remember.
This may only fix the slowness using Get-DbaDatabase ... -Database 'name'
. If you do instead
$server = Connect-DbaInstance -SqlInstance 'SQL2022INSTANCE' -SqlCredential $sysadmincredential # Issue only occurs when executing user has 'View server state' grant.
$server.Databases | Where-Object Name -ceq 'DBNAME' # This takes 4 minutes.
there's really nothing to do here.
So, calling in the usual suspects ... @wsmelton , @potatoqualitee , @ClaudioESSilva , @andreasjordan ... any ideas here ? My vote: instead of burning the logic introduced in Get-DbaDatabase for case-sensitive instances, I'd be more inclined to fix it at the source level modifying Connect-DbaInstance where on L258 and L1082 we accomodate for the 2022 weirdness.
Just an FYI, in our troubleshooting we also tested $inputObject += $server.Databases[$dt.name]
instead of the now used $inputObject += $server.Databases | Where-Object Name -ceq $dt.name
, but that doesn't seem to improve performance.
It still wants to load all databases at once.
I agree that removing activeconnections just for 2022 instances is probably a bad idea, unless it's unused in the rest of the dbatools codebase?
Edit: I also want to add, we only tested SQL2016 & 2022, I don't know if this issue occurs on 2017 or 2019.
yep, it tracks. Point is, those "initializations" on Connect-DbaInstance are what makes dbatools usable. Also, we can't "remove a property": databases in dbatools expose the same properties SMO exposes. But, we can choose WHEN that property is queried.
Dunno if you tried with plain SMO browsing a server, but it's slow as hell because of it's parent/child structure, where till you browse the "child" object, some (a lot of) properties are not fetched.
Most of them (including ActiveConnections, except on 2022 ;-) ) are readily available in a single sweep IF you set those defaults. This means that when you walk the "parent/child", you don't wait at each child to fetch the n+1 child property, because just accessing the parent did in a previous step. Those properties are a careful hand-picked selection, where each time one got added, it had performance benefits for everyone.
People had the same issue as you, time-wise: the root cause was the exact opposite.
SMO fetched properties for ALL dbs taking 200ms, and it was faster than multiple fetches for each db, even if the single fetch took 50ms, because ... well, people have more than 4 dbs. Or, waiting 200ms instead of 50ms is not noticeable.
So, walking something (e.g. databases) "stopped and waited" too long at each step, and adding that property to the initialization step reduced significantly the duration of every operation.
You have a time-wise issue, but the reason here is that the single fetch takes 5 minutes, and we're hoping that the single fetch for a single db will take a few seconds. Depending wildly on latency, a "full scan", e.g. doing $server.Databases
on 190 dbs may take more than 5 minutes.
Think listing 30 dbs in an instance, where at EACH and EVERY db, SMO "stops and waits" for a query to retrieve specifically the "Collation" property.
Initializing SMO the way Connect-DbaInstance does results in SMO "stopping and waiting" ONCE, when you access $server
, it issues ONE query to fetch the "Collation" property for ALL databases, and then you walk the $server.Databases
without any other "wait".
This is more evident on remote servers, where with plain SMO the protocol is "chatty" and each query + latency adds up quite a bit (e.g. 30 x n queries, serially, where you can't walk the next db if the previous query is not finished), while dbatools is faster, with 30 x n LESS queries to wait for.
So, a "balanced" choice is the way: adding fields to the SMO initialization incurs in less queries: the pros are huge, the cons are:
In this case it seems that the "yet another property" doesn't add up a BIT of time: it adds a TON. And, you may NOT want to see/use ActiveConnections for all databases. So, down the line, what could happen, if my analysis is correct, is:
So, if we go down the route of de-init(ting) ActiveConnections just on 2022, assuming it takes 5 minutes at this moment to walk 100 databases on 2022 and 30 seconds on any other version, what should happen is:
$server
, if 2022 doesn't init ActiveConnections$server.Databases['SomeDB']
, dbatools fetches ActiveConnections JUST FOR THAT db (hopefully it won't take 5 minutes)$server.Databases | Where Name -eq 'SomeDB'
, dbatools fetches ActiveConnections for EACH and EVERY db, and it'll take MORE than 5 minutes, because you'll add 100*latency to the total timeGet-DbaDatabase -Name 'SomeDB'
has the issue of taking MORE than 5 minutes, as it does $server.Databases | Where Name -ceq $dt.name
Pros ?
PRO users like you can at least circumvent the issue using $server.Databases['SomeDB']
if you want to speed up "single db operations".
Cons?
ANY user targetting 2022 can do little to nothing to improve performances if they need to operate on every db you have, and it'll be statistically worse than now, time-wise.
@niphlod Thank you for this very detailed explanation! It makes perfect sense!
We ended up deploying a "custom build" of dbatools in which ActiveConnections aren't fetched in advance for all dbs but rather on a per-db basis.
This in combination with $server.Databases['SomeDB']
in Get-DbaDatabase fixes the performance issues in our particular scenario. We're back to around 8 seconds, yay :)
As this does seem to be a very niche thing, I'm not sure if I should keep this issue open?
@maxim-vde : glad that the "diagnosis" was right. I'd keep this open for another bit, some of the usual suspects may have a "contact" with the SMO team to see if this can be somehow fixed at the root. That being said as soon as 2022 adoption gets a bump I feel you won't be the only one with the problem, so the "niche thing" may very well blow up, and at least we can fix up a roadmap to tackle the issue.
Just for the sake of discussion, with the "custom build" not init(ting) ActiveConnections are your times worse than before when doing $server.Databases | Where Name -ceq 'SomeDB'
?
When I tried it a few days ago, I recall it being slower when piped to a 'Where-Object' (top of my head, 16 seconds instead of 8).
Just tried it again on 2 different DBs, these are the results for the $server.Databases['SomeDB']
approach:
Database A:
Database B:
And using the piped Where-Object approach:
Database A:
Database B:
So now it's pretty close, but still a tad slower. Not sure why this is, maybe powershell iterates the list of databases differently when using Where-Object? But that's just a wild guess.
hum, I don't recall if they are enumed by database_id or listed alphabetically.
Try with the very top and the very bottom db (by database_id and name).
Ideally, Where Name -ceq something
should take the roughly the same amount of time to filter... the "relevant" time is it'd take a bit to "output" the object, as it "must" fetch activeconnections (AFTER?) the filtering, just to land the object on the output.
The same goes for $server.Databases['something']
(granted you don't "pay" the Where- enumeration) ... it's expected to be a tad slower with hundreds of items to loop, but a 2 second gap on a 10 second period I think is acceptable.
But, and I may digress from the diagnosis a bit, at this point it seems the big performance penalty is not on the Where
vs ['name']
, which, if confirmed, means we may not need to alter Get-DbaDatabase at all, just the initialization of Connect-DbaInstance.
Doing a recap, it seems that not init(ting) ends up fetching Activeconnection "one by one" AFTER the filter rather than BEFORE (which may be confirmed by just a little time difference expected between Where
vs ['name']
). If Activeconnection was fetched BEFORE the filter, you'd see Where
"pay" *number_of_databases median_duration_for_a_single_db_query** more time.
In my previous results I had already tested alphabetically.
Database A started with 'b....' Database B started with 'z....'
Checked again with the first database_id vs the last and the results were basically the same for both lookups.
I think it's safe to assume activeconnections is fetched after the filter.
Perfect. Once again, @maxim-vde thanks a lot for your collaboration. Let us mumble on this a bit and we'll get back to you
Another possible workaround is to use SetDefaultInitFields so that SMO doesn't eagerly fetch that property. Which is ironic as the typical use case that I've seen for that is the opposite. That is, to eagerly fetch some property (say, IsSystem
) that isn't part of the default set of fields which causes an N+1 problem from SMO.
@ben-thul : that's actually what dbatools Connect-DbaInstance does. Point here is that is faster for everything EXCEPT 2022.
If that's what it does, why is it still fetching the field? Is SMO ignoring the directive to not get that attribute?
Related, I just took a look at what the default fields are for a database out of the box and ActiveConnections is still in there.
C:\Users\ben.thul
> $c = Connect-DbaInstance -SqlInstance .
C:\Users\ben.thul
> $db = Get-DbaDatabase -SqlInstance . -Database master
C:\Users\ben.thul
> $c.GetDefaultInitFields($db.GetType())
Collation
CompatibilityLevel
CreateDate
ID
IsAccessible
IsFullTextEnabled
IsSystemObject
IsUpdateable
LastBackupDate
LastDifferentialBackupDate
LastLogBackupDate
Owner
ReadOnly
RecoveryModel
ReplicationOptions
Status
Version
BrokerEnabled
DatabaseSnapshotBaseName
IsMirroringEnabled
Trustworthy
ActiveConnections
AvailabilityDatabaseSynchronizationState
AvailabilityGroupName
ContainmentType
EncryptionEnabled
I have neither a SQL 2016 nor a 2022 instance handy to test, but this should work to validate the approach:
$c = Connect-DbaInstance -SqlInstance .;
$f = $c.GetDefaultInitFields([Microsoft.SqlServer.Management.Smo.Database]);
$f.Remove('ActiveConnections');
$c.SetDefaultInitFields([Microsoft.SqlServer.Management.Smo.Database], $f);
$dbs = Get-DbaDatabase -SqlInstance $c;
Noting both the timing and also tracing in the same way to ensure that the sysprocesses count isn't present in the query.
lemme refrase... connect-dbainstance sets activeconnection as a default field, which improves performances for everything except 2022. setting it makes SMO (when the user has enough permissions) fetch data for all dbs with one query instead of querying database per database (which is usually slower). 2022 seems to have an order of magnitude worse performance fetching data in one query, so the only viable options are:
this doesn't fix timing for enumeration of the whole set of databases, but it'll improve the "feel" when the target of operations is a small percentage of dbs of the instance.
this doesn't fix timing for enumeration of the whole set of databases
I don't understand. Why wouldn't it? Doesn't the presence of ActiveConnections in the default properties list control whether the long-running query runs or not? Honest question - if there's something I'm missing, I'm happy to learn.
Well, let's make examples:
Then, another set of examples:
Let's say prefetching activeconnections on 2022 is the culprit and accounts for 4 minutes. This doesn't mean that "post-fetching" 100 activeconnections (for each db, let's assume 3 seconds per db) is faster than a single prefetch.
So, for the "enumeration of the whole set of databases" examples, we may go from 4 minutes (240 seconds) "prefetch" to 3*100 seconds "post-fetch".
But, at least, when targeting a specific database, you won't wait 4 minutes, but just 3 seconds (which is STILL bad when compared to, say, 2016, but not "brutal")
Ah... okay. Your explanation makes sense.
@maxim-vde - have you had a chance to test whether removing ActiveConnections from the default init fields gives you better performance? Doing so will (I think) inform a potential way forward.
@ben-thul Unless I'm misinterpreting your question, yes I have. This turned out to be the solution to our performance issues as mentioned a few comments above:
@niphlod Thank you for this very detailed explanation! It makes perfect sense!
We ended up deploying a "custom build" of dbatools in which ActiveConnections aren't fetched in advance for all dbs but rather on a per-db basis. This in combination with
$server.Databases['SomeDB']
in Get-DbaDatabase fixes the performance issues in our particular scenario. We're back to around 8 seconds, yay :)As this does seem to be a very niche thing, I'm not sure if I should keep this issue open?
Have we tested on latest release of SMO?
Thanks for bringing this back up. It was a bummer that ActiveConnections are in the default init. Once the slowness has been confirmed, I can create an issue in the SMO repo to have them address it.
@potatoqualitee : news on this ? if not I think we need to bring this to a closure, one way or the other
Verified issue does not already exist?
I have searched and found no existing issue
What error did you receive?
Get-DbaDatabase consumes around 4 minutes on SQL2022, while it only takes 12 seconds on SQL2016
Steps to Reproduce
Or just:
Please confirm that you are running the most recent version of dbatools
2.1.8
Other details or mentions
I'll try to list everything we've discovered so far here:
When listing databases, Microsoft.SqlServer.Smo.dll executes following query on the server instance:
This query takes up the majority of the 4 minute total wait time. When analyzing the execution plan, we discovered most of the slowness came from following select statement within said query:
So basically, querying the active connections for each database is very slow on SQL 2022 while it wasn't on SQL2016. This also means it gets slower for each database we add to the server, we currently have around 190 databases on the SQL2022 instance.
An example, executing
select count(*) from master.dbo.sysprocesses
on both SQL2016 and SQL2022:SQL2016:
SQL2022:
We are able to reproduce this issue on 2 separate SQL2022 instances with the exact same amount of DBs (Our Production and Test environment).
I have tried replacing the SMO DLLs with a newer version '17.100.23.0' available on Nuget, but the issue persists.
As a temporary workaround, we're running our dbatools commands from a non-sysadmin SQL account, that has all privileges except for 'View server state'. Which means this user is not allowed to query all connections on the instance and thus is much, much faster when listing databases.
Again, this slowness only becomes apparent if you have many databases running on a single SQL instance, as it queries the connections for each active DB on the server.
What PowerShell host was used when producing this error
PowerShell Core (pwsh.exe)
PowerShell Host Version
PSVersion 7.4.1 PSEdition Core GitCommitId 7.4.1 OS Microsoft Windows 10.0.22621 Platform Win32NT PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…} PSRemotingProtocolVersion 2.3 SerializationVersion 1.1.0.1 WSManStackVersion 3.0
SQL Server Edition and Build number
Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64) Oct 8 2022 05:58:25 Copyright (C) 2022 Microsoft Corporation Standard Edition (64-bit) on Windows Server 2022 Datacenter 10.0 (Build 20348: ) (Hypervisor)
.NET Framework Version
.NET 8.0.1