Low priority long term plan: add new script to analyze index usage for SaaS clients

erikdarlingdata commented 1 year ago

Version of the script 8.11

What is the current behavior? When you specify @GetAllDatabases = 1 and @TableName = 'something', a few weird things happen:

Script stops running if it hits a database where the table doesn't exist
Only one database is listed in the "header" of the results (picture below)
Not all databases are processed (picture below)

If the current behavior is a bug, please provide the steps to reproduce. On my server, I have three copies of the StackOverflow database, each with the Posts table in them:

When I run this:

EXEC dbo.sp_BlitzIndex 
    @TableName = N'Posts',
    @GetAllDatabases = 1,
    @Debug = 1;

The script bails at AdventureWorks because Posts doesn't exist there:

When I run this:

EXEC dbo.sp_BlitzIndex 
    @TableName = N'Posts',
    @GetAllDatabases = 1,
    @IgnoreDatabases = N'AdventureWorks, Crap',
    @Debug = 1;

The output ends up looking like this (edited to show issues):

Under the heading for StackOverflow2013_Clean are indexes for StackOverflow2013. This also highlights the fact that the StackOverflow database doesn't produce any results the final output even though data is logged in intermediate temp tables (shown via debug):

As well as:

And they do show up when I specify the StackOverflow database:

EXEC dbo.sp_BlitzIndex 
    @TableName = N'Posts',
    @DatabaseName = N'StackOverflow',
    @Debug = 1;

Like so:

What is the expected behavior?

Script should gracefully skip databases where table doesn't exist
Header data reset per database processed
All databases should show results

Which versions of SQL Server and which OS are affected by this issue? Did this work in previous versions of our procedures? I've only tested this on supported versions of SQL Server: (2016+, Managed Instance). Windows is whatever.

erikdarlingdata commented 1 year ago

Just to set expectations on this, I might not get to it quickly. If anyone feels like picking it up, leave a comment so I can assign it to you. Otherwise I'll work on this as time allows.

BrentOzar commented 1 year ago

Yeah, I'm really intrigued by this. I think I'm going to do a couple of streaming sessions working on the FRK this week.

erikdarlingdata commented 1 year ago

@BrentOzar I wish nothing but the best for you both

BrentOzar commented 1 year ago

This is interesting. If you specify @TableName, then the @Mode parameter doesn't matter. My first thought on the use case was, "He wants to see Mode 2 and list all of the indexes for that object across all the databases." However, that doesn't work:

sp_BlitzIndex @GetAllDatabases = 1, @TableName = 'Posts', @Mode = 2

Produces:

ChicagoParkingTickets database failed to process. Setting the @Mode doesn't change behavior if you supply @TableName. Use default @Mode=0 to see table detail.

But in Mode 0, @GetAllDatabases doesn't really make sense because most of the result sets don't have a database name: Screenshot 2022-12-13 at 4 24 03 AM

So lemme jump back and ask - what was the use case for this? I can totally see it being used in Mode 2 or 3, and in that case, the fix is different. If you want it to work in Mode 0, that means adding database name columns to a bunch of the result sets, but also adding a sort order to each result set when @GetAllDatabases = 1. For example, if you're looking at indexes that exist, I don't think you want them sorted by database name first - probably last, because you wanna look at each database to see if the indexes are different. (Or heaven forbid, summing the numbers, which is way beyond my pay grade.)

erikdarlingdata commented 1 year ago

@BrentOzar good question! So the use case for me was a someone with a database-per-client, and one particular main table that drives a bunch of critical processes. I wanted to focus changes on that one table. Not to make all the indexes "the same" just to clean up unused and overlapping ones in each database.

The table mode output certainly isn't ideal, both because the database name is missing, and you're stuck figuring out unused/overlapping/other stuff manually. I think that would also be the case in Mode 2, though perhaps easier to manage with the output.

So yeah, whatever mode between 0/4 or 2 would be fine. Whatever is easiest. I do not need missing index requests for my purposes, here. This is pure clean up work.

BrentOzar commented 1 year ago

We've needed to do this so many times! Time to roll up my sleeves and build something serious, something that'll also work across servers because that's a common issue too. This will be a new stored proc, pared down for this particular use case, and won't be part of the "core" set since it's pretty specialized.

Thinking out loud about the use cases:

Screenshot 2022-12-14 at 5 05 29 AM

(And in case anybody else is reading, SaaS could also refer to replication, and AGs could also refer to log shipping.)

Out of scope:

Schema drift (different columns on the table, FKs, constraints, etc) - if you want that, buy Redgate Source Control. Similarly, we're just gonna check all the databases on the server - if a table doesn't exist in one of the databases, we're going to assume it's not supposed to be, and just not include that database in the output list.
Automatically detecting which servers to connect to (like in an AG)
Setting up linked servers - I'm going to assume you pass in a list of server names, and we just try to connect to them with four-part names, and check all the databases on that server.
Error handling on connecting to servers - if a server is down, you have bigger problems
Operational stats - I'm just gonna standardize on usage stats for this one unless you feel strongly otherwise
Statistics histograms & update dates
Partitioning (I'm just going to sum up the index stats and not distinguish between which databases are partitioned and which aren't)
Index IDs - I'm going to go exclusively by index name (although I'm not going to assume they're on the same columns - I think if you have the same name, but different index contents, that's a problem you wanna know about)

Initial version:

Existing indexes only (not missing indexes)
One table only (not going to try to fetch all of the objects and give overall results)
Will handle those later with additional @Mode switches

Overall version (not called for one table) output would look like:

Schema name
Table name
Number of databases where this object was found
Number of indexes: min, avg, max - to tell you if you have variances. However, we wouldn't tell you which databases have more/less indexes. For that, you would wanna call the table-level-detail one.
Number of unused indexes, total - just to give you a rough idea of whether you need to do pruning.
Number of rows: min, avg, max
Size, GB: min, avg, max

With that in mind, when you call it for a single table, then you already know you have variances, and you probably wanna know: Result set 1: overall index inventory for this table

Index name - but keep in mind that we can have several indexes with the same name, but different keys/includes.
Index version ID - in case we have the same index name with different keys/includes
Number of databases where this object was found
Key columns
Includes
Drop script
Create script Sort that first result set by index name & version.

Result set 2: details

Index name
Index version ID
Server name
Database name (and sort output by these first 4, because you're probably looking one index at a time)
Index present, yes/no
Reads on this copy
Writes on this copy
Rows in this copy

For "Index present, yes/no" - there are two use cases:

If you look at result set 1 and almost everybody has the index, then you wanna glance down at result set 2 and ask, "Who's missing the index?" In that case, there should be a row in result set 2 with "NO" here so you can quickly identify who's missing the index. (But there should also be rows for the Yes's.)
On the other hand, if few databases have the index (say, less than 25% of the copies), then you don't wanna see a whole bunch of "NO" rows.

Stopping here in the thought process, but will come back to it. If you see any holes or wanna add anything, feel free. I'll probably think through this for a few days. I thought about posting a blog post asking folks for a design review, but everybody's just gonna ask for free candy.

erikdarlingdata commented 1 year ago

Oh boy! It's a baby sp_BlitzaaS! I have a call starting shortly, but I'll give this a full read and big think.

BrentOzarULTD / SQL-Server-First-Responder-Kit

Low priority long term plan: add new script to analyze index usage for SaaS clients #3183