amachanic / sp_whoisactive

sp_whoisactive
GNU General Public License v3.0
1.17k stars 289 forks source link

Error related with availability group #45

Closed MaxShoshin closed 1 year ago

MaxShoshin commented 3 years ago

We've got following error when execute sp_WhoIsActive:

Executed as user: *******\sqlservice. Warning: Null value is eliminated by an aggregate or other SET operation. 
[SQLSTATE 01003] (Message 8153)  The target database, '*****', is participating in an availability group and is 
currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled 
for read access. To allow read-only access to this and other databases in the availability group, enable read access 
to one or more secondary availability replicas in the group.  For more information, see the ALTER AVAILABILITY 
GROUP statement in SQL Server Books Online. [SQLSTATE 42000] (Error 976, Level 14, State 1, Line 38)  
Warning: The join order has been enforced because a local join hint is used. [SQLSTATE 01000] (Error 8625).  
The step failed.

Version: Who Is Active? v11.35 (2020-10-04)

Additional context: This error occurs when we try to additionally use Telegraf plugin for monitoring SQL Server.
We have Always On Availability Groups. This error occurs on healthy cluster.

My idea that Telegraf exec some queries on not available db and sp_WhoIsActive tries to get additional information about this query and fails.

Could you assist us to solve this issue? Any ideas?

Thank you!

erikdarlingdata commented 3 years ago

This is a similar issue to here. I can add in a similar fix.

MaxShoshin commented 3 years ago

@erikdarlingdata, It will be great!

erikdarlingdata commented 3 years ago

Reviewing things a little bit, it looks like the two places this might happen are in the locks_cursor and the blocks_cursor, since those seem to be the only two queries that try to explicitly access a database.

@MaxShoshin can you confirm this is where you hit the error? Or provide the command that runs when you hit the error?

MaxShoshin commented 2 years ago

We still get error (on your branch):

Message 976, Level 14, State 1, Line 331
The target database, '********', is participating in an availability group and is currently
not accessible for queries. Either data movement is suspended or the availability replica 
is not enabled for read access. To allow read-only access to this and other databases in 
the availability group, enable read access to one or more secondary availability replicas 
in the group.  For more information, see the ALTER AVAILABILITY GROUP statement in 
SQL Server Books Online.
erikdarlingdata commented 2 years ago

@MaxShoshin okay, you're gonna have to give me more information than that. I don't have an AG set up to test things on, so I was going by the most obvious potential spots where the error might be raised.

Can you do some troubleshooting locally to try to figure out where it's happening? I'm not even sure what command you're running to generate the error.

Thanks!

MaxShoshin commented 2 years ago

@erikdarlingdata, we call simple sp_WhoIsActive When Telegraf is working we got errors. It seems that telegraf call some queries from this file: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/sqlserver/sqlqueriesV2.go

erikdarlingdata commented 2 years ago

@MaxShoshin right, but I don't have an AG or use Telegraf for anything, so this is where you're gonna have to help me out. I have no idea where else in the query to try the change to help you out. I can't repro your environment to work on this issue, sorry.

MaxShoshin commented 2 years ago

I've tried to reproduce this with docker-compose... unsuccessfully...

May be we can try to run different version of sp_whoisactive with additional diagnostics? We are ready to provide additional information, but we didn't know what information do you need or how we can collect it (like original line number where error occurs).

Thank you!

erikdarlingdata commented 2 years ago

Sure, here's what you'll wanna do:

EXEC sp_WhoIsActive
    @show_own_spid = 1,
    @get_full_inner_text = 1;

Run that, and click on the sql_text column, that'll give you the whole query that WhoIsActive runs. Paste that into a new SSMS window, and get rid of the XML artifacts like <?query -- and --?>.

After that, you'll have to declare a few variables to make things run correctly:

DECLARE 
    @i bigint = 922337203685477580,
    @recursion smallint = 1,
    @blocker bit = 0;

You'll also wanna delete some of the header information that comes along:

(@recursion SMALLINT, @filter sysname, @not_filter sysname, @first_collection_ms_ticks BIGINT)DECLARE @blocker BIT;
            SET @blocker = 0;
            DECLARE @i INT;
            SET @i = 2147483647;

After that, you can hit F5 and it'll run. Lemme know if that gives you a better idea of where the error is coming from.

Thanks!

amachanic commented 2 years ago

@MaxShoshin In addition to the above, for the impacted databases, can you please tell us the value of the secondary_role_allow_connections_desc column in the sys.availability_replicas view?

MaxShoshin commented 2 years ago

@amachanic ,

secondary_role_allow_connections_desc = NO

@erikdarlingdata ,

See sql_text for sql text.

We got following error: Message 976, level 14, state 1, line 38 The target database, '**', is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.

erikdarlingdata commented 2 years ago

@MaxShoshin can you let me know which version of SQL Server this is happening on? There are behavioral changes post-2016.

Also, what state is the database in? If it's reverting, initializing, still being recovered, etc. it might also need to be handled differently.

MaxShoshin commented 2 years ago

We have several versions of SQL servers: 14.0.3238.1 14.0.3381.3 14.0.3411.3

erikdarlingdata commented 1 year ago

@MaxShoshin haven't heard back from you in a while, so I'm going to close this one out.