Allow query tags to be forced inside generated SQL to prevent SQL Server from stripping them from "query text entries"

jzabroski commented 5 years ago

https://github.com/aspnet/EntityFramework.Docs/blob/master/entity-framework/core/querying/tags.md

Based on helpful discussion with @ajcvickers

This might be somewhat controversial, but I think many people including @m60freeman and @nickcraver want very similar features: Easy way to scrap boilerplate, reduce dependency on copy-paste errors in log messages, etc.

I think the Query Tags documentation would benefit from a broader example that demonstrates how to use a self-contained data access layer (through a Repository pattern). The exact size of the sample code could start off small as people give their feedback, but the basic goal should be to give DBAs and engineers a language to discuss how to tag queries with pseudo-exact code locations.

My code sample would look something like that following:

class PersonRepository : IRepository<Person> {
  DbContext _dbContext;
  PersonRepository(DbContext dbContext) {
    _dbContext = dbContext;
  }
  private static string GetTypeAndMethodName([CallerMemberName] string callerName = null)
  {
    return $"{typeof(PersonRepository).Name}.{callerName}";
  }
  IQueryableResult<Person> All() { // assume IQueryableResult is an abstraction which blocks adding method chains to an IQueryable<T>
    return DbSet<Person>().TagWith($"{GetTypeAndMethodName()}"); // this will tag the query with "PersonRepository.All"
  }
}

jzabroski commented 5 years ago

Separately, existing code samples would benefit from nameof() operator. I believe .NET Core uses only versions of C# that contain the nameof() operator, so it should be strictly safe to use the nameof() operator in all code samples.

e.g.,

IQueryable<Friend> GetNearestFriends(Point myLocation) =>
    from f in context.Friends.TagWith("GetNearestFriends")
    orderby f.Location.Distance(myLocation) descending
    select f;

IQueryable<T> Limit<T>(IQueryable<T> source, int limit) =>
    source.TagWith("Limit").Take(limit);

becomes:

IQueryable<Friend> GetNearestFriends(Point myLocation) =>
    from f in context.Friends.TagWith(nameof(GetNearestFriends))
    orderby f.Location.Distance(myLocation) descending
    select f;

IQueryable<T> Limit<T>(IQueryable<T> source, int limit) =>
    source.TagWith(nameof(Limit)).Take(limit);

simeyla commented 4 years ago

I was excited about TagWith until I found https://github.com/MicrosoftDocs/sql-docs/issues/2789

Really really wanted to be able to search in SQL Server's Query Store to find my tagged queries, but the comments are stripped off :-(

Would love a way someday to achieve this.

jzabroski commented 4 years ago

@simeyla You should be able to get at it indirectly, using sys.sql_modules, no?

simeyla commented 4 years ago

@jzabroski I haven't looked at that table before, but it seems to just contain views and stored procs - not SQL that was dynamically created via EF Core. The problem is I can find the query in the query store but the comment gets stripped out - so it's not possible to find by Tag - that's just the way query store seems to work.

The best solution is to name a destination variable something unique that i can search for - since EF Core preserves my property names in the SQL.

eg. db.Orders.Select(o => new { OrderID_123 = o.orderID }) and then search for OrderID_123

jzabroski commented 4 years ago

@simeyla Alternative solution would be to install either SQL Sentry or SolarWinds Database Performance Analyzer. These tools spy sys.dm_exec_requests using efficient agentless polling. You can then use the Top SQL report in these tools to find sql statements containing your tag.

Agree, Query Store does not support comments. I was trying to find an easy way for you to go from sys.dm_exec_requests to query store, but if it is possible, it's likely time consuming to figure out how.

jzabroski commented 4 years ago

Here is a good blog post discussing what I was thinking: http://blog.sqlgrease.com/query_hash-query_plan_hash-useful/ - you need to go from a sql_handle to a query_hash in order to cross-walk to query store from a query plan

NickCraver commented 4 years ago

The SQL Server issue is why we put the comments after the first line, in /* Comment */ format. It won't get stripped that way and you can search it.

Apologies I didn't see that sooner, would have given that feedback to the .TagWith() initial design. Here's what we do for Stack Overflow if it helps: https://gist.github.com/NickCraver/c6f371bf8df37e05c4f09fd3c02ef6a2

IMO, a [CallerMemberName] and friends overload is hugely helpful since it tags things by default (our approach). If it's something a dev has to remember to do, it's a default pit of failure scenario. It'd be nice if this was a thing for everyone in EF Core 5.0 perhaps, and if the global option was off it'd throw away those parameters. But the paramters would be there on the next breaking set of changes (as it changes the method signature).

Anyway, I hope some of that helps!

ajcvickers commented 4 years ago

Thanks @NickCraver. We will discuss.

jzabroski commented 4 years ago

IMO, a [CallerMemberName] and friends overload is hugely helpful since it tags things by default (our approach).

That's what my original code sample suggests. However, I dislike the library (EFCore or Dapper) baking it in directly, as you lose a degree of freedom by having an extension method versus a class instance. The way I like to code things is to prevent leaking my data access layer into my business logic, which involves a bit of gymnastics such as wrapping and hiding IQueryable<T> behind either a Scalar and VectorResultSet. - exposing IQueryable<T> communicates to the caller that its safe to call things that may not be safe.

It does make a fair degree of sense to expose a generic way to wrap up [CallerXXX] into a struct - I'm not sure it needs to be private. There's not a lot of benefit to everyone writing approximately the same data marshaling code.

ajcvickers commented 4 years ago

Team notes:

Presents unique query texts executed against the database. Comments and spaces before and after the query text are ignored. Comments and spaces inside text aren't ignored. Every statement in the batch generates a separate query text entry.

So it can't be at the beginning or the end; it has to be in the middle. 🥇

ajcvickers commented 4 years ago

@roji Make sure you add this to your list. ;-)

roji commented 4 years ago

Not sure whether to put the smiley or confused emoji in these cases :/

Duly added!

ajcvickers commented 4 years ago

Team notes:

Discuss with SqlClient (/cc @cheenamalhotra @David-Engel) in case there is anything that can be done at the TDS level.
We discussed adding some no-op statement before the comment, but looking at the docs I don't think this would work since, "Every statement in the batch generates a separate query text entry".
We're not going to start parsing SQL to try to put the comment somewhere else. This would be both fragile and result in ugly query text.

I think we will conclude that this is just an issue with SQL Server. To be able to tag queries through to the server, SQL Server needs to either:

Provide a first-class mechanism for doing this
Or make it possible to store these comments

jzabroski commented 4 years ago

Sounds about right. Do note, as I mentioned, that customers can workaround this by using tools like SQL Sentry and SolarWinds DPA. It's just that there's no way to go from a query hash id to a query store id. So, while nothing is automated, you can do mental gymnastics and figure it out, and is a 1000x better than before this feature was released.

For those using DPA and reading this: You can use the Top SQLs Report and search this way: https://documentation.solarwinds.com/en/Success_Center/dpa/Content/DPA-search-for-sql-statement.htm

I have not used SQL Sentry in awhile since its more expensive.

NickCraver commented 4 years ago

I'm not sure I agree much parsing needs to be done - as shared in my gist above, we just stick it at the end of the first line. I agree it'd be nice if SQL supported comments natively but that wouldn't roll everywhere for many, many years even if they added it in the next release. It's a shame if this wouldn't make it into EF, because that's our main complaint day-to-day right now: finding where queries come from.

Put another way: if this was pluggable before the query was issued, I'd do almost exactly what we're doing in that gist and tag it, by inserting a comment in the same spot. If the framework is unwilling to do this, please: give us a way to do it ourselves. Apologies if that exists and we couldn't find it...we did look in the 2.2-era, if that's been added please give me a pointer - we'd love to start using it today.

smitpatel commented 4 years ago

Would interceptor help here? Look at the command text before executing, if initial lines are comment then take them and put after first line?

ajcvickers commented 4 years ago

@NickCraver What do you do if there is only one line? Surely putting it at the end will still result in it being stripped off.

ajcvickers commented 4 years ago

@smitpatel That would be a reasonable workaround for now. Pretty easy to code up.

NickCraver commented 4 years ago

Uhhhhhh, that's a great question. We have so few queries that are one line I'm not 100% sure we'd have noticed it not working there - let me go database spelunking!

ajcvickers commented 4 years ago

@NickCraver I've also been wondering if we ever generate query text with literally one line. @smitpatel ?

NickCraver commented 4 years ago

Alrighty evidently this does work (and is in the gist) - we try to put it at the end of the first line. If we can't do that, we stick it after the first space in the query (making it accommodate SELECT, DELETE, etc.), since SQL always starts with a keyword.

Example query: Example output:

smitpatel commented 4 years ago

At this point, the only one line SQL we generate is sprocs. Anything else will have 2 lines at least (SELECT & FROM)

ErikEJ commented 4 years ago

I think comments are stored in the query plan cache, just not in query store.

roji commented 4 years ago

@NickCraver

I'm not sure I agree much parsing needs to be done - as shared in my gist above, we just stick it at the end of the first line.

What about stuff like multiline literal strings, e.g.:

SELECT '
yo';

If you naively inject a comment at the end of the first line, doesn't it go into the literal string? Granted, this is somewhat contrived but still...

Just always injecting after the first space seems OK, although I'm not familiar with everything's that's possible in T-SQL...

NickCraver commented 4 years ago

So...it's slightly more nuanced in practice, IMO. They are stored, yes, but are they usable is a different downstream question. For example, I just ran these 4 queries:

Select Top 1 'Inline Comment' From Posts;

-- Leader Comment
Select Top 1 'Leader Comment' From Posts;

/* Slash Comment */
Select Top 1 'Slash Comment' From Posts;

Select Top 1 /* Inline Comment */
'Inline Comment' From Posts;

And then query the plan cache:

Select dm_exec_sql_text.text,
       dm_exec_query_stats.query_hash,
       dm_exec_query_stats.query_plan_hash,
       dm_exec_query_plan.query_plan
  From sys.dm_exec_query_stats 
       Cross Apply sys.dm_exec_sql_text(dm_exec_query_stats.plan_handle)
       Cross Apply sys.dm_exec_query_plan(dm_exec_query_stats.plan_handle)
 Where dm_exec_sql_text.text Like '%Comment%' 
   And dm_exec_sql_text.text Not Like '%databases%' -- Exclude this query itself

...we get this:	Text	Query Hash
Select Top 1 'NoComment' From Posts;	0x52049ECEB690DFF7	0xFA42A0367F5B21AA
/ Slash Comment / Select Top 1 'Slash Comment' From Posts;	0x52049ECEB690DFF7	0xFA42A0367F5B21AA
Select Top 1 / Inline Comment / 'Inline Comment' From Posts;	0x52049ECEB690DFF7	0xFA42A0367F5B21AA
-- Leader Comment Select Top 1 'Leader Comment' From Posts;	0x52049ECEB690DFF7	0xFA42A0367F5B21AA

Normally when accessing impact on a system, you're going to see which query is doing it, which means grouping by the query or plan hash. Comments do not factor into that portion, they are stripped before hashing. So these are indistinguishable from each other in the common roll-up scenario (we do this to monitor our SQL Servers for hot queries as an example).

I don't think it impacts this discussion directly - any comment on any of them is more likely to be helpful than not, but it is important to realize that in a roll-up scenario, we're likely seeing one flavor of the comment, even if the query came from multiple locations in code. That's just a reality I wanted people to be aware of since grouping by text is untenable at scale (though likely okay for most people...if slow).

NickCraver commented 4 years ago

@roji That's a good point, I can't say we've ever had that case. I can't think of an example where we'd want to do such a thing, but the point stands. Inserting after the first space should always be safe, yes. SQL must start with a keyword or sproc name in the "default to exec" case, AFAIK.

roji commented 4 years ago

Sounds right. At some point I'll take a look at the PostgreSQL behavior here.

jzabroski commented 4 years ago

What about stuff like multiline literal strings, e.g.:
SELECT '
yo';

Can't you just not print columns on the first line and have SELECT stand by itself on a line? Similarly for UPDATE and DELETE. This seems like a pretty simple global visitor pattern that is O(1) due to stopping at the first token in the tree.

jzabroski commented 4 years ago

At this point, the only one line SQL we generate is sprocs. Anything else will have 2 lines at least (SELECT & FROM)

Note that in the query store (and query plan cache), individual statements with sprocs are probably what matter, not the sproc itself, I conclude. By contrast, EF queries are one statement with an annotation, and thus TagWith makes more sense. Plus, as a DBA who also does C#, I am fine with just knowing which sproc needs tuning and what the cost-benefit is on my time for tuning that. It's pretty rare when a sproc is used in a cross-cutting concern, although I'm open to other DBAs giving their opinion.

BTW, @ajcvickers , if you can champion something to the SQL Server team, it would be this: I don't want the comments stripped, I want the AST nodes that matter cached (as opposed to text). This lets me do minor fixups like format code without disabling things like plan guides. It's been an annoyance for me since xml plan guides were shipped that it is so hard to lock in a plan guide due to the tiniest whitespace screwing things up. This would arguably solve both @simeyla 's problem and my long standing annoyance.

Effectively, my vision has always been that plan guides would allow me to write automated sink-hoist LLVM style optimizations by turning on or deleting plan guides based on higher-level autonomous instrumentation. That was one of the original motivations behind erecruit.Expr library.

roji commented 4 years ago

Can't you just not print columns on the first line and have SELECT stand by itself on a line? Similarly for UPDATE and DELETE. This seems like a pretty simple global visitor pattern that is O(1) due to stopping at the first token in the tree.

The point here is that we ideally want to avoid changing the structure of the SQL we generate (which in general affects other providers, not just SQL Server) because of a quirk in SQL Server. With something like Dapper the situation can be even more serious as users provide the SQL.

jzabroski commented 4 years ago

Certainly that's one point, and a good one. Recall I've previously documented how other providers accept query hints as part of an attempt to help Diego et al design a flexible "query annotation api", and so I know that Oracle has "special comments as hints": /*+ <hint_expression> */ and --+ <hint_expression>

However, I just wondered out loud if my thought had been considered and might be net faster and safer to implement than a long discussion which eats up time from other features.

The big cons I see to my "hack" is that: 1) No guarantee Oracle will never add a "Select hint" in the future 2) Even shifting the query text slightly will double the number of query plans for each logical query issued due to the fact SQL Server cares about text, not Query AST nodes. As a DBA, bloated query plan caches are one of my pet peeves. - But, may only affect SQL Server, not other products.

The big pros I see to my "hack" is that: 1) Change made in single place, possibly requiring code changes only to Core API (I'm not aware of Internals enough to know if this is true) 2) Solves immediate problem quickly, allowing developers to focus on other areas of the product

ajcvickers commented 4 years ago

We discussed this again. For most uses of query tags and for most providers the current behavior is fine. For example, the tags are in logs (which was the most asked for reason to have tags) and we verified that they still show up in the SQL Server profiler. We're also conscious that generating readable queries ("like you would write them") is a goal for EF Core, although it is less important generally than doing something that works.

We think the balance here is to leave the current SQL formatting the same, but to add an opt-in that moves the tags after the first SELECT or at the end of the first line, etc. as discussed above and shown in the gist.

robfe commented 4 years ago

I'm very keen to get tags (I'm using caller member name and caller file name) showing up in the query store (and Azure's SQL performance recommendations), really glad i found this thread...

I've found a pretty good way to get leading / trailing comments into the query store: Wrap the whole statement in parens:

(
    /*Yes!*/
    SELECT TOP 1 'Does it work with starslash in parens' from sys.tables
)

(
    --Yep!
    SELECT TOP 1'Does it work with starslash in parens' from sys.tables
)

SELECT * FROM sys.query_store_query_text where query_sql_text like '%Does it%' AND query_sql_text not like '%query_store_query_text%'

Yields

query_text_id   query_sql_text  statement_sql_handle    is_part_of_encrypted_module has_restricted_text
29  (   /*Yes!*/   SELECT TOP 1 'Does it work with starslash in parens' from sys.tables  )  0x09006C16510497B0E0663D7A79B9EFA7A7120000000000000000000000000000000000000000000000000000  0   0
30  (   --Yep!   SELECT TOP 1'Does it work with starslash in parens' from sys.tables  ) 0x0900A4F4734F6B052044F300946D4217C06F0000000000000000000000000000000000000000000000000000  0   0

Can anyone else think of a way this will negatively impact query parsing / performance / etc? Aside from the potential that changing the comment will probably break any forced plans

ajcvickers commented 4 years ago

@robfe This should be possible now using an IDbCommandInterceptor to wrap the generated SQL.

jzabroski commented 4 years ago

@ajcvickers @robfe Great teamwork!

jzabroski commented 4 years ago

Can anyone else think of a way this will negatively impact query parsing / performance / etc? Aside from the potential that changing the comment will probably break any forced plans

Some people use the nuget package from Microsoft to parse the abstract syntax tree for t-sql. In the Microsoft world, that is:

Microsoft.SqlServer.TransactSql.ScriptDom nuget package - optimized for syntax tree analysis for third party tools like Microsoft DACPAC - only supports .NET Framework 4 and may require DacFx as a hidden dependency
Microsoft.SqlServer.Management.SqlParser.dll - used by the language service, so it's optimized around those types of tasks (e.g., IntelliSense suggestions, QuickInfo tooltips, diagnostic errors).

In other words, some people try mucking with the SQL syntax tree using basic string manipulation. It would probably be better to manipulate the full AST and motion the AST back to a plaintext string when done.

This is the only scenario I can think of that could break - that your underlying parser wouldn't expect a top-level enclosing parentheses. Other possible breaks might be somehow this disabling various peephole optimizations, but I doubt it. You can use SET SHOWPLAN XML to check for any differences in the underlying plan.

RobHudson72 commented 3 years ago

I am keenly interested in this issue... Has there been any update?

If I get the gist of the conversation, this requires a change by the SQL Server team to support tags in QueryStore. Does anyone know if this is on the roadmap?

ajcvickers commented 3 years ago

@RobHudson72 This issue is tracking:

We think the balance here is to leave the current SQL formatting the same, but to add an opt-in that moves the tags after the first SELECT or at the end of the first line, etc. as discussed above and shown in the gist.

jzabroski commented 3 years ago

I am keenly interested in this issue... Has there been any update?

If I get the gist of the conversation, this requires a change by the SQL Server team to support tags in QueryStore. Does anyone know if this is on the roadmap?

There is a workaround described, where you can implement an IDbCommandInterceptor and wrap your whole SQL Statement in parentheses. Then it should work, and the expense of possibly affecting other things (e.g., if you use an external tool like SolarWinds DPA, all your query hashes it stores will change, and any work your DBA has put in to give friendly names to those hashes will be lost).

RobHudson72 commented 3 years ago

@jzabroski , thanks! Noted, and working on an internal POC... quite excited about how this might work!

forik commented 3 years ago

Wrapping an sql statement containing e.g. ORDER clause in the end of a query won't work though.

jzabroski commented 3 years ago

@forik Why can't we have nice things? I confirmed you're right. The following does work, though (verified on SQL Server v14.0.3048.4):

(
    SELECT * FROM
    (
    /*Yes!*/
    SELECT TOP 1 'Does it work with starslash in parens' AS col1 from sys.tables t ORDER BY t.[type]) AS wrapper
)
;
(
    SELECT * FROM
    (
    --Yep!
    SELECT TOP 1'Does it work with starslash in parens' AS col1 from sys.tables t ORDER BY t.[type]) AS wrapper
)

SELECT * FROM sys.query_store_query_text where query_sql_text like '%wrapper%' AND query_sql_text not like '%query_store_query_text%'

yields:

query_text_id   query_sql_text                                                                                                                                          statement_sql_handle                                                                            is_part_of_encrypted_module  has_restricted_text
251122425   (   SELECT * FROM   (   /*Yes!*/   SELECT TOP 1 'Does it work with starslash in parens' AS col1 from sys.tables t ORDER BY t.[type]) AS wrapper  )  0x0900A08003BAD6C36DDBC5D41E9650CD0D8C0000000000000000000000000000000000000000000000000000  0                            0
251122426   (   SELECT * FROM   (   --Yep!   SELECT TOP 1'Does it work with starslash in parens' AS col1 from sys.tables t ORDER BY t.[type]) AS wrapper  )     0x0900F151299CBBC41F88B218A669758437C20000000000000000000000000000000000000000000000000000  0                        0

forik commented 3 years ago

@jzabroski yeah I also tried that one with wrapping with SELECT * FROM (...) but it feels a little bit too much, doesn't it?

jzabroski commented 3 years ago

When you ask a computer scientist if an indirect way of solving a problem is a little bit too much, just know that our benchmark is the 7 layer OSI model and the POSIX file system interface :D We don't really care much about indirection as long as we can strongly reason about the results of our designs.

But yeah, it would be nice if the SQL Server team just fixed this nagging issue.

m60freeman commented 2 years ago

@jzabroski, @ajcvickers @robfe Can anyone provide the code needed to implement the suggested workaround involving IDbCommandInterceptor that puts a comment in /* class.method */ form after the first space or wraps the statement in parenthesis with such a comment just inside the open of the outer parenthesis? (I would strongly prefer the former so Query Store will capture it.)

Sorry, I'm a SQL DBA, not a C# developer. :-) I need something I can hand off to an application development team and say "Please include this in our project".

@ajcvickers Is a true built-in feature still being considered (just not for the current release) or will the feature not be implemented at all because there is a workaround?

ajcvickers commented 2 years ago

@m60freeman There's still a possibility, but it's low on the priority list. Make sure to vote (👍) for this issue if it's important to you.

m60freeman commented 2 years ago

@ajcvickers Can you please provide a link to the specific issue?

Keeping track of where Microsoft is storing such things over time is something I gave up on a while back. :-)

ajcvickers commented 2 years ago

@m60freeman Sorry, I though you were asking whether we would ever implement putting the tags inside SQL in EF Core. That is tracked by this issue. If you're asking about where SQL Server tracks things they might do, then that's as opaque to me as it is to you.

m60freeman commented 2 years ago

@ajcvickers Sorry to be a bit slow, but I don't see where I can vote for this Issue either on this page or in the list of Issues.

My question about whether this was still being considered was because you removed the "consider-for-current-release" label in October.

ajcvickers commented 2 years ago

@m60freeman You can vote for this issue by adding a 👍 on the issue description. We can then sort by number of votes. This issue currently has only one vote, while the top voted issue has 402 votes, so this issue is pretty far back in the list.

dotnet / efcore

Allow query tags to be forced inside generated SQL to prevent SQL Server from stripping them from "query text entries" #20105