dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.66k stars 3.15k forks source link

Query hints (raw SQL, such as OPTIONs) #6717

Open rowanmiller opened 7 years ago

rowanmiller commented 7 years ago

An example from https://github.com/aspnet/EntityFramework/issues/6649 is being able to append OPTION (MAXRECURSION 2000) to the end of a query.

roji commented 7 years ago

While the idea of integrating arbitrary provider-specific hints in queries seems useful, treating this as some raw SQL to be tacked at the end of the query may be too limited. A specific provider's SQL dialect may require the additional SQL to be integrated somewhere in the middle, etc...

Some arbitrary annotations on the query itself which would be picked up by the provider-specific query SQL generator may cover more cases.

rowanmiller commented 7 years ago

@roji yeah I wasn't intending to limit it to appending text to the end of the query - I removed the word "append" from the title 😄 .

roji commented 7 years ago

Ok great :)

ajcvickers commented 6 years ago

Note from triage: we need to investigate what commonality exists here across relational database systems. For example, where can hints be attached? Is there non-SQL Server value in a query-hint only (i.e. always at the end of the query) solution?

/cc @roji @caleblloyd @ErikEJ

roji commented 6 years ago

Here's the PostgreSQL documentation page on querying.

I'm not sure how many things it has which can be tacked on without impacting the shape of the results (in which case they'd be beyond the scope of this issue). Some ideas that come to mind:

Seems like ideally providers would be able to define an extension method on the IQueryable (like AsNoTracking()), which would be picked up by the SQL generator.

smitpatel commented 6 years ago

Design Notes:

Changes to relational level remain pre-requisite for provider level work but the issue remains in backlog for now. Community contributions appreciated.

smitpatel commented 6 years ago

cc: @ralmsdeveloper

roji commented 6 years ago

Another use case: specifying per-query collation. PostgreSQL allows the following on an expression:

SELECT a < b COLLATE "de_DE" FROM test1;

or on a clumn:

SELECT a COLLATE "de_DE" < b FROM test1;
roji commented 6 years ago

@smitpatel agree that the way forward here is by allowing annotations to be specified by the user in the query, and for those annotations to flow to the provider's query SQL generator. The question is indeed granularity and which elements of the query expression tree can be thus annotated.

ralmsdeveloper commented 6 years ago

Thoughts: What @roji spoke makes sense to me, I believe we could create a method for annotation on EFCore.Ralational to meet the following needs, following SQL Server as the basis, but the providers would be responsible for consolidating the end information .

We would have to have one more property in TableExpression (public virtual string Hint {get;set;}).

Remember that when using the annotation, this would be replicated in the "Include" tables: x.Include(y => y.Table)

Or have one more parameter in the extension method: "NoInclude"

public static IQueryable<TEntity> With<TEntity>(
    this IQueryable<TEntity> source,
    string hint,
    string optionalParameter = null);

Sample

var query = _db
    .Blogs 
    .With("NOLOCK", "INDEX(myIndex)") // Second parameter optional
    .ToList();

Output SQL

SELECT [p].[Id], [p].[Date], [p].[Name]
FROM [Blogs] AS [p] WITH (NOLOCK, INDEX(myIndex))  

--or 

SELECT [p].[Id], [p].[Date], [p].[Name]
FROM [Blogs] AS [p] WITH (NOLOCK)  

We could also use Enum to do this:

This would prevent user typing error.

PAGLOCK, NOLOCK, READCOMMITTEDLOCK, ROWLOCK, TABLOCK ou TABLOCKX.

Sample

var query = _db
    .Blogs 
    .With(Hints.NOLOCK)
    .ToList();

For the @roji example: https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-397403491

I believe we can do this in EF.Functions.

ralmsdeveloper commented 6 years ago

Or just:

public static IQueryable<TEntity> Hint<TEntity>(
    this IQueryable<TEntity> source,
    string hint,
    bool repeatForInclude = true);
var query = _db
    .Blogs 
    .Hint("WITH (NOLOCK)")
    .ToList();

This is left under the responsibility of the user.

In my thinking I believe that this also becomes more flexible, the user can theoretically use all available hints, since initially this would be designed only for queries!

smitpatel commented 6 years ago

@ralmsdeveloper - That is what we decided explicitly not to do it. Unless there is any common query hint across all providers.

Especially we cannot use any method with string parameter where string is going to be appended in SQL, that is just asking for SQL injection. And the enum may require different values based on provider. Hence in the current design, there will not be any method in relational provider. Providers have to write methods in whichever way they want. It could be single method with enum values or it could be individual methods for each hint.

Relational layer will just provide the storage facility in terms of query annotations. So provider methods can add query annotations, (whatever shape), and those annotations will be flowed to SQL generator by relational, where provider can look at annotations and determine what to write in SQL. Further it also avoids breaking TableExpression since there is no need to add anything like Hint in any expression. "Hints" will be available globally to SelectExpression while printing, providers can determine where each hint will be printed.

roji commented 6 years ago

In addition to what @smitpatel wrote above, a simple API to tack strings inside the query is also a pretty bad user experience - the user would need to have knowledge about specific strings and where they go inside the query. It's much better to have provider-defined extension methods which add structured annotations, which are then picked up by the query SQL generator and rendered in SQL. This is also in line with the current way of specifying provider-specific settings on the model, which get translated to SQL by the migrations generator.

@smitpatel if I understand correctly the initial plan is to only allow annotations on the SelectExpression? While it's true this should cover most basic cases, it's a good idea to at least consider what the more flexible/advanced version of this would look like. As I've written above there are viable cases where a column expression, or indeed even a comparison expression could be annotated (see the collation example). I'm not saying these should be supported by day one (in fact I've received no actual request for anything of the sort) but it's good to keep it in mind.

ralmsdeveloper commented 6 years ago

I get it

Especially we cannot use any method with string parameter where string is going to be appended in SQL, that is just asking for SQL injection. And the enum may require different values based on provider. Hence in the current design, there will not be any method in relational provider. Providers have to write methods in whichever way they want. It could be single method with enum values or it could be individual methods for each hint.

Like I said before, it's just thoughts. My view was just in annotation to table, so I talked about creating another property, why providers could override "VisitTable"

However, if we can retrieve anottations from SelectExpressionwould be enough, so the writers of the providers can simply do what they think best, to meet the more specific needs of each provider.

smitpatel commented 6 years ago

@roji - The current plan is to flow QueryAnnotations to SelectExpression. While it is obvious that query hints (SelectExpression level annotations) are covered by that, you can still use the same infra to have table hints (or other constructs like ordering null) since you have select expression available while printing out table or order by clause. So it is not restricted to just query hints, it could be any part of SQL. I believe it would be better way to transport the data instead of creating properties on individual expressions (like TableExpression/JoinExpression etc.). Perhaps the restriction is on API side, in terms of providers can add API on IQueryable only. Though the methods could take whatever parameters needed to locate the piece of data in SQL.

For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.

roji commented 6 years ago

@roji - The current plan is to flow QueryAnnotations to SelectExpression. While it is obvious that query hints (SelectExpression level annotations) are covered by that, you can still use the same infra to have table hints (or other constructs like ordering null) since you have select expression available while printing out table or order by clause. So it is not restricted to just query hints, it could be any part of SQL. I believe it would be better way to transport the data instead of creating properties on individual expressions (like TableExpression/JoinExpression etc.). Perhaps the restriction is on API side, in terms of providers can add API on IQueryable only. Though the methods could take whatever parameters needed to locate the piece of data in SQL.

I understand, but unless I'm mistaken that would make it impossible (or very awkward) to apply different hints to different parts of the query, right? For example, if I want want ORDER BY to sort nulls first, and another to sort nulls last, it would be very difficult to express that, won't it?

For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.

Just to be extra clear - this seems to be a pretty rare requirement and I'm fairly certain 95% of needs will be met by specifying an annotation "globally" at the query level.

For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.

EF.Functions.StoreType() does seem like a simple case that can probably be handled with a simple method call translator (although maybe consider a shorter name such as EF.StoreType()?). In the more general case I'm not sure exactly what you're asking with respects to granularity... collation can be specified on each single column, or on a comparison expression involving two columns.

smitpatel commented 6 years ago

If you have ideas for API for things which cannot be specified at "global" level, then please file a new issue with API ideas, we can explore how to flow information to SQL.

ralmsdeveloper commented 6 years ago

Because my English is bad, I prefer to write short texts.

@smitpatel then look at this here: https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql?view=sql-server-2017

Not everyone has this need, but to make a provider much more powerful and flexible, some of these HINTS should be supported at least: https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-2017#syntax

It's worth mentioning that this is a totally isolated SQL Server thing.

Actually now would be to know what kind of API would be global, since the cases are totally different for SQL Server and PostgreSQL.

Then we would have to have two methods:

I believe that the utility method for the PostgreSQL Provider would be only WithLock, which would translate something like:

SELECT * FROM BAR FOR UPDATE

When I thought about using ENUM I was thinking of supporting all this in one way:

  | FORCESCAN
  | FORCESEEK
  | HOLDLOCK
  | NOLOCK
  | NOWAIT
  | PAGLOCK
  | READCOMMITTED
  | READCOMMITTEDLOCK
  | READPAST
  | READUNCOMMITTED
  | REPEATABLEREAD
  | ROCKLOCK
  | SERIALIZABLE
  | SNAPSHOT
  | TABLOCK
  | TABLOCKX
  | UPDLOCK
  | XLOCK
smitpatel commented 6 years ago

@ralmsdeveloper - Yes, there can be methods like WithNoLock & WithLock but it would be in SqlServer provider. Same way Postgre can add more. Unless we identify hints which are "uniform" across multiple providers like SqlServer, PostgreSQL, SQLite, MySQL, Oracle etc, adding a relational surface may cause more burden on provider writers. (Also there would not be a good default for SQL which we need and use in DefaultQuerySqlGenerator.

ralmsdeveloper commented 6 years ago

@smitpatel & @roji, sorry for my poor communication. But I'm working hard to communicate better with you.

@smitpatel Anyway, what you said was very clear to me now, that would be the way. As a suggestion, there could be a parameter in these methods, as I mentioned above. repeatForInclude for the same annotation if replicate the remaining query tables (INNER JOIN / SUBQUERY)

bricelam commented 6 years ago

Looks like SQLite has just one hint: INDEXED BY

And NOT INDEXED ...for all those times you wish the query planner would just stop using all those pesky indexes. 😖⁉

bbakermmc commented 6 years ago

Would this also be supported under this, or is it another issue, or is there a current way to implement it?

WITH CHANGE_TRACKING_CONTEXT (context)

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/with-change-tracking-context-transact-sql?view=sql-server-2017

roji commented 6 years ago

Just to reiterate the importance of something like EF.StoreType(value, storeType), we've recently seen lots of issues where there's a need to explicitly specify store types when the same CLR type can be mapped to more than one store type (see bottom of issue for some examples).

I just realized that this is a bit more complicated than a simple method translator, since we need to do it for literals/constants as well. In other words, when I specify a literal in my query, EF Core looks up the mapping for that CLR type and uses that to render a literal SQL representation - but we need to be able to specify another type. This choice should affect the mapping that is selected, rather than add a server-side casting operator.

I'm not sure that this problem (both the parameter version and the literal version) is best tracked by this issue, which is more about flowing annotations to the provider's SQL generator. It's also more urgent than general-purpose query SQL hints. Maybe we should split it off and handle separately?

List of scenarios where this is needed:

ajcvickers commented 6 years ago

@roji Thanks for the input. We're tracking this feature as #4978. I have removed that issue from the backlog so we can discuss priority in triage.

roji commented 6 years ago

Oh sorry, I forgot that #4978 already existed. Thanks for giving this your attention.

erikj999 commented 6 years ago

Hi, I'm not sure from the discussion where the current thinking on QUERY/TABLE hints has landed and whether a solution is in mind. IMO, the biggest need for many developers is simply control of locking during queries (i.e. table hints). Just that one feature would be really welcome.

For EF 6, we created a With() extension -- just like Rafael described (except with an enum to avoid injection issues). In our implementation, the With() call has to come after an entity collection so we know how to translate the expression and correctly associate the injected WITH statement to the right FROM or JOIN. We also support accepting an index hint, but the feature never gets used (and it's a potential injection issue because the caller passes the index name as a string).

Anyway, the lack of any way to set a table hint for locking in EF Core is our main blocker to migrating our entire commercial ERP app to .Net Core. So I'm super interested in how this issue progresses. Thanks,

-Erik

divega commented 6 years ago

@erikj999 great to hear from you! I have a couple of questions:

  1. What if we gave you the ability to control locking at the query granularity level instead of at the table level? That is, you specify the locking hint once on the query and we take care of propagating it to all the intervening tables. Would that be something your application could use? In other words, how important is it to be able to specify something like NOLOCK only for some of the tables or completely different locking hints on different tables in the same query?
  2. What are the locking hints you use?
erikj999 commented 6 years ago

Hi @divega, great to hear from you, too. Here's what worries me about propagating a lock hint throughout a query. With READ COMMITTED SNAPSHOT mode, which we use and I think is the default for SQL Azure and given the query:

BEGIN TRANSACTION -- Otherwise nothing gets locked SELECT c.Company, c.CustNum, c.Name FROM Customer AS [c] WITH(UPDLOCK) JOIN Company AS [c1] ON (c.Company = c1.Company) WHERE c1.Country = N'USA'

SQL will hold update intent locks on the Customer table, just as the hint prescribes, but won't hold any locks on the joined Company table. So if a Linq statement includes a lock hint and EF Core propagates that lock hint across all tables in the query, that behavior will be inconsistent with SQL and would cause undesired locking.

IMO, if we only get one table hint to specify in an EF query, then it should only be applied to the FROM table only. It would be better if we could have a table hint attached to any entity in the query, where it would be applied correctly to the FROM or JOIN when the T-SQL is expressed, to make the lock behavior more explicit. Hope that makes sense.

The lock hints we use are:

NOLOCK (does nothing but make us feel good, considering we use Read-Committed Snapshot) HOLDLOCK READPAST UPDLOCK (We use this one a ton) FORCESEEK

One hint we do not use, but I see used in the wild is NOWAIT which is used in logic with retry constructs.

-Erik

jzabroski commented 6 years ago

lol @ size~1-week tag!!

I am happy to build a cross-reference of query hints across database engines and versions. Could take some time. Note sure I understand why QueryAnnotations should be used for this. I think if @smitpatel could demo NOLOCK hint with SQL Server, that would be helpful, but there are just so many additional user stories I would like to see capture:

erikj999 commented 6 years ago

It's going to take >1 week just to review this thread :).

I looked for a query hint matrix online thinking someone must have done this before, but couldn't find much, especially for lock control. And after looking across the database engines with providers in EF Core, I came away thinking it's not really possible to put a useful query hint feature into EF Core at the "Relational" level. Even NOLOCK isn't universally supported.

I think table/query hints should be implemented as provider-specific extensions e.g.:

      (from x in db.Things.With(LockHint.UpdLock) select new {x.this, x.that}).MaxDOP(4); 

...where the table hint With() and MaxDOP are extensions method in the SqlServer provider namespace. And use either with a non-SQLServer provider would throw a runtime error. I agree it's good to keep away from hints that require string arguments (like index names) because of the potential for injection attacks. Maybe index hints could be expressed in the model, but that might preclude provider-agnostic models.

BTW, I think the code that creates the T-SQL syntax for the SQL Server provider is in the Relational library -- not the SqlServer provider itself. I'm guessing by having it there, it serves as a default syntax generator for provider projects. But if query hints get promoted to the road map and can't be generalized usefully from a relational POV, that code would be likely forked, right?

jzabroski commented 6 years ago

I looked for a query hint matrix online thinking someone must have done this before, but couldn't find much, especially for lock control. And after looking across the database engines with providers in EF Core, I came away thinking it's not really possible to put a useful query hint feature into EF Core at the "Relational" level. Even NOLOCK isn't universally supported.

PostgreSQL

The PostgreSQL people seem to think very differently about query tuning than SQL Server engineers. The PostgreSQL people do not hint at the "relational level", but rather believe in tuning Planner Cost Constants. In effect, their approach is a lot like financial engineering and making a decision based on a "mark-to-market hurdle adjustment". They state a model factor, such as cpu_index_tuple_cost, and bump it up until it disables a decision tree in the Cost-Based Optimizer/Query Planner. For example, setting random_page_cost and the seq_page_cost both to 0 emulates a query where all the table data is already mmap'd (pinned to memory). In this sense, PostgreSQL simply thinks about locking differently than SQL Server:

  1. In the negative sense, all these _cost parameters are ultimately a function of page size. SQL Server DBA's use relation-level hints effectively to describe a formula: table_cost = table_pages page_cost where page_cost = Tuple(num_pages, read_write_skew, read_write_volatility) . This does produce interesting idioms where you can set random_page_cost and the seq_page_cost both to 0 when you are certain something should be hot in the cache, vs. not setting those if it's cold.
  2. In the positive sense, most places I've worked people just slap NOLOCK everywhere, so PostgreSQL is nicer in that it doesn't let engineers pollute the actual SQL with tall fescue shading the core business logic; most Microsoft engineers I've met don't know they can simply SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED for the same effect.

MySQL

Here is a link to MySQL's Optimization documentation, including specific links for 8.11.2 Table Locking Issues and 8.9 Controlling the Query Optimizer. MySQL seems fairly similar to SQL Server, especially 8.9.4 Index Hints:

tbl_name [[AS] alias] [index_hint_list]

index_hint_list:
    index_hint [index_hint] ...

index_hint:
    USE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | IGNORE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
  | FORCE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

index_list:
    index_name [, index_name] ...

Oracle

The latest documentation is available here: Oracle Release 18c: Influencing the Optimizer

Does anyone know how far back Oracle releases are supported right now?

Here is a table from Donald Burleson (famous Oracle guy), summarizing the various hints:

Oracle SQL Hints Tuning

Hints are expressed in special comments that start with +. /*+ <hint_expression> */ and --+ <hint_expression> are both valid ways to express a hint.

General hints:

Hint Meaning
ALL_ROWS Use the cost based approach for best throughput.
CHOOSE Default, if statistics are available will use cost, if not, rule.
FIRST_ROWS Use the cost based approach for best response time.
RULE Use rules based approach; this cancels any other hints specified for this statement.

Access Method Oracle Hints:

Hint Meaning
CLUSTER(table) This tells Oracle to do a cluster scan to access the table.
FULL(table) This tells the optimizer to do a full scan of the specified table.
HASH(table) Tells Oracle to explicitly choose the hash access method for the table.
HASH_AJ(table) Transforms a NOT IN subquery to a hash anti-join.
ROWID(table) Forces a rowid scan of the specified table.
INDEX(table [index]) Forces an index scan of the specified table using the specified index(s). If a list of indexes is specified, the optimizer chooses the one with the lowest cost. If no index is specified then the optimizer chooses the available index for the table with the lowest cost.
INDEX_ASC (table [index]) Same as INDEX only performs an ascending search of the index chosen, this is functionally identical to the INDEX statement.
INDEX_DESC(table [index]) Same as INDEX except performs a descending search. If more than one table is accessed, this is ignored.
INDEX_COMBINE(table index) Combines the bitmapped indexes on the table if the cost shows that to do so would give better performance.
INDEX_FFS(table index) Perform a fast full index scan rather than a table scan.
MERGE_AJ (table) Transforms a NOT IN subquery into a merge anti-join.
AND_EQUAL(table index index [index index index]) This hint causes a merge on several single column indexes. Two must be specified, five can be.
NL_AJ Transforms a NOT IN subquery into a NL anti-join (nested loop).
HASH_SJ(t1, t2) Inserted into the EXISTS subquery; This converts the subquery into a special type of hash join between t1 and t2 that preserves the semantics of the subquery. That is, even if there is more than one matching row in t2 for a row in t1, the row in t1 is returned only once.
MERGE_SJ (t1, t2) Inserted into the EXISTS subquery; This converts the subquery into a special type of merge join between t1 and t2 that preserves the semantics of the subquery. That is, even if there is more than one matching row in t2 for a row in t1, the row in t1 is returned only once.
NL_SJ Inserted into the EXISTS subquery; This converts the subquery into a special type of nested loop join between t1 and t2 that preserves the semantics of the subquery. That is, even if there is more than one matching row in t2 for a row in t1, the row in t1 is returned only once.

Oracle Hints for join orders and transformations:

Hint Meaning
ORDERED This hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it first may speed execution in a join.
STAR Forces the largest table to be joined last using a nested loops join on the index.
STAR_TRANSFORMATION Makes the optimizer use the best plan in which a start transformation is used.
FACT(table) When performing a star transformation use the specified table as a fact table.
NO_FACT(table) When performing a star transformation do not use the specified table as a fact table.
PUSH_SUBQ This causes nonmerged subqueries to be evaluated at the earliest possible point in the execution plan.
REWRITE(mview) If possible forces the query to use the specified materialized view, if no materialized view is specified, the system chooses what it calculates is the appropriate view.
NOREWRITE Turns off query rewrite for the statement, use it for when data returned must be concurrent and can't come from a materialized view.
USE_CONCAT Forces combined OR conditions and IN processing in the WHERE clause to be transformed into a compound query using the UNION ALL set operator.
NO_MERGE (table) This causes Oracle to join each specified table with another row source without a sort-merge join.
NO_EXPAND Prevents OR and IN processing expansion.
Oracle Hints for Join Operations:  
USE_HASH (table) This causes Oracle to join each specified table with another row source with a hash join.
USE_NL(table) This operation forces a nested loop using the specified table as the controlling table.
USE_MERGE(table,[table, - ]) This operation forces a sort-merge-join operation of the specified tables.
DRIVING_SITE The hint forces query execution to be done at a different site than that selected by Oracle. This hint can be used with either rule-based or cost-based optimization.
LEADING(table) The hint causes Oracle to use the specified table as the first table in the join order.
Oracle Hints for Parallel Operations:  
[NO]APPEND This specifies that data is to be or not to be appended to the end of a file rather than into existing free space. Use only with INSERT commands.
NOPARALLEL (table This specifies the operation is not to be done in parallel.
PARALLEL(table, instances) This specifies the operation is to be done in parallel.
PARALLEL_INDEX Allows parallelization of a fast full index scan on any index.

Other Oracle Hints

Hint Meaning
CACHE Specifies that the blocks retrieved for the table in the hint are placed at the most recently used end of the LRU list when the table is full table scanned.
NOCACHE Specifies that the blocks retrieved for the table in the hint are placed at the least recently used end of the LRU list when the table is full table scanned.
[NO]APPEND For insert operations will append (or not append) data at the HWM of table.
UNNEST Turns on the UNNEST_SUBQUERY option for statement if UNNEST_SUBQUERY parameter is set to FALSE.
NO_UNNEST Turns off the UNNEST_SUBQUERY option for statement if UNNEST_SUBQUERY parameter is set to TRUE.
PUSH_PRED Pushes the join predicate into the view.

Sybase Adaptive Enterprise 12.1

Adaptive Server Enterprise 12.5.1 > Performance and Tuning: Optimizer and Abstract Plans

CosmosDB

Performance tips for Azure Cosmos DB and .NET > SDK Usage > Tuning parallel queries for partitioned collections

DB2

This doesn't come from IBM documentation but from a Gold-level partner named Craig S. Mullins. I chose this over the docs because the docs are extremely full of IBM-speak and I don't speak IBM-speak. lol @ how complex/obtuse this is (ho-lee cow!): Influencing the DB2 Optimizer: Part 1 Influencing the DB2 Optimizer: Part 2 - Standard Methods Influencing the DB2 Optimizer: Part 3 - Tweaking SQL Statements lol @ this, too:

At times, you may need to disable a specific index from being considered by the optimizer. One method of achieving this is to append OR 0 = 1 to the predicate. For example, consider a query against the EMP table on which two indexes exist: one on EMPNO and one on WORKDEPT. Appending OR 0 = 1 (as shown next) to the WORKDEPT predicate will cause DB2 to avoid using an index on WORKDEPT.

SELECT  EMPNO, WORKDEPT, EDLEVEL, SALARY
FROM    EMP
WHERE   EMPNO BETWEEN ‘000020’ AND ‘000350’
AND     (WORKDEPT > ‘A01’ OR 0 = 1);

The OR 0 = 1 clause does not change the results of the query, but it can change the access path chosen.

There are other tweaks that can be used to dissuade the DB2 optimizer from choosing a specific index, such as multiplying or dividing by 1, adding or subtracting 0, or appending an empty string to a character column.

Influencing the DB2 Optimizer: Part 4 - OPTIMIZE FOR n ROWS Influencing the DB2 Optimizer: Part 5 - Changing DB2 Catalog Statistics Influencing the DB2 Optimizer: Part 6 - Using Optimization Hints

jzabroski commented 6 years ago

@erikj999 Given the above, how would you template a "query hint matrix"? You mentioned one category: lock control.

I have created a repo on GitHub to place such a matrix: https://github.com/jzabroski/SqlQueryHintMatrix Here is the list of known EF Providers: https://docs.microsoft.com/en-us/ef/core/providers/

erikj999 commented 6 years ago

@jzabroski Well, that DB2 trick to disable an index is probably the only query hint in the bunch you can today execute in EF :). Seriously, that's crazy.

I'm still skeptical a rich set of query hints can be defined at a general level. But I've been wrong before and I'll go through the links you've provided to get a better sense of it.

But what if we started with a simple enumeration for locks -- NoLock, ShareLock, UpdateLock -- which could be applied at an entity level (both FROM and JOIN) or at the query level?

For index hints, we could start with just one extension method UseIndex(object index) which is also applied to the entity. I think the index would need to be codified in the model somehow so we can reference it without having to pass its name as a string (to avoid injection attacks).

I think those two ideas could be safely defined in EntityFrameworkCore.Relational. Providers could ignore hints that don't apply to the specific engine (like NoLock on Oracle) or throw a runtime error if so configured.

roji commented 6 years ago

From a cursory look query hints seem sufficiently different across the different databases, that it doesn't necessarily make sense to include anything specific in relational. EF Core should definitely provide the infrastructure/plumbing to allow arbitrary hints to be defined and flowed to the provider, but I really see little value in defining a locking enum, which would be irrelevant for some (most?) databases, and not contain the right range (or terminology) of options for others...

There just seems to be too much diversity here, and there's enough infrastructure work needed in any case just to make hints possible (plus some discussions, such as somehow applying hints to individual components of the query rather than globally).

ralmsdeveloper commented 6 years ago

@roji Looking at this I found it very pertinent! https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion

roji commented 6 years ago

@ralmsdeveloper to be clear, I'm very much for query hints and think they're relevant to PostgreSQL, but I'm much less sure about a set of"standard" hints which live in EFCore.Relational and are supposedly valid for all/most databases...

jzabroski commented 6 years ago

@roji

I agree! If we push it into the Relational library, then it becomes impossible to detect at compile-time any hints that will break when changing database targets. It should be possible to statically link to hints, and guarantee that if you include a Hints library for your provider, you get relative safety that those hints are known to the provider (the provider may fail to generate valid SQL, but at least "blatantly invalid" won't happen.)

This also addresses the fact that Oracle puts hints in "hint comments", whereas Microsoft extends the T-SQL grammar. If you move from Oracle to Microsoft via raw SQL, then your code will "just work". However, if you move from Oracle to Microsoft via EFCore, then your code might not work.

One common ground here is to perhaps do the following:

jzabroski commented 6 years ago

@ralmsdeveloper This is off-topic, but basically, most databases existing in the modern world started off as research projects of a Stanford guy named Michael Stonebraker (look him up). Including PostgreSQL and Ingres - and Ingres co-developers later formed Sybase which eventually sold a co-license of the source code to Microsoft and eventually became Microsoft SQL Server 6.5 when Microsoft bought it out and then forked it.

Most of the concepts in relational databases and their individual philosophies stem from his "family tree" of projects and Ph.D. students. The reason some databases don't have lock hints is because lock hints don't really make a ton of sense to a Cost-Based Optimizer in a database configured for multi-version concurrency control (MVCC). MVCC by its very nature is intended to be a lock-free design, so hinting an UPDLOCK or NOLOCK is silly to a system that doesn't really "know" about locking. PostgreSQL in general does not have the contention issues Microsoft SQL Server has; SQL Server tends to "lock up" with a sneeze, but thankfully has a ton of tools for DBAs to reduce locking at the expense of maintainability. Yet, Microsoft SQL Server has a pretty amazing Transactional DDL that is a gift from the Gods, and the ability to control locking on a per-table level combined with Transactional DDL makes upgrading systems under high loads not so painful.

If you want to learn more and talk further, ping me offline - my first and last name at gmail dot com. I love this academic stuff and am a bit of a computer science historian.

erikj999 commented 6 years ago

From a cursory look query hints seem sufficiently different across the different databases, that it doesn't necessarily make sense to include anything specific in relational.

@roji That was my conclusion as well. I the enumeration idea was a thought exercise (and I almost didn't include ShareLock). Obviously, pushing all this to the provider level means using a hint of any sort causes the query to be statically bound to a specific provider. I wasn't sure how palatable that would be to the EF Core community.

smitpatel commented 6 years ago

I agree! If we push it into the Relational library, then it becomes impossible to detect at compile-time any hints that will break when changing database targets.

I don't think that is an issue and we need add all those behavior (Fail/Log/None etc.). At present, if you specify raw SQL in FromSql and then change database provider then we don't do anything different. If the SQL does not work in the second database, it fails. While it is possible to use different database providers with EF Core, we don't provide fully database agnostic model. (FromSql for example). Further, hints may have option to ignore/fail but FromSql got no choice like that, and has to fail. If we already have one blocking behavior then there is no need to work differently for other API. In the essence, if you are using "provider" specific things in your application, then you cannot change provider and expect everything to work.

smitpatel commented 6 years ago

It's going to take >1 week just to review this thread :).

True. :trollface:

jzabroski commented 6 years ago

@smitpatel I don't really agree with your strawman argument. That's like saying because the CLR allows unsafe IO then we shouldn't allow safe IO. We have safety when we can have it - and "un-safety" when we can't reasonably expect it.

In the essence, if you are using "provider" specific things in your application, then you cannot change provider and expect everything to work.

I agree here. What I'm suggesting is that if Providers want (and believe Provider writers should be encouraged to write it the way @erikj999 suggested) then they can write it as extension methods and therefore assist people when they want to write portable code.

Look, if you don't do this I'll just have a class with a bunch of static readonly strings that surface such an API. And that's kind of ugly, because then I have to chase down people who wrote a one off string and drill it in people's heads to use the static readonly strings.

There's a broad continuum of ok to great solutions here.

ralmsdeveloper commented 6 years ago

As already quoted by @erikj999 I have also been analyzing and researching something universal for the core "EFCore.Relational", I came to the conclusion that this is practically impossible, hints are very peculiar to each database.

So I believe that each provider must write its own extensions to meet the specific needs of users, in SQL Server I use NOLOCK and UPDLOCK a lot.

I've been coding something for my provider (this may not be the desired end result).

@smitpatel do you have any suggestions on what I did? (Please do not consider my tests 🙈, I'm working to leave 100% aligned with EF Core) https://github.com/ralmsdeveloper/EntityFrameworkCore.FirebirdSQL/commit/db46134f8e947f09736abd84e2a4b74ad2ac1249

https://github.com/ralmsdeveloper/EntityFrameworkCore.FirebirdSQL/blob/db46134f8e947f09736abd84e2a4b74ad2ac1249/EFCore.FirebirdSql/Query/Sql/Internal/FbQuerySqlGenerator.cs#L81-L92

@roji do you have any regrets too.

@roji you also have more thoughts on this?

smitpatel commented 6 years ago

When operations are not working for most cases, then there is no point of providing safety for subset of them. It is pit of failure and confuses users more than helping out. Some magic is fine, but a lot of magic and you are left with userbase who cannot figure out what works and what does not.

jzabroski commented 6 years ago

@smitpatel I think you need specific examples. You're just generalizing.

I gave a specific use case I like, and even explained how I might code around it if there is no static typing:

public static class QueryHints
{
  public static string WithNoLock
  {
    get { return "WITH (NOLOCK)"; }
  }
}

Now, see how @ralmsdeveloper implemented it with a method.

I guess your opposition would be some syntax like this:

var foo = (from p in dbContext.Set<Person>().AddAnnotation(QueryHints.WithNoLock)).ToList()

is somehow superior to this:

var foo = (from p in dbContext.Set<Person>().ToFbHint().WithNoLock()).ToList()

and that the "somehow" rationale is that the core API is in Relational.

Note: It is pretty unfair to call this "magic". Let's not romanticize a purely objective measure like type safety and Open-Closed Principle.

On subjective measures, rather than assume what users find confusing, please explain how it could possibly be confusing to have the following user story:

I want to switch from Firebird to MSSQL in a large code base and have Visual Studio underline in red ink the breaking API changes for hints.

erikj999 commented 6 years ago

@jzabroski

I want to switch from Firebird to MSSQL in a large code base and have Visual Studio underline in red ink the breaking API changes for hints.

This is the downside to having table/query hints implemented as extension methods via a provider-specific namespace. Attaching a query hint effectively kills the ability for that query to be portable to other providers. Worse, the only way Visual Studio would know what code to red-underline is if the developer replaces "using ...Firebird.QueryHints" with "using ...MSSQL.QueryHints" or some such.

I still think this is the best approach for EF Core, though. Developers shouldn't be required to use "FromSql" for something so essential and ubiquitous as basic lock management. Having said that, other ORMs haven't solved this problem either.

I'm curious what opinion @rowanmiller, @divega, or anyone else has on this. Given there is no way to abstract query hints from database engines, the only way forward is through provider-specific extension methods. But using such an extension means in a query means execution is limited to a specific provider. Is that tenable for EF Core, which in other respects is provider agnostic?

I guess another way forward is with an extensibility scheme like the EF 6 wrapping provider which at least lets developers build in their own query extensions. But it isn't easy.

roji commented 6 years ago

@erikj999 just to mention that this is the pattern already used for database-specific customization in the model/migratons: you use a provider-specific extensions method on on the different model builders (e.g. ForSqlServerIsMemoryOptimized()).

@jzabroski yes, if you switch providers you get red squigglies, but switching databases is a relatively rare and huge change to your program. Think about differences in type mapping, in client/server evaluation changes (i.e. what used to be server-evaluated is now suddenly client-evaluated), and all the other subtle difference between databases. Honestly, a red squigglie (which you can easily remove) is going to be the least of your concerns...

jzabroski commented 6 years ago

@roji I thought about all that, and I think I still want as much help as possible. Thinking about differences in type mapping, client/server evaluation, etc. are outside the scope of this issue. User stories are not intended to solve every single problem a developer encounters. User stories can take on many forms, but one of the most beneficial is the User-Action-Benefit triple. The "As a, I want, so that" user story template is pretty common, because it helps people identify with strangers. Identifying with strangers is one of the things great API designers like Joshua Bloch say is the hardest part about designing APIs. I won't drag on about this, though. If you have any further comments let's go offline or use gitter.im to discuss.

roji commented 6 years ago

@jzabroski it's an interesting and healthy discussion, I don't really see a reason to take it offline...

My point comparing to other areas where provider-specific instructions are implemented as strongly-typed extension methods, was that a product-level decision has already been made: if you want something specific to SQL Server, you use a strongly-typed method from the SQL Server provider assembly. This doesn't mean we can't be flexible, but if we start introducing provider-specific query hints via weakly-typed strings (or some other mechanism) we provide a very inconsistent API experience to users. IMHO one has to look at the larger context of how the product already looks like.

Your argument against this way of doing things is that migrating between databases creates red squigglies. In my opinion that is actually a benefit. It seems quite dangerous for query hints to silently stop having any effect, which is what would happen if they're weakly-typed: your program may indeed depend on behavior set up by some weakly-typed hint, that doesn't exist or is spelled differently in your new provider. It seems like a much better user story to have the compiler guide them through the places where they're using provider-specific code (e.g. hints), encouraging them to think about what the migration means in each context.

Also, since your argument concerns changing databases, it seems very relevant to see other issues that happen as part of that process (type mapping, client/server evaluation...). Since the red squigglies here are very likely to be a small hurdle compared to the others, and since it's easy to deal with - even if tedious - simply because it's a compile-time error (rather than a subtle runtime error), it just doesn't doesn't seem to be worth losing the benefits of strongly-typed hints.

Finally, I'm not sure in which direction your user story description is supposed to guide us :) I spend a lot of my time trying to listen and think about which APIs users want (but also which APIs are good for them despite what they say they want). My opinion is just different from yours...

jzabroski commented 6 years ago

@roji uh... re-read the thread. I am pro red squiggles. I described my user story of wanting red squiggles. Smit seemed to be against it.

But, good to know you're on my side.

(At this point, I'm scratching my head wondering if I had a severe typo or poor explanation on the order of "A panda, eats, shoots, and leaves.")

On Wed, Aug 22, 2018, 5:19 PM Shay Rojansky notifications@github.com wrote:

@jzabroski https://github.com/jzabroski it's an interesting and healthy discussion, I don't really see a reason to take it offline...

My point comparing to other areas where provider-specific instructions are implemented as strongly-typed extension methods, was that a product-level decision has already been made: if you want something specific to SQL Server, you use a strongly-typed method from the SQL Server provider assembly. This doesn't mean we can't be flexible, but if we start introducing provider-specific query hints via weakly-typed strings (or some other mechanism) we provide a very inconsistent API experience to users. IMHO one has to look at the larger context of how the product already looks like.

Your argument against this way of doing things is that migrating between databases creates red squigglies. In my opinion that is actually a benefit. It seems quite dangerous for query hints to silently stop having any effect, which is what would happen if they're weakly-typed: your program may indeed depend on behavior set up by some weakly-typed hint, that doesn't exist or is spelled differently in your new provider. It seems like a much better user story to have the compiler guide them through the places where they're using provider-specific code (e.g. hints), encouraging them to think about what the migration means in each context.

Also, since your argument concerns changing databases, it seems very relevant to see other issues that happen as part of that process (type mapping, client/server evaluation...). Since the red squigglies here are very likely to be a small hurdle compared to the others, and since it's easy to deal with - even if tedious - simply because it's a compile-time error (rather than a subtle runtime error), it just doesn't doesn't seem to be worth losing the benefits of strongly-typed hints.

Finally, I'm not sure in which direction your user story description is supposed to guide us :) I spend a lot of my time trying to listen and think about which APIs users want (but also which APIs are good for them despite what they say they want). My opinion is just different from yours...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-415183751, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbT_QXDBZxZgCtA5YysDlGcLctACGYrks5uTcrFgaJpZM4KRSfR .