Open rowanmiller opened 7 years ago
While the idea of integrating arbitrary provider-specific hints in queries seems useful, treating this as some raw SQL to be tacked at the end of the query may be too limited. A specific provider's SQL dialect may require the additional SQL to be integrated somewhere in the middle, etc...
Some arbitrary annotations on the query itself which would be picked up by the provider-specific query SQL generator may cover more cases.
@roji yeah I wasn't intending to limit it to appending text to the end of the query - I removed the word "append" from the title 😄 .
Ok great :)
Note from triage: we need to investigate what commonality exists here across relational database systems. For example, where can hints be attached? Is there non-SQL Server value in a query-hint only (i.e. always at the end of the query) solution?
/cc @roji @caleblloyd @ErikEJ
Here's the PostgreSQL documentation page on querying.
I'm not sure how many things it has which can be tacked on without impacting the shape of the results (in which case they'd be beyond the scope of this issue). Some ideas that come to mind:
ORDER BY ... NULLS { FIRST | LAST }
, impacting the sort order.FOR UPDATE
). Users would then be able to do stuff like issue updates in raw SQL, keeping the rows locked until the end of the transaction. It's nice to be able to express the query (i.e. set of rows to lock) with EF Core LINQ.Seems like ideally providers would be able to define an extension method on the IQueryable (like AsNoTracking()
), which would be picked up by the SQL generator.
Design Notes:
Since query/table hints are a lot provider specific, there is no good relational level API (yet). Enums for options will have issue with provider specific code in relational. String version cannot be used as is since it would be append to SQL query. So it is left upto each provider to add individual methods (or 1 common method) for users to specify hints. Details of implementation is upto the provider writers. In turn relational level will provide infrastructure for providers. One easy idea is to use QueryAnnotations (how query tagging works at present), to flow information from method to SelectExpression. It could be generalized (and combined with Tagging implementation), for select expression to have access to query annotations. Provider would give query annotations from their methods, which will reach select expression through relational layer. And while printing out SQL providers could write appropriate SQL based on the annotations. This would allow providers to implement more than just table or query level hints. (e.g. Null ordering)
Query level hints are once per query so it can be applied to whole query. Table level hints are slightly more complicated since user may want to apply hint to one table but not the other. The API may need that granularity based on that. Further API needs to consider what would happen for the tables which are indirectly brought in through auto-include of owned entities or nav expansion. Also if API takes the type to fetch table then owned types could cause question which owned type it would apply to. General initial approach would be all or nothing. Using API to say No Lock
on tables will apply to all tables in the query, however they are brought into the SQL. Based on customer feedback more granular API can be determined.
Changes to relational level remain pre-requisite for provider level work but the issue remains in backlog for now. Community contributions appreciated.
cc: @ralmsdeveloper
Another use case: specifying per-query collation. PostgreSQL allows the following on an expression:
SELECT a < b COLLATE "de_DE" FROM test1;
or on a clumn:
SELECT a COLLATE "de_DE" < b FROM test1;
@smitpatel agree that the way forward here is by allowing annotations to be specified by the user in the query, and for those annotations to flow to the provider's query SQL generator. The question is indeed granularity and which elements of the query expression tree can be thus annotated.
Thoughts: What @roji spoke makes sense to me, I believe we could create a method for annotation on EFCore.Ralational to meet the following needs, following SQL Server as the basis, but the providers would be responsible for consolidating the end information .
We would have to have one more property in TableExpression (public virtual string Hint {get;set;}).
Remember that when using the annotation, this would be replicated in the "Include" tables: x.Include(y => y.Table)
Or have one more parameter in the extension method: "NoInclude"
public static IQueryable<TEntity> With<TEntity>(
this IQueryable<TEntity> source,
string hint,
string optionalParameter = null);
var query = _db
.Blogs
.With("NOLOCK", "INDEX(myIndex)") // Second parameter optional
.ToList();
SELECT [p].[Id], [p].[Date], [p].[Name]
FROM [Blogs] AS [p] WITH (NOLOCK, INDEX(myIndex))
--or
SELECT [p].[Id], [p].[Date], [p].[Name]
FROM [Blogs] AS [p] WITH (NOLOCK)
This would prevent user typing error.
PAGLOCK, NOLOCK, READCOMMITTEDLOCK, ROWLOCK, TABLOCK ou TABLOCKX.
var query = _db
.Blogs
.With(Hints.NOLOCK)
.ToList();
For the @roji example: https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-397403491
I believe we can do this in EF.Functions.
Or just:
public static IQueryable<TEntity> Hint<TEntity>(
this IQueryable<TEntity> source,
string hint,
bool repeatForInclude = true);
var query = _db
.Blogs
.Hint("WITH (NOLOCK)")
.ToList();
This is left under the responsibility of the user.
In my thinking I believe that this also becomes more flexible, the user can theoretically use all available hints, since initially this would be designed only for queries!
@ralmsdeveloper - That is what we decided explicitly not to do it. Unless there is any common query hint across all providers.
Especially we cannot use any method with string parameter where string is going to be appended in SQL, that is just asking for SQL injection. And the enum may require different values based on provider. Hence in the current design, there will not be any method in relational provider. Providers have to write methods in whichever way they want. It could be single method with enum values or it could be individual methods for each hint.
Relational layer will just provide the storage facility in terms of query annotations. So provider methods can add query annotations, (whatever shape), and those annotations will be flowed to SQL generator by relational, where provider can look at annotations and determine what to write in SQL. Further it also avoids breaking TableExpression since there is no need to add anything like Hint
in any expression. "Hints" will be available globally to SelectExpression while printing, providers can determine where each hint will be printed.
In addition to what @smitpatel wrote above, a simple API to tack strings inside the query is also a pretty bad user experience - the user would need to have knowledge about specific strings and where they go inside the query. It's much better to have provider-defined extension methods which add structured annotations, which are then picked up by the query SQL generator and rendered in SQL. This is also in line with the current way of specifying provider-specific settings on the model, which get translated to SQL by the migrations generator.
@smitpatel if I understand correctly the initial plan is to only allow annotations on the SelectExpression? While it's true this should cover most basic cases, it's a good idea to at least consider what the more flexible/advanced version of this would look like. As I've written above there are viable cases where a column expression, or indeed even a comparison expression could be annotated (see the collation example). I'm not saying these should be supported by day one (in fact I've received no actual request for anything of the sort) but it's good to keep it in mind.
I get it
Especially we cannot use any method with string parameter where string is going to be appended in SQL, that is just asking for SQL injection. And the enum may require different values based on provider. Hence in the current design, there will not be any method in relational provider. Providers have to write methods in whichever way they want. It could be single method with enum values or it could be individual methods for each hint.
Like I said before, it's just thoughts.
My view was just in annotation to table, so I talked about creating another property, why providers could override "VisitTable
"
However, if we can retrieve anottations from SelectExpression
would be enough, so the writers of the providers can simply do what they think best, to meet the more specific needs of each provider.
@roji - The current plan is to flow QueryAnnotations to SelectExpression. While it is obvious that query hints (SelectExpression level annotations) are covered by that, you can still use the same infra to have table hints (or other constructs like ordering null) since you have select expression available while printing out table or order by clause. So it is not restricted to just query hints, it could be any part of SQL. I believe it would be better way to transport the data instead of creating properties on individual expressions (like TableExpression/JoinExpression etc.). Perhaps the restriction is on API side, in terms of providers can add API on IQueryable only. Though the methods could take whatever parameters needed to locate the piece of data in SQL.
For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType
kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.
@roji - The current plan is to flow QueryAnnotations to SelectExpression. While it is obvious that query hints (SelectExpression level annotations) are covered by that, you can still use the same infra to have table hints (or other constructs like ordering null) since you have select expression available while printing out table or order by clause. So it is not restricted to just query hints, it could be any part of SQL. I believe it would be better way to transport the data instead of creating properties on individual expressions (like TableExpression/JoinExpression etc.). Perhaps the restriction is on API side, in terms of providers can add API on IQueryable only. Though the methods could take whatever parameters needed to locate the piece of data in SQL.
I understand, but unless I'm mistaken that would make it impossible (or very awkward) to apply different hints to different parts of the query, right? For example, if I want want ORDER BY
to sort nulls first, and another to sort nulls last, it would be very difficult to express that, won't it?
For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.
Just to be extra clear - this seems to be a pretty rare requirement and I'm fairly certain 95% of needs will be met by specifying an annotation "globally" at the query level.
For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.
EF.Functions.StoreType()
does seem like a simple case that can probably be handled with a simple method call translator (although maybe consider a shorter name such as EF.StoreType()
?). In the more general case I'm not sure exactly what you're asking with respects to granularity... collation can be specified on each single column, or on a comparison expression involving two columns.
It indeed difficult to design an API for such cases. Table hints is also one such case. What if the user wants to apply No Lock
on 1 table and not on the other. Due to difficulty in arriving at design of it, we differed to feedback from users. The No Lock
at query level could/should/would append No Lock
to each table appearing in the query. Order By falls under same umbrella. Especially if user wants to specify nulls first/last differently on different order bys then perhaps a better options would be overload of OrderBy
(just throwing out some ideas)`. Of course how to flow that information is still unknown but certainly cannot be query level annotations.
As you said, 95% cases should be covered by "global" level. We can look into the API and data flow, once 5% comes up with feature request to add "granular" control.
By "granularity", what I mean is, if user needs to pin point the part of query it needs to applied (like particular column or comparison) then query level "hints" (methods) are not suitable for that API. Hence out of the scope of this issue.
If you have ideas for API for things which cannot be specified at "global" level, then please file a new issue with API ideas, we can explore how to flow information to SQL.
Because my English is bad, I prefer to write short texts.
@smitpatel then look at this here: https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql?view=sql-server-2017
Not everyone has this need, but to make a provider much more powerful and flexible, some of these HINTS should be supported at least: https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-2017#syntax
It's worth mentioning that this is a totally isolated SQL Server thing.
Actually now would be to know what kind of API would be global, since the cases are totally different for SQL Server and PostgreSQL.
Then we would have to have two methods:
I believe that the utility method for the PostgreSQL Provider would be only WithLock, which would translate something like:
SELECT * FROM BAR FOR UPDATE
When I thought about using ENUM I was thinking of supporting all this in one way:
| FORCESCAN
| FORCESEEK
| HOLDLOCK
| NOLOCK
| NOWAIT
| PAGLOCK
| READCOMMITTED
| READCOMMITTEDLOCK
| READPAST
| READUNCOMMITTED
| REPEATABLEREAD
| ROCKLOCK
| SERIALIZABLE
| SNAPSHOT
| TABLOCK
| TABLOCKX
| UPDLOCK
| XLOCK
@ralmsdeveloper - Yes, there can be methods like WithNoLock
& WithLock
but it would be in SqlServer provider. Same way Postgre can add more. Unless we identify hints which are "uniform" across multiple providers like SqlServer, PostgreSQL, SQLite, MySQL, Oracle etc, adding a relational surface may cause more burden on provider writers. (Also there would not be a good default for SQL which we need and use in DefaultQuerySqlGenerator.
@smitpatel & @roji, sorry for my poor communication. But I'm working hard to communicate better with you.
@smitpatel Anyway, what you said was very clear to me now, that would be the way.
As a suggestion, there could be a parameter in these methods, as I mentioned above. repeatForInclude
for the same annotation if replicate the remaining query tables (INNER JOIN / SUBQUERY)
Looks like SQLite has just one hint: INDEXED BY
And NOT INDEXED
...for all those times you wish the query planner would just stop using all those pesky indexes. 😖⁉
Would this also be supported under this, or is it another issue, or is there a current way to implement it?
WITH CHANGE_TRACKING_CONTEXT (context)
Just to reiterate the importance of something like EF.StoreType(value, storeType)
, we've recently seen lots of issues where there's a need to explicitly specify store types when the same CLR type can be mapped to more than one store type (see bottom of issue for some examples).
I just realized that this is a bit more complicated than a simple method translator, since we need to do it for literals/constants as well. In other words, when I specify a literal in my query, EF Core looks up the mapping for that CLR type and uses that to render a literal SQL representation - but we need to be able to specify another type. This choice should affect the mapping that is selected, rather than add a server-side casting operator.
I'm not sure that this problem (both the parameter version and the literal version) is best tracked by this issue, which is more about flowing annotations to the provider's SQL generator. It's also more urgent than general-purpose query SQL hints. Maybe we should split it off and handle separately?
List of scenarios where this is needed:
macaddr
type and a newer 6- or 8-byte macaddr8
type, both mapped to CLR PhysicalAddress
. PhysicalAddress
is mapped to the older macaddr
by default (backwards compat), and working with the newer one has issues.DateTime
is mapped to PostgreSQL timestamp
but also to date
(since .NET has no date-only type). This again causes some severe issues when trying to work with the non-default mapping (i.e. date
). See https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/493#issuecomment-401086741 (fortunately NodaTime can be used to work around this).@roji Thanks for the input. We're tracking this feature as #4978. I have removed that issue from the backlog so we can discuss priority in triage.
Oh sorry, I forgot that #4978 already existed. Thanks for giving this your attention.
Hi, I'm not sure from the discussion where the current thinking on QUERY/TABLE hints has landed and whether a solution is in mind. IMO, the biggest need for many developers is simply control of locking during queries (i.e. table hints). Just that one feature would be really welcome.
For EF 6, we created a With() extension -- just like Rafael described (except with an enum to avoid injection issues). In our implementation, the With() call has to come after an entity collection so we know how to translate the expression and correctly associate the injected WITH statement to the right FROM or JOIN. We also support accepting an index hint, but the feature never gets used (and it's a potential injection issue because the caller passes the index name as a string).
Anyway, the lack of any way to set a table hint for locking in EF Core is our main blocker to migrating our entire commercial ERP app to .Net Core. So I'm super interested in how this issue progresses. Thanks,
-Erik
@erikj999 great to hear from you! I have a couple of questions:
Hi @divega, great to hear from you, too. Here's what worries me about propagating a lock hint throughout a query. With READ COMMITTED SNAPSHOT mode, which we use and I think is the default for SQL Azure and given the query:
BEGIN TRANSACTION -- Otherwise nothing gets locked SELECT c.Company, c.CustNum, c.Name FROM Customer AS [c] WITH(UPDLOCK) JOIN Company AS [c1] ON (c.Company = c1.Company) WHERE c1.Country = N'USA'
SQL will hold update intent locks on the Customer table, just as the hint prescribes, but won't hold any locks on the joined Company table. So if a Linq statement includes a lock hint and EF Core propagates that lock hint across all tables in the query, that behavior will be inconsistent with SQL and would cause undesired locking.
IMO, if we only get one table hint to specify in an EF query, then it should only be applied to the FROM table only. It would be better if we could have a table hint attached to any entity in the query, where it would be applied correctly to the FROM or JOIN when the T-SQL is expressed, to make the lock behavior more explicit. Hope that makes sense.
The lock hints we use are:
NOLOCK (does nothing but make us feel good, considering we use Read-Committed Snapshot) HOLDLOCK READPAST UPDLOCK (We use this one a ton) FORCESEEK
One hint we do not use, but I see used in the wild is NOWAIT which is used in logic with retry constructs.
-Erik
lol @ size~1-week tag!!
I am happy to build a cross-reference of query hints across database engines and versions. Could take some time. Note sure I understand why QueryAnnotations should be used for this. I think if @smitpatel could demo NOLOCK hint with SQL Server, that would be helpful, but there are just so many additional user stories I would like to see capture:
UseCoveringIndex<Entity>()
with the fields I wanted covered. I could then use reflection in an integration test with the generated database to ensure there exists a covering index for those columns.It's going to take >1 week just to review this thread :).
I looked for a query hint matrix online thinking someone must have done this before, but couldn't find much, especially for lock control. And after looking across the database engines with providers in EF Core, I came away thinking it's not really possible to put a useful query hint feature into EF Core at the "Relational" level. Even NOLOCK isn't universally supported.
I think table/query hints should be implemented as provider-specific extensions e.g.:
(from x in db.Things.With(LockHint.UpdLock) select new {x.this, x.that}).MaxDOP(4);
...where the table hint With() and MaxDOP are extensions method in the SqlServer provider namespace. And use either with a non-SQLServer provider would throw a runtime error. I agree it's good to keep away from hints that require string arguments (like index names) because of the potential for injection attacks. Maybe index hints could be expressed in the model, but that might preclude provider-agnostic models.
BTW, I think the code that creates the T-SQL syntax for the SQL Server provider is in the Relational library -- not the SqlServer provider itself. I'm guessing by having it there, it serves as a default syntax generator for provider projects. But if query hints get promoted to the road map and can't be generalized usefully from a relational POV, that code would be likely forked, right?
I looked for a query hint matrix online thinking someone must have done this before, but couldn't find much, especially for lock control. And after looking across the database engines with providers in EF Core, I came away thinking it's not really possible to put a useful query hint feature into EF Core at the "Relational" level. Even NOLOCK isn't universally supported.
The PostgreSQL people seem to think very differently about query tuning than SQL Server engineers. The PostgreSQL people do not hint at the "relational level", but rather believe in tuning Planner Cost Constants. In effect, their approach is a lot like financial engineering and making a decision based on a "mark-to-market hurdle adjustment". They state a model factor, such as cpu_index_tuple_cost, and bump it up until it disables a decision tree in the Cost-Based Optimizer/Query Planner. For example, setting random_page_cost and the seq_page_cost both to 0 emulates a query where all the table data is already mmap'd (pinned to memory). In this sense, PostgreSQL simply thinks about locking differently than SQL Server:
Here is a link to MySQL's Optimization documentation, including specific links for 8.11.2 Table Locking Issues and 8.9 Controlling the Query Optimizer. MySQL seems fairly similar to SQL Server, especially 8.9.4 Index Hints:
tbl_name [[AS] alias] [index_hint_list] index_hint_list: index_hint [index_hint] ... index_hint: USE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list]) | IGNORE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list) | FORCE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list) index_list: index_name [, index_name] ...
The latest documentation is available here: Oracle Release 18c: Influencing the Optimizer
Does anyone know how far back Oracle releases are supported right now?
Here is a table from Donald Burleson (famous Oracle guy), summarizing the various hints:
Hints are expressed in special comments that start with +.
/*+ <hint_expression> */
and
--+ <hint_expression>
are both valid ways to express a hint.
Hint | Meaning |
---|---|
ALL_ROWS | Use the cost based approach for best throughput. |
CHOOSE | Default, if statistics are available will use cost, if not, rule. |
FIRST_ROWS | Use the cost based approach for best response time. |
RULE | Use rules based approach; this cancels any other hints specified for this statement. |
Hint | Meaning |
---|---|
CLUSTER(table) | This tells Oracle to do a cluster scan to access the table. |
FULL(table) | This tells the optimizer to do a full scan of the specified table. |
HASH(table) | Tells Oracle to explicitly choose the hash access method for the table. |
HASH_AJ(table) | Transforms a NOT IN subquery to a hash anti-join. |
ROWID(table) | Forces a rowid scan of the specified table. |
INDEX(table [index]) | Forces an index scan of the specified table using the specified index(s). If a list of indexes is specified, the optimizer chooses the one with the lowest cost. If no index is specified then the optimizer chooses the available index for the table with the lowest cost. |
INDEX_ASC (table [index]) | Same as INDEX only performs an ascending search of the index chosen, this is functionally identical to the INDEX statement. |
INDEX_DESC(table [index]) | Same as INDEX except performs a descending search. If more than one table is accessed, this is ignored. |
INDEX_COMBINE(table index) | Combines the bitmapped indexes on the table if the cost shows that to do so would give better performance. |
INDEX_FFS(table index) | Perform a fast full index scan rather than a table scan. |
MERGE_AJ (table) | Transforms a NOT IN subquery into a merge anti-join. |
AND_EQUAL(table index index [index index index]) | This hint causes a merge on several single column indexes. Two must be specified, five can be. |
NL_AJ | Transforms a NOT IN subquery into a NL anti-join (nested loop). |
HASH_SJ(t1, t2) | Inserted into the EXISTS subquery; This converts the subquery into a special type of hash join between t1 and t2 that preserves the semantics of the subquery. That is, even if there is more than one matching row in t2 for a row in t1, the row in t1 is returned only once. |
MERGE_SJ (t1, t2) | Inserted into the EXISTS subquery; This converts the subquery into a special type of merge join between t1 and t2 that preserves the semantics of the subquery. That is, even if there is more than one matching row in t2 for a row in t1, the row in t1 is returned only once. |
NL_SJ | Inserted into the EXISTS subquery; This converts the subquery into a special type of nested loop join between t1 and t2 that preserves the semantics of the subquery. That is, even if there is more than one matching row in t2 for a row in t1, the row in t1 is returned only once. |
Hint | Meaning |
---|---|
ORDERED | This hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it first may speed execution in a join. |
STAR | Forces the largest table to be joined last using a nested loops join on the index. |
STAR_TRANSFORMATION | Makes the optimizer use the best plan in which a start transformation is used. |
FACT(table) | When performing a star transformation use the specified table as a fact table. |
NO_FACT(table) | When performing a star transformation do not use the specified table as a fact table. |
PUSH_SUBQ | This causes nonmerged subqueries to be evaluated at the earliest possible point in the execution plan. |
REWRITE(mview) | If possible forces the query to use the specified materialized view, if no materialized view is specified, the system chooses what it calculates is the appropriate view. |
NOREWRITE | Turns off query rewrite for the statement, use it for when data returned must be concurrent and can't come from a materialized view. |
USE_CONCAT | Forces combined OR conditions and IN processing in the WHERE clause to be transformed into a compound query using the UNION ALL set operator. |
NO_MERGE (table) | This causes Oracle to join each specified table with another row source without a sort-merge join. |
NO_EXPAND | Prevents OR and IN processing expansion. |
Oracle Hints for Join Operations: | |
USE_HASH (table) | This causes Oracle to join each specified table with another row source with a hash join. |
USE_NL(table) | This operation forces a nested loop using the specified table as the controlling table. |
USE_MERGE(table,[table, - ]) | This operation forces a sort-merge-join operation of the specified tables. |
DRIVING_SITE | The hint forces query execution to be done at a different site than that selected by Oracle. This hint can be used with either rule-based or cost-based optimization. |
LEADING(table) | The hint causes Oracle to use the specified table as the first table in the join order. |
Oracle Hints for Parallel Operations: | |
[NO]APPEND | This specifies that data is to be or not to be appended to the end of a file rather than into existing free space. Use only with INSERT commands. |
NOPARALLEL (table | This specifies the operation is not to be done in parallel. |
PARALLEL(table, instances) | This specifies the operation is to be done in parallel. |
PARALLEL_INDEX | Allows parallelization of a fast full index scan on any index. |
Hint | Meaning |
---|---|
CACHE | Specifies that the blocks retrieved for the table in the hint are placed at the most recently used end of the LRU list when the table is full table scanned. |
NOCACHE | Specifies that the blocks retrieved for the table in the hint are placed at the least recently used end of the LRU list when the table is full table scanned. |
[NO]APPEND | For insert operations will append (or not append) data at the HWM of table. |
UNNEST | Turns on the UNNEST_SUBQUERY option for statement if UNNEST_SUBQUERY parameter is set to FALSE. |
NO_UNNEST | Turns off the UNNEST_SUBQUERY option for statement if UNNEST_SUBQUERY parameter is set to TRUE. |
PUSH_PRED | Pushes the join predicate into the view. |
Adaptive Server Enterprise 12.5.1 > Performance and Tuning: Optimizer and Abstract Plans
This doesn't come from IBM documentation but from a Gold-level partner named Craig S. Mullins. I chose this over the docs because the docs are extremely full of IBM-speak and I don't speak IBM-speak. lol @ how complex/obtuse this is (ho-lee cow!): Influencing the DB2 Optimizer: Part 1 Influencing the DB2 Optimizer: Part 2 - Standard Methods Influencing the DB2 Optimizer: Part 3 - Tweaking SQL Statements lol @ this, too:
At times, you may need to disable a specific index from being considered by the optimizer. One method of achieving this is to append OR 0 = 1 to the predicate. For example, consider a query against the EMP table on which two indexes exist: one on EMPNO and one on WORKDEPT. Appending OR 0 = 1 (as shown next) to the WORKDEPT predicate will cause DB2 to avoid using an index on WORKDEPT.
SELECT EMPNO, WORKDEPT, EDLEVEL, SALARY FROM EMP WHERE EMPNO BETWEEN ‘000020’ AND ‘000350’ AND (WORKDEPT > ‘A01’ OR 0 = 1);
The
OR 0 = 1
clause does not change the results of the query, but it can change the access path chosen.There are other tweaks that can be used to dissuade the DB2 optimizer from choosing a specific index, such as multiplying or dividing by 1, adding or subtracting 0, or appending an empty string to a character column.
Influencing the DB2 Optimizer: Part 4 - OPTIMIZE FOR n ROWS Influencing the DB2 Optimizer: Part 5 - Changing DB2 Catalog Statistics Influencing the DB2 Optimizer: Part 6 - Using Optimization Hints
@erikj999 Given the above, how would you template a "query hint matrix"? You mentioned one category: lock control.
I have created a repo on GitHub to place such a matrix: https://github.com/jzabroski/SqlQueryHintMatrix Here is the list of known EF Providers: https://docs.microsoft.com/en-us/ef/core/providers/
@jzabroski Well, that DB2 trick to disable an index is probably the only query hint in the bunch you can today execute in EF :). Seriously, that's crazy.
I'm still skeptical a rich set of query hints can be defined at a general level. But I've been wrong before and I'll go through the links you've provided to get a better sense of it.
But what if we started with a simple enumeration for locks -- NoLock, ShareLock, UpdateLock -- which could be applied at an entity level (both FROM and JOIN) or at the query level?
For index hints, we could start with just one extension method UseIndex(object index) which is also applied to the entity. I think the index would need to be codified in the model somehow so we can reference it without having to pass its name as a string (to avoid injection attacks).
I think those two ideas could be safely defined in EntityFrameworkCore.Relational. Providers could ignore hints that don't apply to the specific engine (like NoLock on Oracle) or throw a runtime error if so configured.
From a cursory look query hints seem sufficiently different across the different databases, that it doesn't necessarily make sense to include anything specific in relational. EF Core should definitely provide the infrastructure/plumbing to allow arbitrary hints to be defined and flowed to the provider, but I really see little value in defining a locking enum, which would be irrelevant for some (most?) databases, and not contain the right range (or terminology) of options for others...
There just seems to be too much diversity here, and there's enough infrastructure work needed in any case just to make hints possible (plus some discussions, such as somehow applying hints to individual components of the query rather than globally).
@roji Looking at this I found it very pertinent! https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion
@ralmsdeveloper to be clear, I'm very much for query hints and think they're relevant to PostgreSQL, but I'm much less sure about a set of"standard" hints which live in EFCore.Relational and are supposedly valid for all/most databases...
@roji
I agree! If we push it into the Relational library, then it becomes impossible to detect at compile-time any hints that will break when changing database targets. It should be possible to statically link to hints, and guarantee that if you include a Hints library for your provider, you get relative safety that those hints are known to the provider (the provider may fail to generate valid SQL, but at least "blatantly invalid" won't happen.)
This also addresses the fact that Oracle puts hints in "hint comments", whereas Microsoft extends the T-SQL grammar. If you move from Oracle to Microsoft via raw SQL, then your code will "just work". However, if you move from Oracle to Microsoft via EFCore, then your code might not work.
One common ground here is to perhaps do the following:
@ralmsdeveloper This is off-topic, but basically, most databases existing in the modern world started off as research projects of a Stanford guy named Michael Stonebraker (look him up). Including PostgreSQL and Ingres - and Ingres co-developers later formed Sybase which eventually sold a co-license of the source code to Microsoft and eventually became Microsoft SQL Server 6.5 when Microsoft bought it out and then forked it.
Most of the concepts in relational databases and their individual philosophies stem from his "family tree" of projects and Ph.D. students. The reason some databases don't have lock hints is because lock hints don't really make a ton of sense to a Cost-Based Optimizer in a database configured for multi-version concurrency control (MVCC). MVCC by its very nature is intended to be a lock-free design, so hinting an UPDLOCK or NOLOCK is silly to a system that doesn't really "know" about locking. PostgreSQL in general does not have the contention issues Microsoft SQL Server has; SQL Server tends to "lock up" with a sneeze, but thankfully has a ton of tools for DBAs to reduce locking at the expense of maintainability. Yet, Microsoft SQL Server has a pretty amazing Transactional DDL that is a gift from the Gods, and the ability to control locking on a per-table level combined with Transactional DDL makes upgrading systems under high loads not so painful.
If you want to learn more and talk further, ping me offline - my first and last name at gmail dot com. I love this academic stuff and am a bit of a computer science historian.
From a cursory look query hints seem sufficiently different across the different databases, that it doesn't necessarily make sense to include anything specific in relational.
@roji That was my conclusion as well. I the enumeration idea was a thought exercise (and I almost didn't include ShareLock). Obviously, pushing all this to the provider level means using a hint of any sort causes the query to be statically bound to a specific provider. I wasn't sure how palatable that would be to the EF Core community.
I agree! If we push it into the Relational library, then it becomes impossible to detect at compile-time any hints that will break when changing database targets.
I don't think that is an issue and we need add all those behavior (Fail/Log/None etc.). At present, if you specify raw SQL in FromSql
and then change database provider then we don't do anything different. If the SQL does not work in the second database, it fails. While it is possible to use different database providers with EF Core, we don't provide fully database agnostic model. (FromSql for example). Further, hints may have option to ignore/fail but FromSql
got no choice like that, and has to fail. If we already have one blocking behavior then there is no need to work differently for other API. In the essence, if you are using "provider" specific things in your application, then you cannot change provider and expect everything to work.
It's going to take >1 week just to review this thread :).
True. :trollface:
@smitpatel I don't really agree with your strawman argument. That's like saying because the CLR allows unsafe IO then we shouldn't allow safe IO. We have safety when we can have it - and "un-safety" when we can't reasonably expect it.
In the essence, if you are using "provider" specific things in your application, then you cannot change provider and expect everything to work.
I agree here. What I'm suggesting is that if Providers want (and believe Provider writers should be encouraged to write it the way @erikj999 suggested) then they can write it as extension methods and therefore assist people when they want to write portable code.
Look, if you don't do this I'll just have a class with a bunch of static readonly strings that surface such an API. And that's kind of ugly, because then I have to chase down people who wrote a one off string and drill it in people's heads to use the static readonly strings.
There's a broad continuum of ok to great solutions here.
As already quoted by @erikj999 I have also been analyzing and researching something universal for the core "EFCore.Relational", I came to the conclusion that this is practically impossible, hints are very peculiar to each database.
So I believe that each provider must write its own extensions to meet the specific needs of users, in SQL Server I use NOLOCK and UPDLOCK a lot.
I've been coding something for my provider (this may not be the desired end result).
@smitpatel do you have any suggestions on what I did? (Please do not consider my tests 🙈, I'm working to leave 100% aligned with EF Core) https://github.com/ralmsdeveloper/EntityFrameworkCore.FirebirdSQL/commit/db46134f8e947f09736abd84e2a4b74ad2ac1249
@roji do you have any regrets too.
@roji you also have more thoughts on this?
When operations are not working for most cases, then there is no point of providing safety for subset of them. It is pit of failure and confuses users more than helping out. Some magic is fine, but a lot of magic and you are left with userbase who cannot figure out what works and what does not.
@smitpatel I think you need specific examples. You're just generalizing.
I gave a specific use case I like, and even explained how I might code around it if there is no static typing:
public static class QueryHints
{
public static string WithNoLock
{
get { return "WITH (NOLOCK)"; }
}
}
Now, see how @ralmsdeveloper implemented it with a method.
I guess your opposition would be some syntax like this:
var foo = (from p in dbContext.Set<Person>().AddAnnotation(QueryHints.WithNoLock)).ToList()
is somehow superior to this:
var foo = (from p in dbContext.Set<Person>().ToFbHint().WithNoLock()).ToList()
and that the "somehow" rationale is that the core API is in Relational.
Note: It is pretty unfair to call this "magic". Let's not romanticize a purely objective measure like type safety and Open-Closed Principle.
On subjective measures, rather than assume what users find confusing, please explain how it could possibly be confusing to have the following user story:
I want to switch from Firebird to MSSQL in a large code base and have Visual Studio underline in red ink the breaking API changes for hints.
@jzabroski
I want to switch from Firebird to MSSQL in a large code base and have Visual Studio underline in red ink the breaking API changes for hints.
This is the downside to having table/query hints implemented as extension methods via a provider-specific namespace. Attaching a query hint effectively kills the ability for that query to be portable to other providers. Worse, the only way Visual Studio would know what code to red-underline is if the developer replaces "using ...Firebird.QueryHints" with "using ...MSSQL.QueryHints" or some such.
I still think this is the best approach for EF Core, though. Developers shouldn't be required to use "FromSql" for something so essential and ubiquitous as basic lock management. Having said that, other ORMs haven't solved this problem either.
I'm curious what opinion @rowanmiller, @divega, or anyone else has on this. Given there is no way to abstract query hints from database engines, the only way forward is through provider-specific extension methods. But using such an extension means in a query means execution is limited to a specific provider. Is that tenable for EF Core, which in other respects is provider agnostic?
I guess another way forward is with an extensibility scheme like the EF 6 wrapping provider which at least lets developers build in their own query extensions. But it isn't easy.
@erikj999 just to mention that this is the pattern already used for database-specific customization in the model/migratons: you use a provider-specific extensions method on on the different model builders (e.g. ForSqlServerIsMemoryOptimized()
).
@jzabroski yes, if you switch providers you get red squigglies, but switching databases is a relatively rare and huge change to your program. Think about differences in type mapping, in client/server evaluation changes (i.e. what used to be server-evaluated is now suddenly client-evaluated), and all the other subtle difference between databases. Honestly, a red squigglie (which you can easily remove) is going to be the least of your concerns...
@roji I thought about all that, and I think I still want as much help as possible. Thinking about differences in type mapping, client/server evaluation, etc. are outside the scope of this issue. User stories are not intended to solve every single problem a developer encounters. User stories can take on many forms, but one of the most beneficial is the User-Action-Benefit triple. The "As a, I want, so that" user story template is pretty common, because it helps people identify with strangers. Identifying with strangers is one of the things great API designers like Joshua Bloch say is the hardest part about designing APIs. I won't drag on about this, though. If you have any further comments let's go offline or use gitter.im to discuss.
@jzabroski it's an interesting and healthy discussion, I don't really see a reason to take it offline...
My point comparing to other areas where provider-specific instructions are implemented as strongly-typed extension methods, was that a product-level decision has already been made: if you want something specific to SQL Server, you use a strongly-typed method from the SQL Server provider assembly. This doesn't mean we can't be flexible, but if we start introducing provider-specific query hints via weakly-typed strings (or some other mechanism) we provide a very inconsistent API experience to users. IMHO one has to look at the larger context of how the product already looks like.
Your argument against this way of doing things is that migrating between databases creates red squigglies. In my opinion that is actually a benefit. It seems quite dangerous for query hints to silently stop having any effect, which is what would happen if they're weakly-typed: your program may indeed depend on behavior set up by some weakly-typed hint, that doesn't exist or is spelled differently in your new provider. It seems like a much better user story to have the compiler guide them through the places where they're using provider-specific code (e.g. hints), encouraging them to think about what the migration means in each context.
Also, since your argument concerns changing databases, it seems very relevant to see other issues that happen as part of that process (type mapping, client/server evaluation...). Since the red squigglies here are very likely to be a small hurdle compared to the others, and since it's easy to deal with - even if tedious - simply because it's a compile-time error (rather than a subtle runtime error), it just doesn't doesn't seem to be worth losing the benefits of strongly-typed hints.
Finally, I'm not sure in which direction your user story description is supposed to guide us :) I spend a lot of my time trying to listen and think about which APIs users want (but also which APIs are good for them despite what they say they want). My opinion is just different from yours...
@roji uh... re-read the thread. I am pro red squiggles. I described my user story of wanting red squiggles. Smit seemed to be against it.
But, good to know you're on my side.
(At this point, I'm scratching my head wondering if I had a severe typo or poor explanation on the order of "A panda, eats, shoots, and leaves.")
On Wed, Aug 22, 2018, 5:19 PM Shay Rojansky notifications@github.com wrote:
@jzabroski https://github.com/jzabroski it's an interesting and healthy discussion, I don't really see a reason to take it offline...
My point comparing to other areas where provider-specific instructions are implemented as strongly-typed extension methods, was that a product-level decision has already been made: if you want something specific to SQL Server, you use a strongly-typed method from the SQL Server provider assembly. This doesn't mean we can't be flexible, but if we start introducing provider-specific query hints via weakly-typed strings (or some other mechanism) we provide a very inconsistent API experience to users. IMHO one has to look at the larger context of how the product already looks like.
Your argument against this way of doing things is that migrating between databases creates red squigglies. In my opinion that is actually a benefit. It seems quite dangerous for query hints to silently stop having any effect, which is what would happen if they're weakly-typed: your program may indeed depend on behavior set up by some weakly-typed hint, that doesn't exist or is spelled differently in your new provider. It seems like a much better user story to have the compiler guide them through the places where they're using provider-specific code (e.g. hints), encouraging them to think about what the migration means in each context.
Also, since your argument concerns changing databases, it seems very relevant to see other issues that happen as part of that process (type mapping, client/server evaluation...). Since the red squigglies here are very likely to be a small hurdle compared to the others, and since it's easy to deal with - even if tedious - simply because it's a compile-time error (rather than a subtle runtime error), it just doesn't doesn't seem to be worth losing the benefits of strongly-typed hints.
Finally, I'm not sure in which direction your user story description is supposed to guide us :) I spend a lot of my time trying to listen and think about which APIs users want (but also which APIs are good for them despite what they say they want). My opinion is just different from yours...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-415183751, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbT_QXDBZxZgCtA5YysDlGcLctACGYrks5uTcrFgaJpZM4KRSfR .
An example from https://github.com/aspnet/EntityFramework/issues/6649 is being able to append
OPTION (MAXRECURSION 2000)
to the end of a query.