Open rowanmiller opened 8 years ago
@jzabroski apologies, my bad... My head must have gotten turned around in this long thread.
Rereading and summarizing, there seems to be an agreement that query hints should be expressed via strongly-typed extension method calls which come from provider assemblies (@smitpatel seems to agree, https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-397738849).
I think there's also a consensus that there's no set of standard query hints that would justify any sort of cross-database extension methods living in EFCore.Relational.
The outstanding point of disagreement/misunderstood seems to be your suggestion in https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-411443194 to have an ExplicitQueryHintInvalidBehavior
enum which controls behavior in case hint is unknown to a provider. I indeed have trouble understanding this:
I'm guessing I might be misunderstanding your suggestion though. Once again, ignoring is what currently happens when a database-specific model extension (e.g. ForSqlServerIsMemoryOptimized()
) is used on a different database. That seems like the right thing to do.
Let me know if I'm somehow mis-summarized things.
Oh, wow, you're right. I see the confusion is all my fault., and completely tied to my enum idea.
Here is my thought process on this enum:
1) Above, I listed query hints for many different providers, and concluded that for Oracle, since their hints are comment-based, if EF had an AnnotationTranslators namespace similar to Microsoft.EntityFrameworkCore.Query.ExpressionTranslators, then that could be used to provide a generic core for EF to translate expressions. Then the concrete EF Query Provider could pattern match on the annotation to see if the contents were a hint or not. However, if you're migrating from Oracle to MSSQL, an MSSQL provider might see an annotation as /*+ as the start of a basic comment and just inject comment text into the query. However, if instead of just AnnotationTranslators there was HintTranslators, MSSQL provider could omit a warning or throw an exception when its pattern matcher detects it couldn't recognize the hint. 2) Besides just Oracle->MSSQL, I was thinking of cases like "SQL Server <2016"->"SQL Server >2016 SP1", where 2016 SP1 added the new USE HINT construct. My user story is as follows, and based on my past life working for erecruit:
As an ERP developer at Well-Tempered Software Company who has customers who self-host my software, and are on different versions of SQL Server, I want to maintain a single source line of code and still be able to hotfix urgent issues for customers whose databases don't understand newer hints. And user story 2: I want my customers to have painless upgrades.
To be clear, I personally can fulfill this user story through erecuit.Expr and calling Expand() and having Expand dig out query annotations at run-time that don't apply to the customer's database. The EF Query Provider would never even know I performed magic. However, I think this is something worth considering making internal to the code. Even last night I was watching Philip Carter's "Why you should use F#", and he talked about the Microsoft Notes and Tasks team needing to write code that solves syncing data across Exchange 2008, Outlook Mobile, etc. through current o365 releases. Which reminded me: The world is full of Speedy developers and a few Mr. Rogers.
Above, I listed query hints for many different providers, and concluded that for Oracle, since their hints are comment-based, if EF had an AnnotationTranslators namespace similar to Microsoft.EntityFrameworkCore.Query.ExpressionTranslators, then that could be used to provide a generic core for EF to translate expressions. Then the concrete EF Query Provider could pattern match on the annotation to see if the contents were a hint or not. However, if you're migrating from Oracle to MSSQL, an MSSQL provider might see an annotation as /*+ as the start of a basic comment and just inject comment text into the query. However, if instead of just AnnotationTranslators there was HintTranslators, MSSQL provider could omit a warning or throw an exception when its pattern matcher detects it couldn't recognize the hint.
Unless I'm mistaken, I think we're in agreement that however we represent hints in EF Core, they cannot simply be strings that the user specifies and the provider tacks on in some place in the SQL (this would create SQL injection vulnerabilities etc.). In other words, a hint annotation is something "abstract" (even if it's provider-specific), which the provider picks up and then renders into SQL in some way (which is irrelevant - comment, syntax extension, whatever). So I don't think anyone is suggesting a model where MSSQL sees some annotation from Oracle which then gets somehow injected as a comment into MSSQL T-SQL and silently ignored... Just like provider-specific model extensions, hints would be (structured) annotations which get interpreted and rendered into SQL, rather than SQL fragments that gets integrated somewhere in your SQL query.
However, if instead of just AnnotationTranslators there was HintTranslators, MSSQL provider could omit a warning or throw an exception when its pattern matcher detects it couldn't recognize the hint.
I think you're still disregarding the multiple database scenario. It's completely fine to use both Oracle and MSSQL in the same application, meaning hints from both databases may potentially exist on the same query. I think this is something we want to enable/allow, so we don't want providers to warn or throw when they see unrecognized hint annotations. We just want them to be ignored.
To be clear, you still get red squigglies when migrating from one database to another, because you've dropped the reference to the old provider's assembly, and the strongly-typed extension methods which produce the annotations no longer exist. The database migrations scenario is very different from the scenario where multiple databases are supported at the same time.
Hope this is clear, maybe we're still just agreeing with one another...
I don't think you addressed MSSQL2017 -> MSSQL2016 hotfix path.
I would like to ask about the possibility of at least providing a specific mechanism for UPDLOCK. I'm not sure how that would translate to DB engines other than SQL Server and would probably not make sense (?) for non-relational db types but it would be very helpful. Syntactically it could be used something like ctx.People.ForUpdate().Where(...) and ctx.People.ForUpdate().FirstOrDefaultAsync(...).
In serious scenarios the usage of transactions is critical and in many cases this translates to either SNAPSHOT or SERIALIZABLE.
A typical scenario is reading an object, updating it in memory based on business rules, and saving the changes. With SERIALIZABLE, if two transactions execute concurrently, the read will succeed for both, but when one tries to execute Save Changes the transactions would deadlock. Of course, that is a transient condition which could be retried but if we could explicitly indicate our intention to update the object then, rather than deadlocking, one of the concurrent transactions would wait its turn to read the object. I find this preferable.
Does this make sense? Should I post this as a separate issue? Thank you!
@apacurariu this seems like exactly one of the scenarios enabled by query hints. This issue is currently in the backlog, so it will be considered for a future release.
@roji Thank you for getting back! It's great news that this will be considered.
public void UpdateThing(Thing clientThing) {
var foundThing = context.things
.Where(thing =>thing.id == clientThing.id
&& thing.lastUpdateCounter == clientThing.lastUpdateCounter)
.With(ExclusiveLock) //block the 2nd concurrent updater until the first completes
.Singleordefault();
if (foundThing ==null) {
throw new exception("sorry your thing has been updated since you last read it.");
}
foundThing .lastUpdateCounter++;
foundThing .Name = clientThing.Name;
context.Things.Update(foundThing );
context.SaveChanges();
}
I believe could be handled with a Concurrency Token, since EF will supplement the UPDATE with a WHERE condition on the value of the column attributed as Concurrency token, expect it only compares with the values just read, it only works for race conditions, it doesn't handle stale updates. ie: https://docs.microsoft.com/en-us/ef/core/saving/concurrency
Placing an intent lock allows solving both Stale Updates and Concurrent Race Conditions, imho.
@mikerains Update Locks are used to avoid deadlocks. The point of an Update Lock is literally to prevent a shared lock.
Concurrency tokens by comparison do not necessarily stop an update lock from being needed, such as bulk updates to a table with concurrency tokens might need to use an update lock to prevent deadlocks. In this sense, you don't normally see UPDATE LOCK requested as a feature in a SQL Server ORM, because normally it's code written by a SQL Database Engineer, not a C# engineer. If the code is sensitive enough to require an UPDATE LOCK, you probably don't want to leave the SQL to execute to depend on a c# code generator like an ORM.
They're two separate use cases, although it doesn't stop developers from using one when the other should be used.
Yes, I think I meant an Exclusive lock. In any event, a "With Hint" capability would allow control over the locking behavior and implement strategies such as Stale Update prevention in a way that doesn't suffer from Concurrency Updates.
I am going to edit my post to reflect an Exclusive Lock, as my intent was that the attempt to read would be blocked while another thread is updating. I was able to place Intent locks using EF6 some years ago to achieve this.
I read lot of posts here, and the basic problem is "to try adding something specific to providers in an universal way".
The first hasty answer is it is obviously not possible. An Enum never will be portable between providers. What comes in mind is Http's StatusCodes. Its a big list of const integers, because the list can be bigger or even customized (you can return a custom StatusCode if you want it OR you don't know what specific (or custom?) StatusCode comes from a request).
The question is, if you added something provider specific in YOUR CODE, you don't expect it will work when changing providers. The real question is WHEN the code will break. I imagine the 3 basic ways to code break is:
I really don't know the correct answer to this, but I can suggest EFCore developers must build some kind of API to help "Provider Designers" in Compile Time. The "Provider Designers" then ensure in Run Time everything is good and safe (by example, if the HINT constants exist from some list, if they are supported, etc). The "Provider Designers" will be obviously responsible to provide these constants someway, and OBVIOUSLY, if the user changes providers the code will not compile or run, and this is not EFCore developer's fault or provider designer's fault.
Another point to think about the specifics of Hints is some hints cannot be used together (like UPDLOCK and SERIALIZABLE in Sql Server). This logic can be handled only by Provider Developers. I cannot think an easy or simple way to EFCore devs handle this.
PS: One interesting way to handle this, using inheritance: Using SQL Server Query Hints with Entity Framework The EFCore can provide "HintBase", "QueryHint", "TableHint", etc, base classes, and provider devs can then inherit and create specialized classes.
@GitClickOk I can follow your train of thought but this is not how computer scientists solve such problems as:
the basic problem is "to try adding something specific to providers in an universal way".
The concept you are looking for is expression problem. I will rephrase the basic problem ever so slightly so you can see that it matches the definition on Wikipedia:
The problem is "to be able to independently add new providers (data types), and new operations (query hints) in a universal way".
In this sense, most computer scientists would spit out their morning coffee if they saw the solution was to just keep a list of integers in some StatusCodes class. The reason is simple: Doing it this way requires updating (both code editing and recompiling) and redeploying the same assembly/nuget package. In computer science land, that's too easy and also too error prone. As you are aware, it requires making sure everyone coordinates and doesn't use the other programmer's integer codes. This "sort of works" for international standards like HTTP StatusCodes defined by an RFC, as well as your car's ODB status codes, but it doesn't really work for things where people have different release timelines and need to extend functionality at the very moment they need to extend it without having to get approval from some vendor to add their solution to a sacred list of integer codes.
I skimmed through your example article, and the basic problem is that it is using DbCommandInterceptor, which will intercept every query. This is not great for a number of reasons, the most basic being the caller needs to know their DbContext configuration in addition to their local requirements.
@jzabroski I understand you, sorry if I expressed myself bad. I'm not suggesting "keep a list of integers", I was just comparing this list with a list of strings like "NOLOCK" or "FOR UPDATE". I ever said "I really don't know the correct answer to this", I'm just adding topics for brainstorming.
About the article, I think a nice idea the "architecture" used with some base classes and the providers can then inherits to add new "Hint" classes. It does not need specifically use DbCommandInterceptor because we are talking with EFCore devs, they can do the way they want, the article writer is using what is available, but the EFCore devs can use the idea of base classes, inheritance (or even interfaces) and allow Provider devs to extend as will.
Another unrelated idea (I thought while I answered you) can be the way as .Net deals with SqlDbType and DbType. You can read it here. We have DbType (generic) and SqlDbType (SqlServer specific). Each provider can add custom DbTypes.
I'm sorry if I'm not being specific, as I said before, I don't know the solution, I'm just "brainstorming", listing ideas in order to maybe help to develop some solution.
Smit Patel has already explained what the EFCore team seems as the API design challenges: They view EFCore querying as two sets of APIs: a relational-level API that contains basic relational algebra concepts, and provider-level API that contains provider-specific extensions. Smit even covers a third API layer implicitly, which is, how do you put query hints on things that are included implicitly? Quoting him:
Further API needs to consider what would happen for the tables which are indirectly brought in through auto-include of owned entities or nav expansion.
The other practical reason nothing is done here is that there is a clear workaround for this problem: Move logic into stored procedures, or encapsulate tables with views that apply hints internally. There are no workarounds for bugs in the product, however, so the EFCore team needs to prioritize those as well as getting people to move to .NET Core (which means porting EF6 to .NET Core rather than tackle fixing bugs and adding new slick features like this one).
Hope this helps.
@rowanmiller as #12263 said, it's critical in a high-performance environment, but it's been more than one year, there's still no progress. could you speed things up and add this to milestone 3.0?
Is there any consideration for option(optimize for (@p0 = 1))
and hints like it? I have some queries that are suffering from parameter sniffing. It'd be nice to turn it off selectively where it hurts rather than turn it off at the DB level
@mburbea The solution for now is to create a view and map an entity to the view, or move the generated query into a stored procedure after you capture it in profiler, and apply your hint inside the wrapper.
A view doesn't help me here as you can't stick an option clause on a view statement. I can switch to a store procedure route, but that's kind of ugly and removes the whole point of using EF.
I can go down the road of using a diagnostic listener and intercept it that way, but I'd prefer to avoid that mess. It'd be nice to end a query with a .Hint("option optimize for (@p0=1)")
.
@smitpatel you have thought about this?
If I had I would like to work on it.
If you have nothing on, I can play some bits to restart this thread!
@smitpatel do you have thoughts about this? If so, I would like to work on it.
If you have nothing, I can play a few bits to restart this thread!
cc/ @divega
Edited because of the lousy English hehehe!
@ralmsdeveloper as a team, we are really focused on finishing EF Core 3.0. AFAIR, the general design was settled a long time ago and captured in a few of the 69 comments above, but at this point, even if you sent a perfect PR that addresses all the concerns, we would probably have to ask you to wait until we have a chance to review it, most likely in the next release after 3.0. Even trying to separate the noise from valuable insights to summarize what the design should be, has become such a challenge that I have to tell you it is out of scope for this release. Sorry.
@divega We know the commitment that the team has, let's wait for more about it, if there is time left I will create an extension for version 3.0 so those who need this functionality could use it.
@divega Could I try to create a new, clean issue that tries to summarize some of the talking points?
I've started extracting some of my comments into sub-pages here: https://github.com/jzabroski/SqlQueryHintMatrix - my thought would be to continue to improve this repository, have your team review it, then create a new issue from my refactored README.md
I think Smit's thoughts are valuable, especially things like many-to-many tables and future features like not requiring many-to-many intermediary CLR types and how someone would query hint such things.
@jzabroski thanks for collecting that information. It doesn’t help now, but i may help when we revisit this issue in the future.
Hi guys, I see this has been opened for a while. I just raised an issue with ForceIndex for MySQL but I can see this is a more generic thing. Is there a way I can help?
Removing milestone to discuss in triage.
@neoaisac Thanks for your interest. There's a lot of discussion here which I am hoping @divega can distill down into some key points. However, it could be that this ends up being quite complicated to implement and it may end up being something that, realistically, needs to be done by the team.
@neoaisac thanks for your interest. As I mentioned in https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-491473311,
...as a team, we are really focused on finishing EF Core 3.0. AFAIR, the general design was settled a long time ago and captured in a few of the 69 comments above, but at this point, even if you sent a perfect PR that addresses all the concerns, we would probably have to ask you to wait until we have a chance to review it, most likely in the next release after 3.0. Even trying to separate the noise from valuable insights to summarize what the design should be, has become such a challenge that I have to tell you it is out of scope for this release. Sorry.
I understand there has been interest from multiple people in this issue, so I am going to try to describe some of the dimensions we have identified in this discussion that I believe will be relevant for a future design, even if we don't intend to take this for now.
We came to the conclusion some time ago that the feature would be divided in two layers:
A set of high-level and (most likely) strongly-typed APIs that would allow representing the hints that can be associated with queries when targeting a specific type of database. These APIs would most likely be defined as extension methods by providers.
A set of low-level and (most likely) loosely-typed support APIs defined in relational and/or core that would provide a unified mechanism for hints to flow from the high-level APIs to the bottom of the query pipeline, where providers can then process them and generate the corresponding SQL.
I think there is consensus that we don't need to try to identify common hints that could apply across multiple providers. The common code (e.g. the code in relational or core) only provides the transport for the hints and each provider redefines the full set of hints it accepts.
Open issue - Naming: should these extension methods include a prefix in the name to make sure there are no collisions when multiple providers are in scope? E.g. ForSqlServerWithNoLock()
We also came to the conclusion that we should not conflate what are actually separate features. Hints at the whole query level, per subquery or specific to tables may require different treatment both on the user facing high level API and potentially on the low level implementation. Different databases also support different hints at different labels, although some are common.
The API for query hints results in hints that apply to the whole SQL query or to specific subqueries, hence regular extension methods on IQueryable<T> should be sufficient.
Table hints need to be constrained to a specific table participating in a query. When multiple tables participate from the same query, they can carry different hints. Even when the same table participates more than once in the same query (e.g. a self-join) it can carry different hints on each appearance. In general this can translate to having methods available on DbSet<T>, similar to the new FromSqlInterpolated() and FromSqlRaw() APIs (but likely mutually exclusive with the FromSql methods). A general pattern that could work is a single method for DbSet<T> that returns IQueryable<T> and accepts a nested closure so that multiple hints can be associated with a table:
var pendingTransactions = context.Transactions
.WithTableHints(hints => hints.NoLock().ForceScan())
.Where(t => t.State == TransactionState.Pending && t.UserId == id);
but also simple sugar methods that are easier to call when just one hint applies:
var pendingTransactions = context.Transactions
.WithNoLock()
.Where(t => t.State == TransactionState.Pending && t.UserId == id);
In scenarios in which entities and tables don't map 1:1 (e.g. table splitting, entity splitting, TPT, and TPC), there might be some misalignment, and we need to come up with ways to handle that acceptably. For example:
We could allow default table hints to be indicated in the model. In fact, hints can apply not just to SELECT queries but also to UPDATE, INSERT, DELETE operations, so having APIs that enable granular control and affect CUD SQL generation could enable new scenarios.
This happens to be true for query hints as well (i.e. they can apply to UPDATE, INSERT, and DELETE), but it is harder to think how this could be represented.
Low level API in relational or core that allows annotating a query with hints. We already other have things that look a lot like this, like query tags so we may be able to reuse infrastructure. As mentioned above, we could have one mechanism that covers both query hints and table hints, as long as for the latter it is possible to provide the specific table the hint applies to as an argument.
Most used query and table hints for some provider.
Table hints in the model.
@divega one small note... when talking about table vs. query hints, it may be worth thinking about query hints which apply to specific SELECT statements. A query may involve multiple subqueries, and there may be scenarios where specifying different hints for these may be desirable. I'm not sure whether there's enough real-world scenarios to justify this - just mentioning it.
@roji good point. We will need to survey how multiple database engines work to arrive to a conclusion on whether subqueries can carry they own hints. Do you have any cases in mind?
@divega This is an excellent write-up. One point:
In scenarios in which entities and tables don't map 1:1 (e.g. table splitting, entity splitting, TPT, and TPC), there might be some misalignment, and we need to come up with ways to handle that acceptably. For example: [...]
- For cases that require complete control, we can provide an API that looks like a query hint (e.g. can appear anywhere in the query) but accepts the name of the table that the hint is associated with as an argument. In fact, it may very well be the case that this is the basic building block that we provide in the relational layer.
While this seems like the correct catch-all, I think one edge case @smitpatel mentioned is important:
My main observation is that many-to-many is still a work in progress, and I don't want hints to create technical debt that slows progress on many-to-many: I can workaround lack of table hints, I cannot easily workaround exposing many-to-many tables as classes in my domain model.
@divega I guess null ordering (see https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-390110534) could be an example, where you want one null ordering in a subquery and another in the main. A possibly weaker one is could be SELECT ... FOR UPDATE ...
, which locks the selected rows for the duration of the transaction. In theory locking behavior is per SELECT per table (although you can leave out the table to lock rows coming from all tables), but in practice I'm not sure it really matters to have one subquery lock and another not to lock... It might be possible to imagine such a case if I think about it more.
One issue with all this is that in general, subqueries aren't really expressed in the LINQ query - they are formed internally by the query pipeline in order to translate certain operations. So it's not clear how someone would express anything on a subquery... Perhaps a hint method call that holds for all previous LINQ operator calls, up until the last method call.
I had started messing around with this, but the cover was exclusiem to a tableexclusively on a table, I liked the scenario that @divega pointed out about DbSet
@roji All of this just reinforces that we are nowhere near a final design :smile:
Anyway, I think in many cases subqueries are actually present in the LINQ version of the query as well.
Based on what you describe, I am inclined to support query hints anywhere in the query, just like regular LINQ operators. The provider may decide where to generate it. E.g. the Npgsql provider can decide to generate hints for the closest subquery boundary, while the SQL Server provider may always lift the query hint all the way to the whole query to generate OPTIONS correctly.
@jzabroski good point about hints for objects that are hidden like the many-to-many link table. We may need also leverage a low-level API for this.
Interestingly, I was reading that SQL Server (and I believe also Oracle) support specifying some table hints globally (e.g. as query hints) by providing the identity of the table via its _exposed_objectname (essentially the alias used in the query to refer to that table). Not sure it will help though. It seems a design that includes these might already be too far down the path of diminishing returns.
Having an option to tell EFCore which transactions scope is requested, would be very useful.
@sunsnow-tail can you confirm that you mean transaction isolation (e.g. serializable, repeatable read), and not the System.Transactions.TransactionScope class? If so, that is not part of the query itself, but rather set when the transaction gets created. See the docs for controlling the transaction.
@roji transaction isolation it is.
@sunsnow-tail thanks, so yeah, as shown in the docs above, you can pass your desired isolation level to BeginTransaction, e.g.:
using (var transaction = context.Database.BeginTransaction(IsolationLevel.Serializable))
{
// ...
transaction.Commit();
}
@roji EFCore 1.1.2.0 has no overloads for IDbContextTransaction BeginTransaction()
EFCore 2.2.6.0 is the same
The overload that accepts an isolation level is RelationalDatabaseFacadeExtensions.BeginTransaction.
@roji Thank you.
Install-Package : Could not install package 'Microsoft.EntityFrameworkCore.Relational 3.0.0'. You are trying to install this package into a project that targets '.NETFramework,Version=v4.8', but the package does not contain any assembly references or content files that are compatible with that framework. For more information, contact the package author.
Do I try install the right package? If not: what is the right one?
@sunsnow-tail please don't continue the conversation on this issue (it's unrelated) - open a new issue if you have further issues.
EF Core 3.0 dropped support for .NET Framework (see breaking change note).
please add to milestore 5
I fail to see why this would be part of the transaction mechanics... :(
Just wanted to voice my opinion that something like @divega posted above would be extremely helpful for me. Right now in my current assignment I either have to wrap everything in a transaction or set at the DB level and with the proposed API there it seems like it would be a LOT more natural using LINQ.
Any workaround until major updates?
I´m using ef 3.1, with some SQL background tasks that frequently update my tables, and I need ef queries to not being locked by that.
The database is configured with snapshot.
I appreciate any help or tip.
Thanks
@lixaotec Consider using an IDbCommandInterceptor.
Will give a look @ajcvickers , thank you!
Is there any chance this is going to make it into EF Core 5 so we will be able to start adding with (nolock) on our queries?
@wklingler EF Core 5.0 is shipped. The next release will be EF Core 6.0 in November 2021. We are currently going through the planning process for 6.0 and will consider this item. However, keep in mind that there are many other high priority features with which it will be competing for resources. Make sure to add your vote (👍) above.
That's too bad, thanks for such a quick response!
An example from https://github.com/aspnet/EntityFramework/issues/6649 is being able to append
OPTION (MAXRECURSION 2000)
to the end of a query.