Closed awr closed 8 months ago
Anything I can do to help this along? I'd be happy to put together a PR if you point me in the general direction of where you'd want to see this (assuming it fits with your direction for the project).
Hi @awr your issue was assigned to me as investigation in 2.2 with the idea that I would do some thinking to try to decide if this was a duplicate of an existing issue or if there was something in your proposal that we wanted to pursue.
I did a search and I actually couldn't find an active issue in our backlog that covers using TVPs in this way.
We have certainly done some thinking about using TVPs as an implementation detail for things like Enumerable.Contains()
. For example, a query like this (based on your examples) would use TVPs automatically:
public static async Task<List<Person>> GetWithTableParameterAsync(
IEnumerable<long> ids,
CancellationToken cancellationToken)
{
using (var db = new MyDbContext()) {
var query =
from person in db.People
where ids.Contains(person.PersonId)
select person;
return await query.ToListAsync(cancellationToken);
}
}
However, there some challenges with doing this automatically. One of them is the fact that SQL Server requires table types to be declared in the database before you can use them in a TVP. Hence we have come up with other alternative implementations that presumably would have lower impact, like #12777.
But I believe what you are proposing is really interesting. On one hand it could be used to solve the same scenario above, but instead of Enumerable.Contains()
, you would explicitly join the two query roots like in your sample code. On the other hand, it seems to be a more general feature that could have other applications.
I think I have been able to come up with two orthogonal new capabilities that we could add which I think would generalize the idea even more:
New SQL Server specific extension method for EntityTypeBuilder and QueryTypeBuilder to indicate you want the corresponding table type to be created. The name could be something like ForSqlServerDeclareTableType()
. I like the option of adding this for entity types and query types, which we already support, more than adding yet a third way of declaring a "shape" in the model.
A new FromData()
core query operator: This could be very similar to FromSql
in the sense that it would be used to override the data source of the query root in that particular query, but instead of specifying that you need to execute some SQL, you would provide the data in-line or though a variable. This method would normally evaluate in memory, but in the particular case of SQL Server , it could cause the creation of a TVP and transference of the data to the server to be processed there. We would need to decide if that behavior would be triggered implicitly, whether it would depend on other parts of the query being evaluated on the server, or if it needs to be required explicitly.
FromData()
could support inputs similar to the existing HasData()
method used for data seeding in model definition, and it could support streaming up TVPs if the input is an IEnumerable<T>
or IAsynncEnumerable<T>
. In case you pass actual entity instances to
Note that FromData()
, we would need consider if the right behavior of the method would be to clone or actually use the same instances.ToQuery()
(the API for defining query) can already be used in the model to supply in-memory data.
Assuming we had these capabilities, your example could look something like this:
public class MyDbContext : DbContext
{
// ... or however it makes sense to define a udt
public DbQuery<LongRow> LongRows { get; set; }
override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Query<LongRow>().ForSqlSserverDeclareTableType();
}
}
public class LongRow
{
public long Id { get; set; }
}
public static class Execution
{
public static async Task<List<Person>> GetWithTableParameterAsync(
IEnumerable<long> ids,
CancellationToken cancellationToken)
{
using (var db = new MyDbContext()) {
var query = from person in db.People
join id in db.LongRows.FromData(ids => ids.Select(id => new LongRow {Id = id})))
on person.PersonId equals id.Id
select person;
return await query.ToListAsync(cancellationToken);
}
}
}
Of course we could also have sugar to make this more terse, like an attribute on the type or DbSet or DbQuery property to automatically create the table type, and a version of FromData()
that takes a collection of a single scalar and converts.
Note for triage: I think this belongs in the backlog for now, but since @awr is interested in contributing we can discuss the design in general terms.
Updated my previous comment to reflect that client evaluation isn't an option anymore after 3.0.
I think the syntax should be more close to that of Set
something along the lines of
IQueryable<TEntity> TableValueParameterSet<TEntityId>(IEnumerable<TEntityId> ids) where TEntityId : struct
By conventon EF could use a table value paramater type named $"ef_core_{nameof(TEntityId)}_tvp"
or similar
If EF core eventually support insert / delete / update directly on IQueryables you could even do
ctx.TableValueParameterSet(ids)
.Join(ctx.Set<MyEntity>(), id => id, e => e.Id, (id, e) => e)
.Delete();
If we extend the table value parameter mechanics to support custom entity types not just simple value types we could even bulk insert with a crazy high level of performance. Something like.
var batch = Enumerable.Range(0, 2000).Select(i => new MyEntity{ Foo = $"Hello World_{i}" }).ToList();
ctx.TableValueParameterSet(batch)
.Insert();
Possibilities are endless :D
This issue need more love from the ef core team.
Guys very nice Feature request by @awr and very nice implementation suggestion by @divega !
But is there any chance we can see this functionality in the EF core ?
Is there any other approach to build ONE EF LINQ query that is using/filtering on a TVP (e.g. list of Ids) ?
Rather than using a temp table or a TVP, couldn't this translate to joining either a CTE or a VALUES list? for example:
SELECT a.*
FROM People AS a
INNER JOIN (VALUES (1), (2), (3), (4)) AS vals_0(Id)
ON a.PersonID = vals_0.Id
;
or:
WITH cte_0(Id) AS (VALUES (1), (2), (3), (4))
SELECT a.*
FROM People AS a
INNER JOIN cte_0
ON a.PersonID = cte_0.Id
;
My understanding is also that MS SQL's TVP implementation is a bit more specific to them, while WITH and VALUES are more generally supported (so hopefully more work could be re-used across providers)
@hauntingEcho both of the above suggestions involve embedding constant values in SQL, meaning that you get different SQLs for different values. That causes query plan fragmentation, which is bad for performance, and the thing that the OP tried to avoid (see above).
Note that the queries above are mainly about selecting rows with IDs in a given list. For this case specifically (which is usually express via LINQ Contains and translated to SQL IN), see https://github.com/dotnet/efcore/issues/13617 which discusses some optimization techniques which are cross-database.
@roji that makes sense. I may have misread this ticket as being primarily about server-side joins on input data (with a bonus of good query plans from TVP) rather than primarily about the query plan on single-column raw data filters. My queries had only used a single column to simplify the examples.
Just stumbled upon this issue. I would love to have a robust, built-in solution for this problem. However, until then, I also wanted to share an EF Core extension I found that accomplishes a similar task: EntityFramework.MemoryJoin (for anyone else who needs an immediate solution).
You basically just add a dedicated DBSet
to your DbContext
with a generic parameter of whatever DTO you want to use for your TVP so-to-speak. If it's protected
, then EF Core won't try to generate an entire table for it in the database, but you'll still be able to incorporate instances of it into queries.
By default, they have a QueryModelClass
type with 3 properties each of types string
, bool
, long
, double
, Guid
, and DateTime
. Should accommodate most use cases of joining on a single table's columns. However, if you need, say, 4 strings, you can create and provide your own DTO class to use instead.
I haven't successfully run a query that supplies multiple lists in a single request so not sure if that's supported or if I'm just doing things incorrectly. At least gives a general idea of how something like this could be implemented though.
Just stumbled upon this issue. I would love to have a robust, built-in solution for this problem. However, until then, I also wanted to share an EF Core extension I found that accomplishes a similar task: EntityFramework.MemoryJoin (for anyone else who needs an immediate solution).
You basically just add a dedicated
DBSet
to yourDbContext
with a generic parameter of whatever DTO you want to use for your TVP so-to-speak. If it'sprotected
, then EF Core won't try to generate an entire table for it in the database, but you'll still be able to incorporate instances of it into queries.By default, they have a
QueryModelClass
type with 3 properties each of typesstring
,bool
,long
,double
,Guid
, andDateTime
. Should accommodate most use cases of joining on a single table's columns. However, if you need, say, 4 strings, you can create and provide your own DTO class to use instead.I haven't successfully run a query that supplies multiple lists in a single request so not sure if that's supported or if I'm just doing things incorrectly. At least gives a general idea of how something like this could be implemented though.
sadly it does not seems to make it easy to add new type mapping like Nodatime or Postgres Enums. Additionally it make all variables use a different name even when the value is unique which result in the query plan being different each times.
Would also love to see a robust solution for this problem
@erwan-joly how do you see this as relevant for NodaTime or PG enum support? Both these are already natively supported by the PostgreSQL provider etc.
@erwan-joly how do you see this as relevant for NodaTime or PG enum support? Both these are already natively supported by the PostgreSQL provider etc.
I think I was unclear, what I meant was the other way around: it would be nice if the proposed solution was supporting those (and any type supported by the provider) not that the support of those types would change in anyway with such a solution.
The suggested solution EntityFramework.MemoryJoin does not work with those types and also create a new set of parameters at every run so it will always get a new query plan.
It only support a small subset of types: https://github.com/neisbut/EntityFramework.MemoryJoin/blob/master/src/EntityFramework.MemoryJoin/Internal/MappingHelper.cs#L321
it would be nice to be able to join with an in memory list for any supported type in the provider. And I believe having this in EF would result in each provider being able to add the support on their side with things like
‘select * from table t join values(generatedFromMemoryList) as list(id, name,…) On t.id = list.id and t.name = list.name’
Everyone, EF 8.0 is introducing full support for primitive collections, which very much addresses what has been discussed above (see this blog post). Rather than using a SQL Server TVP (Table-Valued Parameter), we've gone with JSON arrays as the way to represent arrays.
For example, consider the following simple LINQ query using Contains:
var names = new[] { "Blog1", "Blog2" };
var blogs = await context.Blogs
.Where(b => names.Contains(b.Name))
.ToArrayAsync();
EF Core 7.0 translated this as follows:
SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] IN (N'Blog1', N'Blog2')
The new 8.0 translation:
Executed DbCommand (49ms) [Parameters=[@__names_0='["Blog1","Blog2"]' (Size = 4000)], CommandType='Text', CommandTimeout='30']
SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] IN (
SELECT [n].[value]
FROM OPENJSON(@__names_0) WITH ([value] nvarchar(max) '$') AS [n]
)
This allow the same query plan to be reused regardless of parameters. And in addition, any LINQ operators can now be composed over the array parameter, not just Contains.
So unless I'm mistaken, there's no longer a reason for this issue to be open; unless someone can come up with a specific reason why TVPs should be used (as opposed to a JSON array), EF 8.0 seems to already resolve the issues described here.
Everyone, EF 8.0 is introducing full support for primitive collections, which very much addresses what has been discussed above (see this blog post). Rather than using a SQL Server TVP (Table-Valued Parameter), we've gone with JSON arrays as the way to represent arrays.
For example, consider the following simple LINQ query using Contains:
var names = new[] { "Blog1", "Blog2" }; var blogs = await context.Blogs .Where(b => names.Contains(b.Name)) .ToArrayAsync();
EF Core 7.0 translated this as follows:
SELECT [b].[Id], [b].[Name] FROM [Blogs] AS [b] WHERE [b].[Name] IN (N'Blog1', N'Blog2')
The new 8.0 translation:
Executed DbCommand (49ms) [Parameters=[@__names_0='["Blog1","Blog2"]' (Size = 4000)], CommandType='Text', CommandTimeout='30'] SELECT [b].[Id], [b].[Name] FROM [Blogs] AS [b] WHERE [b].[Name] IN ( SELECT [n].[value] FROM OPENJSON(@__names_0) WITH ([value] nvarchar(max) '$') AS [n] )
This allow the same query plan to be reused regardless of parameters. And in addition, any LINQ operators can now be composed over the array parameter, not just Contains.
- Are widely supported across different databases. TVPs are a SQL Server-only feature.
- Do not require a type to be defined before being used in a query.
- Can also be used as columns and not just as parameters. This allows storing e.g. an array of ints as a column in the database, and then using LINQ Contains() over that exactly as one can use it over a parameter.
- Seem to provide at least better performance compared to TVPs (see this comment for the benchmarks).
So unless I'm mistaken, there's no longer a reason for this issue to be open; unless someone can come up with a specific reason why TVPs should be used (as opposed to a JSON array), EF 8.0 seems to already resolve the issues described here.
Seems perfect 🎉
@roji what am I missing ? This fail I was expecting it to join on an OpenJson list
(tried with compatibility level 130 too) My main usage is with npgsql but as it was not working there I tried with mssql and didn't get more lucky
I would expect the translation in sql to be something like that
SELECT *
FROM TestEntities
JOIN OPENJSON('[{"MyClassId": 1, "DateTime": "0001-01-01"}, {"MyClassId": 2, "DateTime": "0001-01-01"}]') WITH (
MyClassId bigint,
DateTime datetime2
) AS c
ON TestEntities.Id = c.MyClassId;
@erwan-joly can you please open a new issue with a full, runnable repro (not a code screenshot)? This issue isn't the place to discuss that.
@erwan-joly can you please open a new issue with a full, runnable repro (not a code screenshot)? This issue isn't the place to discuss that.
Yeah wasn’t sure if this was a bug or just a misunderstanding on what the issue was about
Use Case: Ability to use query against a variable number of inputs, for a batching scenario. Ideally I'd like to have the same query plan used whether there are 2 input values or 100 input values, in conjunction with a linq query.
Hypothetical linq:
Ideally this generates t-sql that looks something like:
Note that I realize I can already do something similar with a
Contains
-- something like:The difference is in the sql that gets generated, since it embeds the id values into the query (or if using an expression tree walker, I believe it's possible to change this to have n equality checks with sql parameters). I'd prefer a single query plan to either of these solutions.
I also realize that it's possible to use table parameters with raw sql, but the challenge there is that I can't easily inject it into the middle of a complex query if I want to use LINQ.