Support expression (AKA functional) indexes

roji commented 2 years ago

Most databases support including arbitrary expressions in indexes, instead of just columns:

CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));

This allows filtering by arbitrary expressions to be speeded up at the cost of more expensive updates. This is a migration-only feature, so users can do this relatively easily on their own today, via raw SQL.

Note that a composite index can mix together both regular columns and expressions - this would likely make the modelling API tricky. We could also support only single-expression indexes to start with, this would likely cover most usages.

Cross-database support:

PostgreSQL
MySQL (since 8.0.13)
SQLite
Oracle
MariaDB: apparently no support (kb).
SQL Server: not supported.

Where not supported (SQL Server, MariaDB), it's possible to define an index over a stored generated/computed column to get a similar effect. If we really want to, the provider could automatically set that up under the hood (with a shadow column or property), though that could get a bit tricky.

Previously discussed in #3986, https://github.com/npgsql/efcore.pg/issues/119, #18382.

roji commented 11 months ago

For anybody landing here, while EF doesn't have an API for doing this, it's trivial to do it yourself by adding the desired CREATE INDEX SQL directly into your scaffolded migration (this isn't problematic in any way - editing scaffolded migrations is encouraged where needed). See these docs for more info.

hexxone commented 3 weeks ago

Bump. All the major databases suppoort this, so why is Microsoft shipping this feature still incomplete?

Yes you can manually modify the Migration to accomodate for this problem, but why is it literally impossible to handle that inside of EF-Core ? Just instead of the Propertyname, allow for any arbitrary string ? It cannot be that hard to implement...

ajcvickers commented 3 weeks ago

@hexxone This issue has 16 votes (👍). This puts it in 179th position in the list of most requested features. See the planning process for more information on how we decide what to work on. Make sure to vote for this issue if it is important to you.

macias commented 3 weeks ago

" it's trivial to do it yourself" I hope this feature is not pushed back because there is "trivial" workaround :-) I work with SQLite currently and with its limitations it requires some non so obvious hops to achieve what someone really has in mind.

It would be great if EF Core could accept expression trees when defining indices, not just "flat" objects.

roji commented 2 weeks ago

@macias beyond there being little votes, as @ajcvickers wrote, users can already define expression indices by using SQL in migrations themselves (as with all migration-only DDL features). This does make this lower-priority in my mind, compared to problems where there really is no way to achieve things.

hexxone commented 2 weeks ago

In my case, I also need to create an index on “LOWER(column)”, not just the column itself. This specific requirement isn’t outlandish or a rare use-case. In fact, it’s a fairly common operation in many scenarios where case-insensitive search is required.

As of now, EF Core allows me to describe indices with Collations, ‘USING’ options like [gin, gist, etc.], and operators for trigram indices and full-text search. All these features are incredibly useful and are translated to provider-specific syntax seamlessly.

Yes, it is possible to add this functionality via raw SQL in the migration, but doing so for every such requirement (lots of DTOs) can quickly become tedious, especially in larger projects. Not to mention, it feels odd that this is the only feature that doesn’t get translated to provider-specific syntax.

By integrating this feature, you would be able to offer developers a more streamlined and intuitive experience, reducing their reliance on manual SQL scripting within migrations. And I really think this feature might get more votes if more users were aware of its potential benefits.

roji commented 2 weeks ago

Not to mention, it feels odd that this is the only feature that doesn’t get translated to provider-specific syntax.

Expression indices go beyond just a allowing a collation or index method to be used - these last options are basically just a string that we stick in the CREATE INDEX DDL. Whereas an index can be defined over an arbitrary combination of regular columns and expressions - this makes it more complex to design and implement (and is probably the main reason we haven't tackled it yet).

Not to mention, it feels odd that this is the only feature that doesn’t get translated to provider-specific syntax.

There are definitely many other DDL options that EF doesn't support - especially if you look beyond indexes. It's basically unfeasible for EF to allow you to model all the possible DDL configuration options across all providers; and once again, since it's always possible to just do what you want in SQL, the value is generally lower than other features where there's no other option.

cervengoc commented 2 weeks ago

Hi @roji,

One big downside of using raw SQL migration is that I cannot define all of our indices in a single place in the codebase, like being part of our model. I think this is a serious one, otherwise it's very hard to track what we are indexing or not, etc. Also, if we have any option to define them centrally, we can do any kinds of custom abstractions, like generating indices based on some conventions, or so.

I'm definitely not seeing the whole picture, but ideally I could imagine having something like an "SQL-based" index in the model, where one can define a name, and the entire SQL DDL statement for the index. Then these can be included in the model, compared and translated during migration creation. This would be kind of a small, but still very useful abstraction. However, I cannot see whether the current indexing api could be extended with these, or a new SqlIndex (available only via fluent API) would be better.

Let me know what you think.

roji commented 2 weeks ago

I'm definitely not seeing the whole picture, but ideally I could imagine having something like an "SQL-based" index in the model, where one can define a name, and the entire SQL DDL statement for the index.

Allowing the user to specify the entire CREATE INDEX DDL (i.e. CREATE INDEX xxx ....) doesn't work, since it doesn't allow altering the index later; the key point to understand here is that the model describes database state ("there's an index with these characteristics"), as opposed to migration operations (create, alter, drop) which represent operations performed at a specific point in time to change the database from one state to another. This is why when something is represented in the model, it must represent the characteristics of the object (e.g. what columns and/or expressions the index covers), and it's EF's responsibility to generate the correct migration operations to apply that state to the database (e.g. by creating the index).

Aside from that, I get the desire to have everything defined in the model; but as I wrote before, expecting EF to model every possible thing across all databases simply isn't feasible. I do think EF support for expression indexes specifically is important and hope we get around to it, but this feature has generally received very little interest over the years (only 20 votes), so it's unlikely we'll get to it soon.

cervengoc commented 2 weeks ago

Thank you for your time and reply.

I understand your points, and also understand the abstractions and responsibilities of these EFCore and/or provider specific layers. The key point I wanted to ask is can't we have a sweet spot somewhere, where it's still abstract and can be part of the model, but already customizable.

Let me try to elaborate on a possible concept. First I could say that "I have an index (unique or not, etc)". This can be kind of described with the current model. And then instead of specifying the columns, and other properties, I could specify something like an SqlDefinition. That would be an optional customization point of an index, and it would mean the "rest" of the statement like CREATE [UNIQUE] INDEX [SqlDefinition]. Of course if one uses this, then specifying columns or any other properties would be a validation error. Also, altering could be detected by comparing the old and the new Sql definition, and if changes are detected, it could be modeled as a drop and recreate of the index.

Could you please help me to understand, and point out why this does not makes sense?

roji commented 2 weeks ago

Also, altering could be detected by comparing the old and the new Sql definition, and if changes are detected, it could be modeled as a drop and recreate of the index.

Dropping and recreating the index is something that frequently doesn't make sense when lots of data is involved - this can be a very heavy operation, and during the reindexing queries can't use the index as it has been dropped. It's true that many types of index alterations would require a rebuild internally, but that's still no reason for EF to do DROP/CREATE (after all, that's why ALTER INDEX exists in the first place in databases).

Even if this approach would work very specifically for indexes, it certainly wouldn't for other database objects. For example, if someone changes some facet on a table, it is absolutely not possible to drop and create the table (losing all data). And the problem here definitely isn't specific to indexes in any way, but is rather a general one.

hexxone commented 2 weeks ago

Thank you for your engagement with the topic roji.

In the meantime I had some time to think about it as well. For now, my solution is having a [CustomIndexAttribute] which can additionally take an optional "expressionFunction" string. When the string is not null or empty, I simply wrap the property accessor with it and manually create my Index like so:

Example Code

```c-sharp public static void CreateCustomIndicesOnStartup(this DbContext dbContext) { foreach (var entityType in dbContext.Model.GetEntityTypes()) { var schema = entityType.GetSchema(); if (!string.IsNullOrEmpty(schema)) { schema = $@"""{schema}""."; } var tableName = entityType.GetTableName(); foreach (var property in entityType.GetProperties()) { var propertyInfo = entityType.ClrType.GetProperty(property.Name); var customIndices = propertyInfo?.GetCustomAttributes().ToArray() ?? []; foreach (var customIndex in customIndices) { var unique = customIndex.Unique ? "UNIQUE" : ""; var indexName = customIndex.Name ?? customIndex.AutoGenerateName(property); var collate = customIndex.Collation != null ? $"COLLATE {customIndex.Collation}" : ""; var propExpression = property.Name; if (customIndex.ExpressionFunction != null) { propExpression = $"{customIndex.ExpressionFunction}(\"{propExpression}\")"; } var usingMethod = ""; if (customIndex.UsingMethod != PgIndexType.Default) { usingMethod = $"USING {customIndex.UsingMethod}"; } dbContext.Database.ExecuteSqlRaw($@" CREATE {unique} INDEX IF NOT EXISTS ""{indexName}"" ON {schema}""{tableName}"" {usingMethod} ( {propExpression} {collate} {customIndex.Operator} ) "); } } } } ```

Of course, the problem is Indices will not be easy to modify once created, deleting them every time is inefficient and you have to take care of that by yourself. I am also still unsure If I want to keep going this route.

Another thing which came to mind was "User-defined function mappings" which already exist.

Maybe it would be possible to define these index-expressions in a similar fashion like an Extension method on IndexBuilder which says "HasExpression" and takes a literal "Expression"-Type object describing what to do?

Although I dont know the specific implementation of that part, maybe it would be "easily" reusable for detecing the differences in Indices? E.g.:

modelBuilder.Entity<User>().HasIndex(user => user.Name).HasExpression( args => 
   new SqlFunctionExpression(
         "LOWER",
         args.First(), // member expression ?
         type: args.First().Type,
         typeMapping: args.First().TypeMapping)
)

let me know what you think.

roji commented 2 weeks ago

When the string is not null or empty, I simply wrap the property accessor with it and manually create my Index like so

If you're running CreateCustomIndicesOnStartup() every time your program startups up (as indicated by the name), then it should fail the second time around, since the index is already present in the database. You can drop it beforehand, but dropping and recreating all indexes every time your application starts is a very bad idea.

In addition, you wrote above that the reason for wanting to have the expression indexes in the model is to be able to track them in a single place in the code base; but sprinking attributes with the SQL definitions around your code base doesn't seem to achieve that.

User-defined function mappings don't change anything here; at the end of the day, the index still needs to be created over SQL which contains the function invocation. In other words, it doesn't matter if you put your SQL directly in the index expression definition, or wrap it in a function and invoke it from the index expression definition.

I'd generally advise not trying to hack a feature which doesn't exist, only to avoid creating the expression index via SQL in your migrations - that's the right way to manage that currently. I understand you'd like to have your index definitions in a single place in your model so you can track it better - but that doesn't sound like it merits brittle hacks; you can simply introduce comments into your model configuration that document the expression indexes until this feature is implemented.

bazarniy commented 3 days ago

When the string is not null or empty, I simply wrap the property accessor with it and manually create my Index like so

If you're running CreateCustomIndicesOnStartup() every time your program startups up (as indicated by the name), then it should fail the second time around, since the index is already present in the database. You can drop it beforehand, but dropping and recreating all indexes every time your application starts is a very bad idea.

In addition, you wrote above that the reason for wanting to have the expression indexes in the model is to be able to track them in a single place in the code base; but sprinking attributes with the SQL definitions around your code base doesn't seem to achieve that.

User-defined function mappings don't change anything here; at the end of the day, the index still needs to be created over SQL which contains the function invocation. In other words, it doesn't matter if you put your SQL directly in the index expression definition, or wrap it in a function and invoke it from the index expression definition.

I'd generally advise not trying to hack a feature which doesn't exist, only to avoid creating the expression index via SQL in your migrations - that's the right way to manage that currently. I understand you'd like to have your index definitions in a single place in your model so you can track it better - but that doesn't sound like it merits brittle hacks; you can simply introduce comments into your model configuration that document the expression indexes until this feature is implemented.

Sorry but index inside migration does not work in EnsureCreated method. It commly uses in tests. And this great hack fixes it!

dotnet / efcore

Support expression (AKA functional) indexes #28360