dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.79k stars 3.19k forks source link

SQLite full-text search support #4823

Open natemcmaster opened 8 years ago

natemcmaster commented 8 years ago

SQLite's FTS3 and FTS4 (full-text search) extension modules allow users to create special tables with a built-in full-text index. This extension is usually enabled by default. (I checked: most platforms we support have a version of SQLite compiled with ENABLE_FTS3).

To leverage this, the create table syntax and query syntax are slightly different.

CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT);     /* FTS3 table */
CREATE TABLE enrondata2(content TEXT);                        /* Ordinary table */
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux';  /* FTS search -- fast */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* regular search -- slow */

To enable this in EF, we would need to add some kind of configuration option to specific which tables are FTS and make query respond accordingly.

In the meantime, users can work around this with MigrationBuilder.Sql and FromSql.

More docs: https://www.sqlite.org/fts3.html

rowanmiller commented 8 years ago

combined into https://github.com/aspnet/EntityFramework/issues/1590

divega commented 6 years ago

Un-combining.

Trolldemorted commented 6 years ago

You are talking about fts 3 and 4, will you also support 5?

bricelam commented 6 years ago

Now that SQLitePCL.raw supports FTS5, we'd probably just start with support for that.

bricelam commented 6 years ago

Some notes:

bricelam commented 6 years ago

Daydreaming about this some more:

LINQ like the this...

from e in db.Emails
where EF.Functions.Match(e, query)
orderby EF.Functions.Bm25(e)
select new
{
    e.Id,
    Subject = EF.Functions.Highlight(e.Subject, "<b>", "</b>"),
    Body = EF.Functions.Snippet(e.Body, "<b>", "</b>", "...", 64)
};

...would produce SQL like this.

select
    id,
    highlight(email, subject, '<b>', '</b>') as subject,
    snippet(email, body, '<b>', '</b>', '...', 64) as body
from email
where match(email, $query)
order by bm25(email);

Notice how expressions like Highlight(e.Subject, ...) are translated into highlight(email, subject, ...) and you could write either Match(e, ...) (translates to match(email, ...)) or Match(e.Body, ...) (translates to match(body, ...)).

Obviously, Match, Bm25, Highlight, and Snippet cannot be evaluated on the client and should throw.

bricelam commented 6 years ago

Note, some functionality is already possible in EF Core today.

from e in db.Emails
where e.Body == query
orderby e.Rank
select e;
mqudsi commented 5 years ago

@bricelam fts5 has a boatload of alternate/equivalent syntaxes you can use; one of them is to use the name of the table as a scalar function accepting a string. Shouldn’t it be possible to just map it as a scalar db function with that support recently in EF Core oob?

Eg

Select * from Table t 
Join Search
On t.rowid = search.rowid
Where search(@keywords)
Order by rank ASC

In ADO.NET parlance. If you add the virtual Rank column (already doable) and the scalar function is as I suspect supported, then that makes things easier.

Support for NEAR sucks because it uses the ugly quoted string within a quoted string format, but perhaps there’s an alternative syntax there?

bricelam commented 4 years ago

FYI, I finally wrote a blog post about SQLite Full-Text Search and EF Core.

groege commented 3 years ago
order by bm25(email)

would order by bm25 also work for special letters like umlauts: äöü or something like ß?

bricelam commented 3 years ago

@groege It depends on your tokenizer.

Kalle4242 commented 3 years ago

After realizing that all these great features and "daydreams" have not become part of the latest efcore version, is there any announcement when (or if) one can expect this will part of a future version? Is it possible to up vote the features for upcoming versions?

ajcvickers commented 3 years ago

@Kalle4242 Yes: 👍 on the issue. We will be publishing a plan for EF7 soon. See release planning for more info.

Kalle4242 commented 2 years ago

I have a problem calling a select/search statement with FromSqlRaw on a fts5 virtual table. I have two test projects, both using the same model, but I get different results calling the same statement:

        protected override void Up(MigrationBuilder migrationBuilder)
        {
            migrationBuilder.Sql(@"CREATE VIRTUAL TABLE AssetSearch USING fts5(AssetCommonId UNINDEXED, AssetVersionId 
                                                    UNINDEXED, AssetPartId UNINDEXED, Title, Body, Comment, Notice)");
        }

public IQueryable<AssetSearchDbo> Search(string question)
 {
            string query =
                "SELECT " +
                "   \"AssetSearch\".RowId " +
                " , \"AssetSearch\".AssetCommonId " +
                " , \"AssetSearch\".AssetVersionId " +
                " , \"AssetSearch\".AssetPartId " +
                " , snippet( \"AssetSearch\", 3, '>>', '<<', '...', 8 ) As Title " +
                " , snippet( \"AssetSearch\", 4, '>>', '<<', '...', 8 ) As Body " +
                " , snippet( \"AssetSearch\", 5, '>>', '<<', '...', 8 ) As Comment " +
                " , snippet( \"AssetSearch\", 6, '>>', '<<', '...', 8 ) AS Notice " +
                " , bm25(\"AssetSearch\") As Rank " +
                " FROM \"AssetSearch\" " +
               $" WHERE \"AssetSearch\" MATCH ' - {{{{AssetCommonId AssetVersionId AssetPartId}}}} : {question}' " +
                " ORDER BY Rank";
       IQueryable<AssetSearchDbo> searchResults = AssetSearch.FromSqlRaw<AssetSearchDbo>(query);
      return searchResults;
 }

When inspecting the searchResults in one case an enumeration of AssetSearchDbo objects is returned and the results of snippet and rank functions are not computed. The snippet fields contain the entire field values without tagging the "match term" and rank is null. In the context of the second test project an enumeration of Castle.Proxies.AssetSearchDboProxy objects with correct computed snippet and rank functions is returned. Furthermore in the second case the AssetSearch DbSet consist already of those proxy objects.

Can anybody imagine which difference makes it work either this or that way? How can I force the DbContext to use LazyLoading in first case as well (yes it is set in the OptionsBuilder)? Or, the other way round, what is going wrong in the first case, that lazyloading is not used there?

Maybe there is someone knowing more about the combination of efcore, lazy loading and fts5. By the way I'm working with VS2022 and Net6.0. Thanks.

Kalle4242 commented 2 years ago

I'm working on it myself. A difference between the two test project is, that in first on the context is kept open during the entire test. In the second one a new DbContext is created for each test method. I changed the first one in a way it creates a new DbContext before the search method is called. And this worked!!! Finding1: At least one prior action on the DbContext leaves it in a state, that it does not work properly in the described case. Maybe s.o. can guess, what this can be. Finding2: For the moment, there's work-around! Or maybe it is even recommendable to work that way. Maybe s.o. has an advice here.

radoslawkosinski commented 2 years ago

Do you have any sample project on how to use FTS5 on SQLite EF .NET Core? I followed https://www.bricelam.net/2020/08/08/sqlite-fts-and-efcore.html but not sure if I understand correctly, I have few problems: there is FTSPost model which should create "normal" table, however we replace migration of this table with manual migration so the virtual table is created with rowid and Content column: CREATE VIRTUAL TABLE FTSPost USING fts5(Content); Now, the FTSPost table(FTSPost virtual table) doesn't have the properties needed to one to one mapping with Post table(which are defined in FTSPost model): class FTSPost { public int RowId { get; set; } public Post Post { get; set; } public string Content { get; set; } public string Match { get; set; } public double? Rank { get; set; } }

Can you share the migration and the data model created by this?

Kalle4242 commented 2 years ago

Sorry. I didn't manage to run Brice's example. It also didn't match my requirements. The search content of my solution is not in a database table. My database is a metadata database for different assets which our solution handles. The search data is extracted from the content of those assets and only should be separately stored to fts5 tables for full-text-search purpose. It is not required, so it is superfluous in the metadata database. As you can see I'm selecting my search hits via a FromSqlRaw call. This is closer to the example of VahidN https://github.com/VahidN/EFCoreSQLiteFTS But my fts5 virtual table representing object contains a reference to a table of the metadata database. I added it by accident, maybe Brice's example in mind. I did not add it to the modelbuilder in the OnModelCreating method and it workedf or me. Because it's a navigation property you should make it virtual public virtual Post Post {get; set; } Attention: If you add an additional automatic database migration, an index and a foreign key will be added to the up method and deleting them to the down method for this reference. You have to delete this statements. Leaving it in the migration, will overwrite the virual fts5 table with a conrete table with the same name. It took me hours to realize that is looking different in the DB Browser for SqLite, wondering why it is not working anymore. Hope this will help you.

radoslawkosinski commented 2 years ago

Thanks, @Kalle4242. I looked at the Vahid's project and also your's suggestions in mind and I was able to create first working solution. So, I have manual migration to create virtual tables and used FromSQLRaw for querying them, that seems to be working. BTW, I'm using SQLIteStudio, it displays the letter 'V' when the table is vurtual, it seems that part is missing in DB Broeser.