bleroy / lunr-core

A port of LUNR.js to .NET Core
MIT License
559 stars, 24 forks

Results are truncated [BUG] #52

Closed: shivamn closed this issue 1 year ago

shivamn commented 1 year ago

```csharp
using Lunr;

var index = await Lunr.Index.Build(async builder =>
{
    builder
        .AddField("country")
        .AddField("state")
        .AddField("district")
        .AddField("city")
        .AddField("area")
        .AddField("postalcode");

    await builder.Add(new Document
    {
        { "country", "india" },
        { "state", "karnataka" },
        { "district", "" },
        { "city", "Bangalore" },
        { "area", "Malleshwaram" },
        { "postalcode", "560003" },
        { "id", "1" }
    });

    await builder.Add(new Document
    {
        { "country", "india" },
        { "state", "karnataka" },
        { "district", "" },
        { "city", "Bangalore" },
        { "area", "Jayanagar" },
        { "postalcode", "560008" },
        { "id", "2" }
    });
});

await foreach (Result result in index.Search(" Malleshwaram, bangalore india"))
{
    // The match data contains only "bangalor" instead of "bangalore".
}
```

bleroy commented 1 year ago

"bangalor" is the key in the inverted index, not the matched text. Because it's an inverted index entry, what you get is the stemmed version of the term ("bangalore" with its trailing "e" removed by the stemmer), not the text that was actually matched.
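To make the stemming point concrete, here is a toy C# sketch (not lunr-core's code) of a tiny inverted index whose keys go through a single Porter stemmer rule, step 5a, which drops a trailing "e" when the remaining stem is long enough. The real Lunr pipeline applies the full Porter stemmer plus other steps, and the helper names `IsConsonant`, `Measure`, `EndsCvc` and `Step5a` are hypothetical. The sketch only shows why the stored key is "bangalor" and why a query for "bangalore" still finds it: the query term is stemmed the same way.

```csharp
using System;
using System.Collections.Generic;

// Toy sketch, NOT lunr-core's stemmer: only Porter's step 5a is implemented
// here, which is enough to show why "bangalore" is indexed under "bangalor".

// Porter: a consonant is a letter other than a, e, i, o, u, and other than
// "y" when that "y" follows a consonant.
static bool IsConsonant(string w, int i) =>
    "aeiou".IndexOf(w[i]) < 0 && !(w[i] == 'y' && i > 0 && IsConsonant(w, i - 1));

// Porter's measure m: the number of vowel-then-consonant (VC) sequences.
static int Measure(string stem)
{
    int m = 0;
    for (int i = 1; i < stem.Length; i++)
        if (!IsConsonant(stem, i - 1) && IsConsonant(stem, i)) m++;
    return m;
}

// True when the stem ends consonant-vowel-consonant and the last letter is not w, x or y.
static bool EndsCvc(string s) =>
    s.Length >= 3
    && IsConsonant(s, s.Length - 3)
    && !IsConsonant(s, s.Length - 2)
    && IsConsonant(s, s.Length - 1)
    && "wxy".IndexOf(s[^1]) < 0;

// Step 5a: drop a final "e" if m > 1, or if m == 1 and the stem does not end in cvc.
static string Step5a(string word)
{
    if (!word.EndsWith('e')) return word;
    string stem = word[..^1];
    int m = Measure(stem);
    return (m > 1 || (m == 1 && !EndsCvc(stem))) ? stem : word;
}

// Hypothetical mini index over the "city" values of the two documents in the issue.
var cities = new Dictionary<string, string> { ["1"] = "Bangalore", ["2"] = "Bangalore" };
var index = new Dictionary<string, SortedSet<string>>();
foreach (var (docRef, city) in cities)
{
    string key = Step5a(city.ToLowerInvariant());
    if (!index.TryGetValue(key, out var refs))
        index[key] = refs = new SortedSet<string>();
    refs.Add(docRef);
}

foreach (var (term, docRefs) in index)
    Console.WriteLine($"{term} -> {string.Join(", ", docRefs)}");
// prints: bangalor -> 1, 2

// The query term goes through the same rule, so it lands on the same key.
string queryKey = Step5a("bangalore");
Console.WriteLine(index.ContainsKey(queryKey)); // True
```

Because the indexed text and the query are both reduced to the same key, the lookup succeeds, but the key itself no longer spells the original word. That is why the original text has to be recovered some other way, such as from match positions.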

In order to get the full data for the matches, you need to add a little incantation to your builder code, right after defining the fields:

```csharp
builder.MetadataAllowList.Add("position");
```

Once you've done that, you'll see that results have additional positional data for the match.

In this case, the positional data shows a match on the "city" field at position 0, with a length of 9. If you fetch the original document and take that substring of the field, you get exactly "Bangalore" (which happens to be the whole string here, but the same approach works for longer fields).
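Once positions are enabled, recovering the matched text is just a substring operation on the stored document. The sketch below assumes you keep your own copy of each document keyed by its `id` (Lunr's index does not hand the original document back). It also models each match as a hypothetical `(DocRef, Field, Start, Length)` tuple rather than using lunr-core's actual match data types. The `address` field and its position are invented purely to illustrate a match inside a longer string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical copy of the indexed documents, keyed by their "id" ref.
// The "address" field is invented for illustration; it is not in the issue's index.
var storedDocs = new Dictionary<string, Dictionary<string, string>>
{
    ["1"] = new()
    {
        ["city"] = "Bangalore",
        ["area"] = "Malleshwaram",
        ["address"] = "Malleshwaram Bangalore india",
    },
};

// Hypothetical (DocRef, Field, Start, Length) tuples standing in for the
// positional metadata: the "city" match from the issue (start 0, length 9)
// plus an assumed match inside the longer "address" field.
var matches = new List<(string DocRef, string Field, int Start, int Length)>
{
    ("1", "city", 0, 9),
    ("1", "address", 13, 9),
};

// The matched text is simply the [Start, Start + Length) slice of the field.
static string MatchedText(IReadOnlyDictionary<string, string> doc, string field, int start, int length) =>
    doc[field].Substring(start, length);

foreach (var (docRef, field, start, length) in matches)
    Console.WriteLine($"doc {docRef}, {field}: \"{MatchedText(storedDocs[docRef], field, start, length)}\"");
// prints:
// doc 1, city: "Bangalore"
// doc 1, address: "Bangalore"
```

Keeping the documents outside the index is a deliberate trade-off in Lunr's design: the index stays small and serializable, and the caller decides how to store and look up the source text.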