chriseldredge / Lucene.Net.Linq

LINQ provider to run native queries on a Lucene.Net index

Simple Search not Working #57

Closed jehugaleahsa closed 10 years ago

jehugaleahsa commented 10 years ago

I have been trying to get a simple example working using the Fluent code. I have a simple Account class with two properties: AccountId and AccountName.

        public class Account
        {
            public int AccountId { get; set; }

            public string AccountName { get; set; }
        }

I create a directory in memory, add two accounts, and then search for them. I have noticed that a space in AccountName breaks the search. Based on some of your examples, I can't see why this isn't working. Could you give me a little insight?

        var version = Lucene.Net.Util.Version.LUCENE_30;
        var mapping = new ClassMap<Account>(version);
        mapping.Key(a => a.AccountId).AsNumericField();
        mapping.Property(a => a.AccountName).WithTermVector.Yes();

        var directory = new RAMDirectory();
        var provider = new LuceneDataProvider(directory, version);
        provider.Settings.EnableMultipleEntities = false;

        using (var session = provider.OpenSession(mapping.ToDocumentMapper()))
        {
            var account1 = new Account() { AccountId = 1, AccountName = "test account", };
            var account2 = new Account() { AccountId = 2, AccountName = "account test", };
            session.Add(account1, account2);
        }

        var accounts = from account in provider.AsQueryable<Account>(mapping.ToDocumentMapper())
                       where account.AccountName == "test"
                       orderby account.Score()
                       select account;

        foreach (var account in accounts)
        {
            Console.Out.WriteLine(account.AccountName);
        }

mattjohnsonpint commented 10 years ago

This might be the same issue as #55

chriseldredge commented 10 years ago

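If no analyzer is specified, the default is LowercaseKeywordAnalyzer, which does not tokenize on whitespace. Instead, it treats the entire text as a single token (after converting it to lower case). That means "test account" is indexed as the one term "test account", so a query for "test" does not match it.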

To search for partial matches, the property should be specified as:

        mapping.Property(a => a.AccountName).AnalyzeWith(new StandardAnalyzer(version));

Enabling TermVector by itself changes neither which analyzer is used nor how that analyzer tokenizes a stream.

Note you could also use a stemming analyzer or any other analyzer that tokenizes in place of StandardAnalyzer.

However, as @mj1856 noted, even if you specify a tokenizing analyzer, #55 prevents it from working. That bug is now fixed, although the fix has not been released yet.
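For reference, here is a minimal sketch of the original example with the analyzer made explicit. It reuses the calls from the question and only swaps the `WithTermVector` line for `AnalyzeWith`. The `using` namespaces are my best guess at where these types live, and the sketch assumes a build that already includes the fix for #55, so treat it as an illustration rather than a verified program.

```csharp
using System;
using System.Linq;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Linq;          // assumed namespace for LuceneDataProvider
using Lucene.Net.Linq.Fluent;   // assumed namespace for ClassMap<T>
using Lucene.Net.Store;

public class Account
{
    public int AccountId { get; set; }
    public string AccountName { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var version = Lucene.Net.Util.Version.LUCENE_30;

        var mapping = new ClassMap<Account>(version);
        mapping.Key(a => a.AccountId).AsNumericField();
        // Explicit tokenizing analyzer: "test account" is split into the
        // terms "test" and "account" instead of being kept as one token.
        mapping.Property(a => a.AccountName).AnalyzeWith(new StandardAnalyzer(version));

        var directory = new RAMDirectory();
        var provider = new LuceneDataProvider(directory, version);
        provider.Settings.EnableMultipleEntities = false;

        // Build the mapper once and share it between the session and the query.
        var mapper = mapping.ToDocumentMapper();

        using (var session = provider.OpenSession(mapper))
        {
            session.Add(
                new Account { AccountId = 1, AccountName = "test account" },
                new Account { AccountId = 2, AccountName = "account test" });
        }

        var accounts = from account in provider.AsQueryable<Account>(mapper)
                       where account.AccountName == "test"
                       select account;

        foreach (var account in accounts)
        {
            Console.Out.WriteLine(account.AccountName);
        }
    }
}
```

With a tokenizing analyzer on AccountName, the query term "test" is compared against individual terms rather than the whole string, which is what the original search was expecting.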

chriseldredge commented 10 years ago

Duplicate of #55.

jehugaleahsa commented 10 years ago

Oh man. I went and tried a bunch of other analyzers thinking StandardAnalyzer was the default. It never occurred to me that I should try being explicit.

I decided to just go directly against the Lucene.Net library for now. I will have to try your library out again next time I'm using Lucene. At least now I am more familiar with the Lucene.Net library.

I couldn't find the PorterStemAnalyzer class referenced in your Fluent sample. I wasn't sure if it was in a newer version or if it was something from a personal project. If publishing NuGet packages is a pain, I could send you a .bat file I use for automating the deployment of my NuGet packages.

chriseldredge commented 10 years ago

Yeah, Lucene.Net includes PorterStemFilter, but not an analyzer that wires it up. I guess there are enough knobs you would want to adjust that they don't include a default implementation, but I'm not sure.

An example analyzer that uses it can be found at https://github.com/themotleyfool/NuGet.Lucene/blob/master/source/NuGet.Lucene/PorterStemAnalyzer.cs
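As a rough idea of what "wiring it up" involves, here is a minimal sketch of an analyzer that chains PorterStemFilter after the standard tokenizer, written against the Lucene.Net 3.0 analysis API. The class name `SketchPorterStemAnalyzer` is hypothetical and the filter chain is one plausible choice (it skips stop-word removal, for example); this is not the implementation at the link above.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

// Hypothetical analyzer name, for illustration only.
public class SketchPorterStemAnalyzer : Analyzer
{
    private readonly Version version;

    public SketchPorterStemAnalyzer(Version version)
    {
        this.version = version;
    }

    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Split the text into word tokens.
        TokenStream stream = new StandardTokenizer(version, reader);
        // Normalize tokens produced by StandardTokenizer.
        stream = new StandardFilter(stream);
        // Lower-case before stemming; the Porter stemmer expects lower-case input.
        stream = new LowerCaseFilter(stream);
        // Reduce each token to its stem so inflected forms share one term.
        stream = new PorterStemFilter(stream);
        return stream;
    }
}
```

An instance could then be passed to the mapping in place of StandardAnalyzer, for example `mapping.Property(a => a.AccountName).AnalyzeWith(new SketchPorterStemAnalyzer(version))`.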