apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

The creation of a spell index from a LuceneDictionary via SpellChecker.indexDictionary (Dictionary dict) fails starting with 1.9.1 (up to current svn version) [LUCENE-632] #1707

Closed asfimport closed 18 years ago

asfimport commented 18 years ago

Two different errors in 1.9.1/2.0.0 and current svn version.

1.9.1/2.0.0: at the end of indexDictionary (Dictionary dict) the IndexReader instance reader is closed. This causes a NullPointerException because reader has not been initialized before (neither in that method nor in the constructor). Commenting out this line (reader.close()) seems to resolve that issue.

current svn: the constructor tries to create an IndexSearcher instance for the specified path. Because no index exists at that path yet (it has not been created), an exception is thrown.


Migrated from LUCENE-632 by Karsten Dello, resolved Sep 18 2006 Attachments: lazy_searcher.diff

asfimport commented 18 years ago

Karl Wettin (migrated from JIRA)

I think you use it the wrong way.

Please post some code showing what you do.

asfimport commented 18 years ago

Karsten Dello (migrated from JIRA)

Here is the code. It is copied from http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=2#generating_spell_index

I am pretty sure this code worked with 1.4.3?

package de.kobv.lucene.spellcheck;

import java.io.File;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spell.Dictionary;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SpellCheckIndexer {

public final static String LUCENEFIELD="AlleFelder";

public static void main(String [] args) {

    if (args.length!=2) {
        System.out.println("usage: java de.kobv.lucene.spellcheck.SpellCheckIndexer <sourceIndexDirectory> <spellCheckIndexDirectory>");
        return;
    }
    String indexDir=args[0];
    String spellcheckpath=args[1];

    try {

    SpellCheckIndexer sci= new SpellCheckIndexer();

    System.out.println("creating spell check index at "+new java.util.Date(System.currentTimeMillis()));
    System.out.println("indexDir:"+indexDir+" spellcheckdir:"+spellcheckpath);

    Directory d1 = FSDirectory.getDirectory(indexDir,false);

    // make sure the directory exists
    File f= new File(spellcheckpath);
    f.mkdir();

    Directory d2= FSDirectory.getDirectory(spellcheckpath,true);

    sci.createSpellIndex(LUCENEFIELD,d1,d2);

    System.out.println("finished at"+new java.util.Date(System.currentTimeMillis()));

    }
    catch(Exception e) {e.printStackTrace(System.out);}
}

public void createSpellIndex(String field,
        Directory originalIndexDirectory,
        Directory spellIndexDirectory) throws IOException {

    System.out.println(field+" - "+originalIndexDirectory+" - "+spellIndexDirectory);

    IndexReader indexReader = null;
    try {
        indexReader = IndexReader.open(originalIndexDirectory);
        System.out.println("Index mit "+indexReader.numDocs()+" docs in "+originalIndexDirectory+" geoeffnet.");

        Dictionary dictionary = new LuceneDictionary(indexReader, field);

        SpellChecker spellChecker = new SpellChecker(spellIndexDirectory);
        spellChecker.indexDictionary(dictionary);

    } 
    catch (Exception e) {
        e.printStackTrace();
    }

    finally {
        if (indexReader != null) {
            indexReader.close();
        }
    }
}

}

asfimport commented 18 years ago

Karl Wettin (migrated from JIRA)

I had no problems running the code on 2.0.0, but it does fail on the SVN version. I didn't look closely into why. I have always passed an existing index to the SpellChecker, and I would say that the implementation you pasted is broken, not the API.

Directory d2= FSDirectory.getDirectory(spellcheckpath,true); 

And when the index has been created you have to:

spellChecker.setSpellIndex(d2);

To ensure a new Searcher is created.

Or use this patch, which handles the searcher lazily. Then you don't need to change your code at all.
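The attached lazy_searcher.diff is not reproduced in this thread, so the following is only a sketch of the general idea of creating a searcher lazily, not the patch itself. LazyHolder and its methods are hypothetical names, and the factory stands in for whatever would call new IndexSearcher(...).

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating lazy creation; not the code in lazy_searcher.diff.
class LazyHolder<T> {

    // Knows how to build the instance (for example, open a searcher on the spell index).
    private final Supplier<T> factory;

    // Stays null until the first get(), so constructing the holder never touches the index.
    private T instance;

    LazyHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    // Build the instance on first use and reuse it afterwards.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    // Forget the cached instance so the next get() rebuilds it,
    // for example after the spell index has been (re)written.
    synchronized void reset() {
        instance = null;
    }

    public static void main(String[] args) {
        LazyHolder<String> holder = new LazyHolder<>(() -> "searcher opened");
        // Nothing has been created yet; the factory runs only here.
        System.out.println(holder.get());
    }
}
```

With this shape, a constructor can accept a directory that holds no index yet, because nothing is opened until a search is actually requested.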

asfimport commented 18 years ago

Otis Gospodnetic (@otisg) (migrated from JIRA)

I put the new IndexSearcher call in the setSpellIndex(Directory) method, which is called from the SpellChecker constructor. I did that the other day, so maybe I broke something, although everything worked for me and I didn't get any NPEs.

I use code that looks like this:

public void createDictionary(String field, Directory sourceIndex, Directory spellIndex) throws IOException {
    IndexReader reader = null;
    try {
        reader = IndexReader.open(sourceIndex);
        Dictionary dictionary = new LuceneDictionary(reader, field);
        SpellChecker spellChecker = new SpellChecker(spellIndex);
        spellChecker.indexDictionary(dictionary);
    } finally {
        if (reader != null) {
            reader.close();
        }
    }
}

public static void main(String[] args) throws IOException {
    if (args.length < 3) {
        System.err.println("Usage: java " + SpellerIndexer.class
                + " <source field> <source index> <spell index>");
        System.exit(1);
    }

    System.out.print("\nCreating spell checker index in " + args[2] + " ... ");
    SpellerIndexer indexer = new SpellerIndexer();
    Directory sourceIndex = FSDirectory.getDirectory(args[1], false);
    Directory spellIndex = FSDirectory.getDirectory(args[2], IndexReader.indexExists(args[2]));
    indexer.createDictionary(args[0], sourceIndex, spellIndex);
    System.out.println("done\n");
}

Karsten: If you still think there is a bug in the SVN version, please comment here. I can apply the lazy searcher patch from Karl (see Karl, you don't always get ignored! :)), but I'm not sure that it will make a difference. If you apply it and it makes a difference for you, please follow up here.

asfimport commented 18 years ago

Miles Barr (migrated from JIRA)

I've checked svn and it's not an issue in trunk, but it's definitely a bug in the version that's distributed with Lucene 1.9.1

The reader is only opened in the 'exist' method, but at the end of the 'indexDictionary' method it tries to close the reader, then sets it to null. I think the intent of the code is to make sure we don't have an old reader after we update the dictionary, but if we're creating the dictionary for the first time we get an NPE.

The simple fix is to check if reader is null before trying to close it.
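A minimal sketch of that guard, using a hypothetical stand-in rather than the real SpellChecker source: the class NullSafeClose, the SimpleReader interface, and the open/finishIndexing/wasClosed methods are invented for illustration and are not Lucene API. It only shows that closing a never-opened reader is skipped instead of throwing.

```java
// Hypothetical stand-in for the reader handling described above; not Lucene code.
class NullSafeClose {

    // Minimal reader abstraction (an assumption; it replaces IndexReader here).
    interface SimpleReader {
        void close();
    }

    // Stays null until open() is called, like a reader that only exist() opens.
    private SimpleReader reader;

    // Becomes true once close() has actually run on a reader.
    private boolean closed;

    void open() {
        reader = () -> { closed = true; };
    }

    // Mirrors the end of indexDictionary: drop the old reader, if there is one.
    void finishIndexing() {
        if (reader != null) {   // without this check a null reader causes an NPE
            reader.close();
        }
        reader = null;
    }

    boolean wasClosed() {
        return closed;
    }

    public static void main(String[] args) {
        NullSafeClose firstRun = new NullSafeClose();
        firstRun.finishIndexing();   // reader was never opened, so nothing is closed
        System.out.println(firstRun.wasClosed());

        NullSafeClose laterRun = new NullSafeClose();
        laterRun.open();
        laterRun.finishIndexing();   // reader exists, so it is closed and cleared
        System.out.println(laterRun.wasClosed());
    }
}
```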

asfimport commented 18 years ago

Karsten Dello (migrated from JIRA)

Sorry for not responding for such a long time, I have been out of the office.

Otis: The current SVN version (as of today) works fine for me, though the spell index has to be created manually before calling the SpellChecker constructor. As Karl pointed out, a simple new IndexWriter(d2, null, true).close(); does the job.

Miles: I think you are right; I had the same problem. I worked around it by calling exist("foo") before indexDictionary, but that is not a bug fix (the real fix, as you said, is for the method to check whether reader is null).

asfimport commented 18 years ago

Otis Gospodnetic (@otisg) (migrated from JIRA)

If I understood all the comments correctly, there is no bug in HEAD. If I misunderstood, feel free to re-open.

asfimport commented 17 years ago

f (migrated from JIRA)

I still get null pointer exceptions when creating a spell index. At first I thought it was a problem with the PHP JNI bridge I use, but a plain Java example didn't work either. To rule out misspelled parameters or something similar, I hardcoded the values in the example.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.spell.Dictionary;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DidYouMeanIndexer {

private static final String DEFAULT_FIELD = "txt_name";
public void createSpellIndex() throws IOException
{
    IndexReader indexReader = null;
    try
    {   boolean create = false;
        Directory orgindex = FSDirectory.getDirectory("/var/www/localhost/htdocs/lib/suche/data/index/", create);
        Directory spellindex = FSDirectory.getDirectory("/var/www/localhost/htdocs/lib/suche/data/didyoumean/", create);
        indexReader = IndexReader.open(orgindex);
        new IndexWriter(orgindex, null, true).close();
        Dictionary dictionary = new LuceneDictionary(indexReader, "txt_name");
        SpellChecker spellChecker = new SpellChecker(spellindex);
        spellChecker.indexDictionary(dictionary);
    }
    finally
    {   if (indexReader != null)
        {   indexReader.close();
        }
    }
}

public static void main(String[] args)
{
    DidYouMeanIndexer obj = new DidYouMeanIndexer();
    try
    {     obj.createSpellIndex();
    }
    catch(IOException exception)
    {
    }
}

}

Next I debugged the word iterator that I got from the LuceneDictionary, but it did not return a single word. The index is set up correctly, with one document containing two fields. The txt_name field is tokenized, indexed, and has a term vector. Karl Wettin's solution didn't do the job for me ;)

This problem exists on my system with Lucene 1.9.1 and with 2.0.1-dev.

I hope you have some ideas ;) scnr

asfimport commented 17 years ago

f (migrated from JIRA)

I also solved the first problem by commenting out the reader.close() call. The second problem was that the comparison between tfield and field in LuceneDictionary didn't match. After rewriting that part, everything seems to do its job.
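The thread does not say why the tfield/field comparison failed. One common cause in Java, offered here purely as an assumption, is comparing strings by reference (==) when one side is not interned. The sketch below is not LuceneDictionary code; FieldCompare and its methods are hypothetical names used only to show the difference.

```java
// Hypothetical illustration of reference vs. content comparison; not Lucene code.
class FieldCompare {

    // True only if both variables point at the very same String object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // True if the two strings contain the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String tfield = "txt_name";              // literal, so it is interned
        String field = new String("txt_name");   // distinct object with equal content

        // Reference comparison fails even though the text is identical.
        System.out.println(sameReference(tfield, field));
        // Content comparison succeeds.
        System.out.println(sameContent(tfield, field));
        // Interning the second string makes the reference comparison succeed too.
        System.out.println(sameReference(tfield, field.intern()));
    }
}
```

If the mismatch did come from a reference comparison, either switching to equals or interning the field name before comparing would address it.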