apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Intermittent FileNotFoundException for .fnm when using rsync [LUCENE-628] #1703

Closed asfimport closed 14 years ago

asfimport commented 18 years ago

We use Lucene 1.9.1 to create and search indexes for web applications. The application runs in Jboss402 on Redhat ES3. A single Master (Writer) Jboss instance creates and writes the indexes using the compound file format, and the index is optimised after all updates. These index files are replicated every few hours, using rsync, to a number of other application servers (Searchers). The rsync job only runs if there are no Lucene lock files present on the Writer. The Searcher servers that receive the replicated files perform only searches on the index. Up to 60 searches may be performed each minute.

Everything works well most of the time, but we hit the following issue on the Searcher servers about 10% of the time. Following an rsync replication, one or all of the Searcher servers throw:

IOException caught when creating and IndexSearcher
java.io.FileNotFoundException: /..../_1zm.fnm (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425)
    at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:56)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:144)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:129)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:110)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:154)
    at org.apache.lucene.store.Lock$With.run(Lock.java:109)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:143)

As we use the compound file format I would not expect .fnm files to be present. When replicating, we do not delete the old .cfs index files as these could still be referenced by old Searcher threads. We do overwrite the segments and deletable files on the Searcher servers.

My thoughts are: either we are occasionally overwriting a file at the exact time a new searcher is being created, or the lock files are removed from the Writer server before the compaction process has completed, so that we then replicate a segments file that still references a ghost .fnm file.

I would greatly appreciate any ideas and suggestions to solve this annoying issue.


Migrated from LUCENE-628 by Simon Lorenz, resolved Dec 16 2009

Environment: Linux RedHat ES3, Jboss402

asfimport commented 18 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

My best guess on what's happening here is, on one of your Searcher boxes:

The one thing that's odd in your traceback above is that line 154 of IndexReader.java is only reached when there is more than one segment in your index. Are you allowing rsync to make a copy after IndexWriter has added docs (and closed) but before optimize is called? Otherwise I can't explain why the index on your Searcher box has more than one segment.

Note that there are two lock files on the Writer machine: the write lock, held for a long time (whenever an IndexWriter is open), and the commit lock, held briefly while a new segments file is written.
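For illustration, a replication job on the Writer could test for both locks before it starts copying. The following is only a minimal sketch under stated assumptions: the class `LockCheck` and its methods are hypothetical, it uses plain `java.nio.file` rather than any Lucene API, and it assumes that the lock files sit in a single directory and that their names end in `write.lock` and `commit.lock`. Verify where your Lucene version actually places its lock files and what it calls them. A check like this is also only a point-in-time test: a lock can appear right after the check returns, so it narrows the window for copying a half-written index but does not close it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: looks for lock files before a copy. */
final class LockCheck {

    /**
     * Returns every entry in lockDir whose name ends in "write.lock" or
     * "commit.lock". Both suffixes are assumptions made for this sketch.
     */
    static List<Path> findLocks(Path lockDir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(lockDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (name.endsWith("write.lock") || name.endsWith("commit.lock")) {
                    found.add(entry);
                }
            }
        }
        return found;
    }

    /** True only if no lock file was visible at the moment of the check. */
    static boolean safeToCopy(Path lockDir) throws IOException {
        return findLocks(lockDir).isEmpty();
    }
}
```

A caller would run `LockCheck.safeToCopy(lockDir)` immediately before starting the copy and skip that replication cycle when it returns false.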

I think you need to change your approach to more correctly use Lucene's locking:

Note that the Solr project

http://incubator.apache.org/solr/features.html
http://incubator.apache.org/solr/tutorial.html

has an excellent solution for correctly distributing an index from a single Writer to multiple Searchers (they call it "snapshots"). It also uses rsync to move snapshots around. You might want to try Solr, or perhaps "borrow" its approach, especially the neat "cp -l -r" trick for quickly creating a snapshot of the index on the Writer machine.
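To make the "cp -l -r" idea concrete, here is a rough Java equivalent that hard-links every file of an index directory into a fresh snapshot directory. It is a sketch under assumptions, not Solr's actual implementation: the class `IndexSnapshot` is hypothetical, the index directory is assumed to be flat (no subdirectories), and both directories must be on the same filesystem, one that supports hard links. Because a hard link shares the underlying file data, the snapshot is created quickly and stays readable even if the Writer later deletes the original file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Lucene or Solr: hard-link snapshot of a flat directory. */
final class IndexSnapshot {

    /**
     * Hard-links each regular file in indexDir into snapshotDir and returns
     * the number of links created. Fails if a link target already exists,
     * so each snapshot should go into a new, empty directory.
     */
    static int snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // createLink(link, existing): the new name comes first.
                    Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                    linked++;
                }
            }
        }
        return linked;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: IndexSnapshot <indexDir> <snapshotDir>");
            return;
        }
        int n = snapshot(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("linked " + n + " files");
    }
}
```

The snapshot should be taken while no commit is in progress, and the copy to the Searchers would then read from the snapshot directory rather than from the live index.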

See also this recent thread that touched on similar issues:

http://www.gossamer-threads.com/lists/lucene/java-user/37593

asfimport commented 18 years ago

Simon Lorenz (migrated from JIRA)

Hi Michael,

Many thanks for this input. Your comments are very sound and I will look into your suggestions and report back.

Cheers.

asfimport commented 15 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Hey Simon, anything to report back on this issue? I'd like to close it out if you have worked out what happened.