apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.59k stars 1.01k forks source link

Absorb NIOFSDirectory into FSDirectory [LUCENE-1658] #2732

Closed asfimport closed 15 years ago

asfimport commented 15 years ago

I think whether one uses java.io.* vs java.nio.* or eventually java.nio2.*, or some other means, is an under-the-hood implementation detail of FSDirectory and doesn't merit a whole separate class.

I think FSDirectory should be the core class one uses when one's index is in the filesystem.

So, I'd like to deprecate NIOFSDirectory, absorbing it into FSDirectory, and add a setting "useNIO" to FSDirectory. It should default to "true" for non-Windows OSs, because it gives far better concurrent performance on all platforms but Windows (due to known Sun JRE issue http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734).


Migrated from LUCENE-1658 by Michael McCandless (@mikemccand), resolved Jun 01 2009 Attachments: LUCENE-1658.patch (versions: 3), LUCENE-1658-take2.patch (versions: 2), LUCENE-1658-take3.patch (versions: 7) Linked issues:

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Attached patch. I plan to commit in a day or two.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Mike, maybe not use a boolean for the mode, instead an enum?

asfimport commented 15 years ago

Earwin Burrfoot (migrated from JIRA)

enum

java 1.5, unless you're going to write it by hand

Mike, are you going to absorb MMapDirectory into FSDirectory as well? It fits "when one's index is in the filesystem".

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

java 1.5, unless you're going to write it by hand

Lucene has the o.a.l.utils.Parameter class for that, whih is used for e.g. Field.Store.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

maybe not use a boolean for the mode, instead an enum?

OK I'll switch to oal.Parameter enum emulator.

Mike, are you going to absorb MMapDirectory into FSDirectory as well? It fits "when one's index is in the filesystem".

Excellent point – I think that makes sense. I'll fold it in as well.

asfimport commented 15 years ago

Earwin Burrfoot (migrated from JIRA)

Excellent point - I think that makes sense. I'll fold it in as well.

Doesn't make sense to me. :/ MMapD has different characteristics. It can also have it's own configurable properties, which are irrelevant for NIOFSD and FSD.

For example, my version of MMapD uses MappedByteBuffer.load() on creating MMapIndexInput. That way the cost of loading stuff into memory is paid upfront on reopening the reader, instead of during first several searches. If we want to publish this feature into trunk, we'll end up with additional parameter on MMapD constructor, that governs whether we want to preload mmapped files, or not. Now imagine constructor for FSD-that-folds-it-all, which has 'type', and a boolean setting that's relevant only for one of the types.

I also think of trying to use mmap for writing. If that proves to be beneficial, MMapD won't share much with FSD anymore.

In the end, I'm unsure if it's a good idea to fold anything into FSD at all. Too much different stuff in one class becons for spahetti code :) I assume your initial aim is to provide users with best impl for current platform without making them think. What about static factory that chooses between several impls? We had static factory before, it stinked because you had to chose impl through system property, it pooled directories and we had no public constructors. But now we have public constructors :) If one needs, he uses constructor directly. If one is lazy and wants us to choose for him he uses old API, that simply changes semantics a bit.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

In the end, I'm unsure if it's a good idea to fold anything into FSD at all. Too much different stuff in one class becons for spahetti code I assume your initial aim is to provide users with best impl for current platform without making them think.

That is my problem with the whole issue, too. For me, the three directory impl are too different, to be merged together. Especially if writing is also affected.

Using an additional parameter in the ctor specifying the type of directory is almost the same like three classes with much cleaner code. The only thing you do not get is the automatic directory impl choosing for filesystem dirs. But the solution to this are static factories.

What about static factory that chooses between several impls? We had static factory before, it stinked because you had to chose impl through system property, it pooled directories and we had no public constructors. But now we have public constructors If one needs, he uses constructor directly. If one is lazy and wants us to choose for him he uses old API, that simply changes semantics a bit.

That would be great. As the separate public constructors and deprecation of FSDirectory.open() is new in 2.9, we can change it. Maybe an automatism behind open() if the System.property is unset would be good. Advanced users may still instantiate the directories using the public ctor.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

In the end, I'm unsure if it's a good idea to fold anything into FSD at all.

OK, you convinced me: let's keep separate classes.

But I'd like to split the current FSDir into an abstract FSDir and a subclass SimpleFSDir. Then, we have three subclasses of FSDir (Simple, NIO, MMap). And strengthen the javadocs so SimpleFSDir's concurrency limitations are clear.

I assume your initial aim is to provide users with best impl for current platform without making them think. What about static factory that chooses between several impls?

Right, that's my goal: there should be a single obvious way to ask for a Directory that uses the file system, and that way should have good defaults.

FSDirectory is unfortunately the obvious way now, but it's usually a poor choice.

For example, my version of MMapD uses MappedByteBuffer.load() on creating MMapIndexInput.

Is this something you could share? Sounds useful! Likewise if you extend MMapDir for writing...

Does anyone have a strong sense of when MMapDir is / is not an appropriate choice? I've seen some users report good performance, eg:

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200603.mbox/%3C20060313025744.18818.qmail@station174.com%3E

MMap eats into the virtual memory of the process, so on 32 bit JRE that obviously a very real concern.

As the separate public constructors and deprecation of FSDirectory.open() is new in 2.9

Right we are free to change things still (hmm: a nice side effect of NOT releasing very often; incentives aren't quite right...).

But the old API was FSDirectory.getDirectory.

I think we should add a static FSDirectory.create() that returns a good default impl given your OS.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Attached patch:

asfimport commented 15 years ago

Earwin Burrfoot (migrated from JIRA)

Yay for base class + three concrete impls (haven't yet looked at the patch).

I was tempted to choose MMapDir on 64 bit Windows JRE, but it makes me nervous...

Not completely sure why, but MMap failed for me on Win32 some time ago

Is this something you could share? Sounds useful! Likewise if you extend MMapDir for writing...

It's a one-liner. I can make a patch, but while it works better than vanilla MMapD, I'm not yet sure it's the best approach. As for writing - okay, I'll tell you if it turns out to be anything good.

Does anyone have a strong sense of when MMapDir is / is not an appropriate choice?

There are a lot of variables. Just as you said, on 32bit systems you have to take care of address space. So, nice for small indexes, bad for big ones. But, mmap in Java cannot be explicitly closed, it is closed in finalizer, so even for a small index if you update really often, you can hit an OOM even though you have enough memory. MMapD failed for me on windows. But it is fast. It is absolutely, totally, uber-indespensible, if you have a 64bit system, fat index, memory to spare and require lots of fast searches.

So, you can probably enable it for non-Win 64bit?

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Not completely sure why, but MMap failed for me on Win32 some time ago

I think on 32 bit JRE we shouldn't choose MMap in FSDir.open().

It's a one-liner. I can make a patch, but while it works better than vanilla MMapD, I'm not yet sure it's the best approach. As for writing - okay, I'll tell you if it turns out to be anything good.

Which way do you think "prime all bytes up front on open" should default?

There are a lot of variables. Just as you said, on 32bit systems you have to take care of address space. So, nice for small indexes, bad for big ones. But, mmap in Java cannot be explicitly closed, it is closed in finalizer, so even for a small index if you update really often, you can hit an OOM even though you have enough memory. MMapD failed for me on windows. But it is fast. It is absolutely, totally, uber-indespensible, if you have a 64bit system, fat index, memory to spare and require lots of fast searches.

So, you can probably enable it for non-Win 64bit?

Wait, are you saying Win 64 bit has problems w/ MMapDir? (I thought you just said Win 32 bit, above).

Is MMapDir fine w/ concurrency? (I'd assume so). So, if you had the choice (ie, you're running in env w/ plenty of virtual mem), MMapDir would be preferred over NIOFSDir?

On a 64 bit env that doesn't have that much RAM, MMapDir should fare no worse than NIOFSDir, right? Ie both are competing for the same IO cache. And so on a 64 bit JRE perhaps the default should always be MMAPDir.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

From my experiences, MMap is always preferred, if you have 64 bit system. In this case, Java maps the file into address space like a swap file and often used parts are in real memory. So in my opinion, it is always preferable on 64 bit systems. If you have much RAM it is even better (because caching). On Linux/Solaris you can see the used address space with TOP. I have a webserver with an about 7 GB big index using nmap. It is reopened very often, so the used address space sometimes goes up to >50 Gigabytes, but this is not a problem, as it is not real memory, its just like a "swap file". The finalizer removes the mapped adress space fairly fast (dependent on usage/closing of old segements). The index in this directory is really fast, especially, if you have lots of real RAM, that can be used for buffering. You can even load stored fields for thousands of documents very fast, uninversion is also fast. Concurrency is no problem as the mapped file is handled like RAM.

I always recommend users, who want to use lucene: Use 64 bit systems, -d64 JVM parameter (this enables Java in 64 bit on Solaris), a lot of RAM and MMapDirectory.

asfimport commented 15 years ago

Earwin Burrfoot (migrated from JIRA)

Wait, are you saying Win 64 bit has problems w/ MMapDir? (I thought you just said Win 32 bit, above).

I have no lucene experience with Win64, and so I extrapolated from 32bit (as I felt it was more of a Windows issue than anything else). Would be nice if someone actually tries. If it works, then the rule sounds like - MMap for all 64bit systems, NIO for 32bit non-win, Simple for 32bit win.

Is MMapDir fine w/ concurrency?

It's cool with it if you have enough memory (no frequent paging occurs). I'm not sure about NIOFS vs MMap on memory-constrained systems, if you're competing for disk cache.

Which way do you think "prime all bytes up front on open" should default?

This has a side-effect of pushing previous mmaps out of memory if you're memory-constrained. Thus, under certain usage conditions (frequent merging, or something like that + low memory) this feature theoretically could be a disadvantage. For me it works well enough to be hardcoded to true :)

If anybody's interested, I can also repost an alternative for MMapD - MemoryMirroredD, which wraps any given D and explicitly preloads files in non-chunked byte arrays. It's a bit faster than MMapD and MemoryD (for reading), but has certain disadvantages, like stressing your GC and throwing OOM on merges/optimize if you don't have enough heap (unlike MMapD that silently swaps out).

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK so I think FSDir.open should default to MMapDir on 64 bit machine, NIOFSDir on non-windows machines, and SimpleFSDir on windows 32 bit machines. I'll rev the patch.

Next question: does anyone know how to reliably determine if the running JRE is 32 or 64 bit? I found this:

http://forums.java.net/jive/message.jspa?messageID=274406

but I'm worried about how portable that solution is...

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Attached patch w/ new defaults. I plan to commit in a day or two.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

This issue has some more problems after committing. And there are also inconsistencies. I am working on a patch, that also un-deprecates FSDir.getDirectory() and removes FSDir.open() again. FSDir.getDirectory() will return the platform-optimal directory, if no system property specifying the directory is set. Also removes some duplicate code in the FSDir.IndexInput/Output by subclassing from SimpleFSDir.IndexInput (but mark deprecated). My patch will remove the whole FSDir caching (so getDirectory will always return a new directory) and thus again #2527 comes in my mind: #2527 currently has the problem, that it only works correct with FSDir.getDirectory() instances, but not with self-instantiated implementations.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

This issue breaks some platforms (if FSDirectory.open() is used in some tests)!

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

This is a revised patch (on current trunk):

Some tests currently fail:

TODO:

This patch also contains an overflow fix in MMapDir for files <2 GB but large. "int*int" should be written as "(long)int*int", if the result maybe large (thanks Earwin!).

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

One comment: The part with the reflection and the ctor may be non-backwards compatible, because if somebody overwrites the old FSDir and does not provide the wanted ctor (only the default one), it would fail. The default ctor would not even work, as the init() private internal method (used by the old getDirectory()) is no longer available. But this is the same like in the discussion about backwards compatibility. In my opinion, for 2.9, we should make FSDir abstract, too. It is sooooooo seldom, that somebody may create a file based dir impl (nio.2 or what else? Or Earwins MemoryCopiedDir). But if somebody does this, he knows what he does and can fix the compilation errors easily!

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

New patch: Now only TestCompoundFile fails, some more imporvements.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Ugh, I'm sorry... I had run the tests only on OS X, Java 1.5, which is not 64 bit by default (must specify d64) - I should have tested the different OS's and JREs before committing.

Thanks for jumping on this, Uwe and Earwin!

FSDir.open() was removed, the logic was included into FSDir.getDirectory and this un-deprectated: If the system property is missing, the same like in Mikes open() happens: choosing the best impl for platform

But: the removal of the cache is not back-compatible? (I'm not sure how/whether anyone relies on that behavior). And, we are wanting to move away from that global System property in choosing the class for your FSDir. This is why I switched to open instead of back to getDirectory.

Caching of FSDirs was completely removed

I think this must wait until 3.0, ie when we remove all getDirectory methods.

FSDir.IndexInput/Output (deprected) was removed (suplicate code) and simply replaced by (deprecated) subclasses of SimpleFSDir ones). This is OK for backwards compatibility.

Excellent!

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Caching of FSDirs was completely removed

I think this must wait until 3.0, ie when we remove all getDirectory methods.

I am currently playing with this. For the end-user it is not really needed, that directories are cached. Even if he gets a directory that it is cached, he can see it as a alone one. he can close it (because of refcounting and so on) and do everything, he can do with a single directory, too.

So why is it not backwards compatible, when we remove caching? All tests pass here!

More, the mixing of cached and uncached dirs bring more problems (I am currently investigating).

I will supply a new patch shortly, with all other bugs fixed (there were more, e.g. with some tests in _TestHelper and so on). The move from FSDir to SimpleFSDir is more complicated than it seems. In my opinion, it would be a question, if this move should wait until 3.0.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I am currently playing with this. For the end-user it is not really needed, that directories are cached. Even if he gets a directory that it is cached, he can see it as a alone one. he can close it (because of refcounting and so on) and do everything, he can do with a single directory, too.

The original purpose of the cache was to ensure each unique directory in the filesystem alway mapped to a single instance of FSDir, so that you could synchronize on that instance and be certain that this is equivalent to synchronizing access to that underlying filesystem directory.

Lucene had relied on this at one point, but no longer does. I'm not sure if any apps out there still rely on this, so it's dangerous to simply remove it, especially when we have another option (using a new method "open") that won't break such apps.

More, the mixing of cached and uncached dirs bring more problems (I am currently investigating).

We should get to the bottom of this, but these problems are pre-existing to this issue, right? (One could already directly instantiate each directory).

The move from FSDir to SimpleFSDir is more complicated than it seems. In my opinion, it would be a question, if this move should wait until 3.0.

As long as we preserve the old getDirectory, back-compatible, this change should have no impact on back-compatibility.

Ie, it's only if you use the new FSDir.open() API that you get the new behavior. I intentionally went and fixed tests to use FSDir.open so that we stress the new functionality, which then led us to discover tests making invalid assumptions, which we should then fix.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

bq The original purpose of the cache was to ensure each unique directory in the filesystem alway mapped to a single instance of FSDir, so that you could synchronize on that instance and be certain that this is equivalent to synchronizing access to that underlying filesystem directory.

In my opinion, the cached directories vs. instantiated directories have one big advantage: They are forced to use the same locking mechanism. So if somebody creates a directory using one LockFactory, writes to the index and in a parallel thread uses another locking mechanism with a separate dir instance, he corrupts his index. So from that point of view, only have one directory instance per resource is a good thing (it does not work from different JVM processes, sure).

Ie, it's only if you use the new FSDir.open() API that you get the new behavior. I intentionally went and fixed tests to use FSDir.open so that we stress the new functionality, which then led us to discover tests making invalid assumptions, which we should then fix.

This is correct. For unit testing, I found out now, that it is much simplier to check, if all tests would also work with other platforms, if you set the FSDir system property when running the tests. With open() this is currently not possible.

Maybe I un-comment-out the caching again, but let getDirectory still use the new behaviour, if the system property is not set. We could then in 3.0 just remove the caching, but let getDirectory() alive. I am not sure.

In my opinion, this is not really a more serious bw-change than a small behaviour change, that can be written into CHANGES.txt. We have more serious ones.

I would strongly tend to remove the cache at all and write a warning into CHANGES.txt.

At all, I do not really think anybody has implemented an own subclass of FSDir. The current patch's bw-change is more, that the protected no-arg ctors no longer exist and are no longer used.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

In my opinion, the cached directories vs. instantiated directories have one big advantage: They are forced to use the same locking mechanism. So if somebody creates a directory using one LockFactory, writes to the index and in a parallel thread uses another locking mechanism with a separate dir instance, he corrupts his index. So from that point of view, only have one directory instance per resource is a good thing (it does not work from different JVM processes, sure).

I agree.

But, I don't think this is a strong enough reason for Lucene to be doing such magic under-the-hood, going forward. This magic leads to other problems (like LUCENE-1453).

> Ie, it's only if you use the new FSDir.open() API that you get the new behavior. I intentionally went and fixed tests to use FSDir.open so that we stress the new functionality, which then led us to discover tests making invalid assumptions, which we should then fix.

This is correct. For unit testing, I found out now, that it is much simplier to check, if all tests would also work with other platforms, if you set the FSDir system property when running the tests. With open() this is currently not possible.

Excellent!

Before committing we should confirm all tests pass if we temporarily hardwire open to return each of the 3 FSDir impls.

But I don't think this is reason enough to leave the global system property in place for real usage of Lucene.

Maybe I un-comment-out the caching again, but let getDirectory still use the new behaviour, if the system property is not set. We could then in 3.0 just remove the caching, but let getDirectory() alive. I am not sure.

But you've still unnecessarily broken back-compat with that. By making a new method (open), which does neither the magic singleton caching nor the global system prop, back-compat users are guaranteed to see no changes.

In my opinion, this is not really a more serious bw-change than a small behaviour change, that can be written into CHANGES.txt. We have more serious ones.

I would strongly tend to remove the cache at all and write a warning into CHANGES.txt.

At all, I do not really think anybody has implemented an own subclass of FSDir. The current patch's bw-change is more, that the protected no-arg ctors no longer exist and are no longer used.

Why take that chance unnecessarily? What are we gaining by changing getDirectory so much in place, vs switching to a new (open) API? It's entirely possible apps have subclassed FSDir, rely on the singleton cache and rely on the global system prop. Making a new API, and deprecating in favor of that, won't affect back-compat users at all.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I am not so happy :), but I work on it.

Currently my problem is more a failure for the test in #2527. After reopening a DirectoryIndexReader, sometimes (not always) the directory is closed, even with the 1453-workaround.

This failure depends on the random seed in testFSDirectoryReopen2() and is reproducible also with the current version of FSDirectory. It seems that 1453 is not completely fixed. I am looking now since 5 hourcs and cannot find the error, the code consists of a lot of printlns and so on. :(

When I found out, what the problem is, I will perhaps open an additional issue, but it seems, that the problem has to do with the FSDirectory changes. Somewhere the directory is closed, although it should be stay open. But what I can say: The error is simplier to reproduce, if the directory is not cached!

I have a lot of other fixes for the failing tests on some platforms, I will post a revised patch some time with getDirectory deprecated and open() again.

One thing: For investigating this bug, I changed the IndexReader.open() methods to use a non-cached directory. I think we should do this in the final version, too (also for Indexwriter). So also replace FSDir.getDirectory() by open for all these ctors and opens() that get a String with the index directory. If we do not want to do this, I suggest to deprecate all these methods and tell the user to open the directory themselfes and pass Directory instances to Writer/Reader. This would help cleaning up the code immense later, because we can remove all these closeDirectory checks/pass-throughs everywhere, which are silly... This makes the code completely not-understandable. In my opinion, one should open/close the directory himself and close it after usage.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I am not so happy , but I work on it.

I am looking now since 5 hourcs and cannot find the error, the code consists of a lot of printlns and so on.

We can tag-team if you want :) Let me know if you want me to take over...

This failure depends on the random seed in testFSDirectoryReopen2() and is reproducible also with the current version of FSDirectory. It seems that 1453 is not completely fixed.

Yikes – you mean there's a pre-existing issue with the fix in #2527, before this issue was committed? I agree that code is very spooky, and I'll be very happy once we remove (in 3.0) the cache/refCounting done by FSDir.getDirectory....

If we do not want to do this, I suggest to deprecate all these methods and tell the user to open the directory themselfes and pass Directory instances to Writer/Reader

I like this (deprecating the open methods that take File/String) best!

I have a lot of other fixes for the failing tests on some platforms, I will post a revised patch some time with getDirectory deprecated and open() again.

OK thanks.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Yikes - you mean there's a pre-existing issue with the fix in #2527, before this issue was committed? I agree that code is very spooky, and I'll be very happy once we remove (in 3.0) the cache/refCounting done by FSDir.getDirectory....

The problem is not really the caching/refcounting. The reopen code sometimes seems to close the directory, even than it should not. There is no difference if the directory is cached (and the number of opens() is tracked by refCount) or standalone. The problem is that SegmentReader's readNorms() hits AlreadyClosedException then...

The interesting thing is that ou can more easy reproduce it with stand-alone dirs (because I removed the caching, you know). The 1453 fix is also needed for stand-alone dirs (so everytime codeDirectory=true), because, if they are closed during an error/whatever, the same happens. It does not happen so easy with cached dirs, because the old reader is normally closed after the new reopened one, so during the reopen, the refcount is big enough.

In my opinion, the only good solution is to remove the whole directory-close stuff from readers/writers in 3.0, as said before. And this can be done by removing the string-type dir arguments from Reader/Writer. And: As these use currently FSDir.getDirectory() they should be changed to FSDir.open() or deprecated, I tend to the last one.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

To conclude: The error, I found is only relevant, if the string-type dir arguments to IndexReader are passed to FSDir.open() and are separate instances without refcounting. The reopen code sometimes closes the directory, although it should not, which leads to problems, if the directories are separate instances. With cached dirs, its no problem. So if we deprecate the string directory arguments and inside that methods let FSDir.getDirectory() stay alive, we have no problem. In 3.0, one could then remove all this code like refcounting/changing dirs and the everywhere arguments closeDirectory in SegmentReader/DirectoryIndexReader/MultiSegmentReader/... Then 1453 is completely solved.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

This patch fixes the failing tests and contains the other improvements.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Previous patch missed correct ctor for SimpleFSDir for reflection.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch looks good, thanks Uwe!

I still see failures when I use MMapDir (some tests are assuming they'll always get an FSDir).

I can tackle these if you haven't already?

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I think I forgot them when I reverted my changes. Currently I am running the tests again with MMap and after that with NIO.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Oh shit, on windows, a very lot of tests are failing with MMap. Windows says, you cannot delete or modify files, which have a mapping, with Solaris it works without problems: e.g. TestAtomicUpdates: [junit] C:\Projects\lucene\trunk\build\test\19.cfs_12.cfs (Der Vorgang ist bei einer Datei mit einem geöffneten Bereich, der einem Benutzer zugeordnet ist, nicht anwendbar)

This is the problem on windows: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038

The problem is, that a memory mapped area is not released on close() in Java, it is released, when GC frees it. So the problem may be that MMap can only be used for reading indexes (so only read-only IndexReaders) on windows.

What should we do, again disable MMap and only use NIOFSDir?

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

a very lot of tests are failing with MMap.

It looks like all but one (see below) of the failures are "innocent", eg the test hits an exception while cleaning up (removing the index directory it had created), or the test is trying to overwrite a file.

I agree this will be a hassle for normal usage of Lucene, so let's change the default on Win64 to SimpleFSDir (can't be NIOFSDir because of another Sun bug). So, for open(), on non-Windows 32 bit we use NIOFSDir, on non-Windows 64 bit we use MMapDir, and on Windows 32 or 64 we use SimpleFSDir. Crazy how many challenges there are with IO on Windows from Java...

This is the problem on windows: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038

Sheesh that's alot of votes!

Can you add that bug into the toplevel javadocs of FSDir explaining why MMapDir is not a great choice on even Win64?

There was one odd failure I noticed, on Win64 when running tests from a mounted remote (CIFS) drive:

    [junit] Testcase: testIndexAndMerge(org.apache.lucene.index.TestDoc):   FAILED
    [junit] junit.framework.AssertionFailedError:
    [junit]     at org.apache.lucene.index.FieldsWriter.addRawDocuments(FieldsWriter.java:249)
    [junit]     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:350)
    [junit]     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
    [junit]     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
    [junit]     at org.apache.lucene.index.TestDoc.merge(TestDoc.java:182)
    [junit]     at org.apache.lucene.index.TestDoc.testIndexAndMerge(TestDoc.java:117)
    [junit]     at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)

I dug down and found that this test illegally opens SegmentReaders on files that IndexWriter still has open for writing, and somehow this causes problems when using an MMapDir. I'll open a separate issue and put details there.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I dug down and found that this test illegally opens SegmentReaders on files that IndexWriter still has open for writing, and somehow this causes problems when using an MMapDir. I'll open a separate issue and

put details there.

This is a NIO bug in windows, I assume. In Google I found a report at sun about this, too. Mapped buffers from UNC-pathes have wrong bytes in their buffer.

By the way, the other failing tests are easy to fix: Some tests check, if the IndexInput throws an IOException when reading past eof. When doing this with a Byte buffer, the get() throws an BufferUnderflowException. It can be fixed like this in MMapIndexInputs:

    public byte readByte() throws IOException {
      try {
        return buffer.get();
      } catch (BufferUnderflowException e) {
        throw new IOException("read past eof");
      }
    }

The other failures are harmless, but it would be good to fix them. I am working on that and then test extensive.

The problem with not freeing the buffer can be fixed on windows using the bad hack with this sun.misc.Cleaner class and PrivilegedAction (described in the bug report), but this depends on Sun's internals and works only with Sun's JRE. And it may fail on some under-priviledged environments like web containers.

But nevertheless, with this bad hack, my local version works now without any failing test on Win32 using MMap.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Atached is a patch that now works (at least on windows) for all three versions of FSDir. One additional test was fixed to SimpleFSDir (because it assumesan BufferedIndexInput).

This patch contains the tweaked MMapDir that has the following features:

The problems with unmapping are: It may fail on specific non-Sun VMs and may hit SecurityExceptions. If this happens, the close() call will throw an IOException. The good thing is: The virtual memory usage is lower and with small indexes, the 32 bit VMs do not hit OOMs, if buffers are not unmapped by GC early.

What do you think? Should we supply this "extended" MMapDirectory? Earwin, did you try this, too?

All tests pass now :-) JUHU!

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Some cleanups with initOutput (remove duplicate code) and unneeded methods in MMapDir. All tests pass. Will try tomorrow also on Solaris x64 with MMap.

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

bq.All tests pass now JUHU!

Excellent!

It may fail on specific non-Sun VMs and may hit SecurityExceptions. If this happens, the close() call will throw an IOException.

I think we should do the hack. It seems better than nothing.

But: I think we shouldn't throw an exception if the hack fails? Ie, just fallback to relying on GC to eventually unmap? (What MMapDir does today).

And then we can make the default on Windows 64 be MMapDir again?

Also, can you reference this bug from FSDir's/MMapDir's javadocs?

asfimport commented 15 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I'm seeing this failure:

[junit] Testcase: testIndexInputMethods(org.apache.lucene.store.TestMMapDirectory): FAILED [junit] FSDirectory has method public org.apache.lucene.store.IndexInput org.apache.lucene.store.FSDirectory.openInput(java.lang.String) throws java.io.IOException but MMapDirectory does not override [junit] junit.framework.AssertionFailedError: FSDirectory has method public org.apache.lucene.store.IndexInput org.apache.lucene.store.FSDirectory.openInput(java.lang.String) throws java.io.IOException but MMapDirectory does not override [junit] at org.apache.lucene.store.TestMMapDirectory.testIndexInputMethods(TestMMapDirectory.java:43) [junit] at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)

I think we should fix the test to make an exception...

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I think we should fix the test to make an exception...

I removed the test completely here, but forgot to include this into the patch. The test is now invalid with the new class hierarchy of FSDir. If FSDir gets abstract in 3.0, it is clear which methods need to be overridden and so on. The intention behind the test was, to check, if MMap does not overwrite all FSDir Methods, and because of this one of the calls could falsely return the standard FSDir impl of IndexInput. After Lucene 3.0, this is not possible, as MMapDir does not extend SimpleFSDir.

If we remove the test and then backwards-tag tests fail because of this, we could easily fix this by adding the method, simply calling super(). Or remove the test in backwards-tag, too.

But: I think we shouldn't throw an exception if the hack fails? Ie, just fallback to relying on GC to eventually unmap? (What MMapDir does today).

And then we can make the default on Windows 64 be MMapDir again?

Also, can you reference this bug from FSDir's/MMapDir's javadocs?

Good idea. The problem is, if the hack fails, IndexReader may suddenly fail later on windows to delete files. Or was this only a problem in the tests. What happens, if e.g. IndexWriter wants to write a new segments.gen file and so on, but it is still mmapped? So I preferred to throw the IOException, but I can disable this easily.

I would suggest to check on windows, if class sun.misc.Cleaner is available and if not, return SimpleFSDir. For Unix, it is enough to simply ignore the cleanup (throw no exception).

asfimport commented 15 years ago

Earwin Burrfoot (migrated from JIRA)

I told you, Java mmap doesn't work on Windows. And please, don't use the unmap hack! If it doesn't work, it doesn't work. Let's for all windows versions use SimpleFSD. Look, what are you going to do if you unmap a buffer and then access it by accident? Crash JVM?

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

The buffer is nulled directly after unmapping.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here the patch with conditionalized MMapDirectory on windows. I am a bit nervous... We should discuss it a little bit and commit it. But everybody should test it extensive before releasing! Users still using FSDir.getDirectory would not be affected, so it would not break existing apps.

asfimport commented 15 years ago

Earwin Burrfoot (migrated from JIRA)

The buffer is nulled directly after unmapping.

Really? Let me quote some code (MacOS, Java 1.6):

unsafe.freeMemory(address); address = 0; Bits.unreserveMemory(capacity);

Does windows version differ? What we see here is 'zeroing', not 'nulling'. When doing get/set, buffer never checks for address to have sense, so the next access will yield a GPF :) The guys from Sun explained the absence of unmap() in the original design - the only way of closing mapped buffer and not getting unpredictable behaviour is to introduce a synchronized isClosed check on each read/write operation, which kills the performance even if the sync method used is just a volatile variable.

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Updated patch: I forgot to add a check, if the IndexInput was already closed.

asfimport commented 15 years ago

Earwin Burrfoot (migrated from JIRA)

Ah! You was referring to your code. It's not thread-safe still. Someone could access the closed buffer before it sees the now-null reference to it. You also employ the hack on non-windows machines, that work quite well without it. What for?

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I know this code :-), its on every platform the same. For our MMapDirectory this is not a problem, as nobody can accidently read/write the buffer, because the reference to it is nulled in our code. The buffer member itsself is private, so nobody can access it from outside. So after closing the IndexInput, nothing can access the buffer. This is what I meant with "nulling".

asfimport commented 15 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

You also employ the hack on non-windows machines, that work quite well without it. What for?

It uses less virtual memory :-)

Ah! You was referring to your code. It's not thread-safe still. Someone could access the closed buffer before it sees the now-null reference to it.

For the thread-safety, youre correct. To solve this, we have to add the synchronized isClosed check on our side, which kills performance. I wanted to check, what happens, if somebody really accesses the buffer after unmapping, does it crash the JVM?

asfimport commented 15 years ago

Earwin Burrfoot (migrated from JIRA)

I tested on MacOS:

Invalid memory access of location 8b55a000 rip=0110c367