apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.68k stars 1.03k forks source link

Allow Directory.copy() to accept a collection of file names to be copied [LUCENE-2339] #3415

Closed asfimport closed 14 years ago

asfimport commented 14 years ago

Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory.


Migrated from LUCENE-2339 by Earwin Burrfoot, resolved Mar 26 2010 Attachments: LUCENE-2339.patch (versions: 5)

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

A simple patch

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch looks good Earwin – simple addition. I'll commit later today... thanks!

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

I wonder if we could convert these methods from static to members, so it becomes 'from.copyTo(to, files)' ? This opens up the possibility to override.. and.. hmm.. optimize somehow, if 'to' is of the same type as 'this'. I believe you can rig some nio-based file copier that bypasses Java completely, so you don't have to pass multi-gb indexes through your heap when, say, backing up.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I wonder if we could convert these methods from static to members, so it becomes 'from.copyTo(to, files)' ?

I like this approach.

I believe you can rig some nio-based file copier that bypasses Java completely, so you don't have to pass multi-gb indexes through your heap when, say, backing up.

NIO's transferTo, right?

For backups I wonder if we should make a copyTo/From that takes an IndexCommit... but maybe that's going too far. EG you'd also want it to be incremental (only copy new files, maybe delete no longer referenced ones, etc.).

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

NIO's transferTo, right?

I didn't remember this at the point of writing, but yes, transferTo. Google says that with certain restrictions on the size of the chunk transferred at once (eg. one huge chunk for linux, 64mb-something chunks for windows), this works crossplatform. NIO2@Java7 has an even more simple Path.copyTo() method, so happy 7 users can use this.

For backups I wonder if we should make a copyTo/From that takes an IndexCommit

Ohmigosh, no! : } You can get file list from IC and feed it into copy(). More power with less API surface.

I'm going to add two methods copyTo(target) and copyTo(target, filenames), rewrite copy(source, target, close) to use these and deprecate it. There's no point in keeping around slightly different versions of the same thing, and I believe people can easily close their source dirs by themselves, what a stupid option.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK this sounds like a good plan!

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

Google says that with certain restrictions on the size of the chunk transferred at once (eg. one huge chunk for linux, 64mb-something chunks for windows), this works crossplatform.

Heh. The bug existed only on 1.4 and was fixed. So I guess it works everywhere.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

Patch with overridable copyTo(), based off trunk+LUCENE-2328 (uses newly introduced method)

Optimized FSDir->FSDir case. Should I do special case for RAMDirs? :)

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Patch looks good! Few comments:

  1. is it safe to use NIO for all FSDirs? I thought that on Windows NIO has some bugs/limitations. In that case, would it be safer if just NIOFSDir used NIO?
  2. Can copyTo(Directory, Collection<String>) be changed to copyTo(Directory, Iterable<String>)? Unless we think that someone would want to use size() or something.
  3. I know it's a matter of style, but you "import static Arrays.asList", and then use asList directly in copyTo(Dir). It confuses me because I expect asList to be a method declared on Dir, and so I prefer to see Arrays.asList. But it's just style, don't know how others feel about that.
  4. On copyTo(Dir), perhaps instead of converting the listAll() to List and then remove elements from it, you can just iterate on whatever listAll() returns and add the files that pass the filter to a list? You can even optimize and if all the files Dir returned pass the filter, you can just pass the array to copyTo(Dir, Iterable), assuming we change the method to accept Iterable. But that's a minor optimization.
  5. copy(src, dest, boolean) - can you add a message to @deprecated so users will know what to replace it with more easily?
  6. I see that copy(src, dest) also accepts a boolean of whether to close the src directory. But copyTo(dIr) doesn't. I personally think it's ok, as someone can call close on src himself, but am wondering if it wouldn't be more convenient. I.e. instead of change calls from Directory.copy(src, dest, true), I now need to do src.copyTo(dest) followed by a src.close().
  7. closeSafely - perhaps print the stacktrace, even if you don't throw it?
asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

1 -> I googled all around and nobody mentions any problems save for those with old 1.4 JDK. If problems arise, we can conditionalize code inside - making this optimization NIOFSDir-exclusive is just plain sad, what about MMapDir? 2 -> Don't have strong feelings there, but Uwe insists everything should be consistent and if I use Collection for sync(), then I should use it everywhere. Makes sense probably. 3 -> I feel that static imports are great. They take clutter away, but that's just my opinion. Can change this. 4 -> Applied this, without shady "optimizations". 5 -> ok 6 -> Did that on purporse. Directory.copy(src, dest, true) is way less readable than src.copyTo(dest); src.close(). There's no freaking way to tell what that true means without reading docs. 7 -> I really, really, really, really hate libraries that print something I didn't ask them for. Besides, current implementation prints nothing on similar occasion, so I'm following the trend.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

I personally haven't seen problem using NIO on Windows, but that's perhaps just because I haven't run into them yet :). I think your proposal makes sense - let's start w/ NIO bulk-copy and then we can disable if people complain or report errors.

Consistency is important, I agree. So let's keep Collection there. I just wanted to avoid converting arrays to a Collection, just so that they can be iterated on. Seems a waste to me, but not so much to argue about :).

Re (7), I hate such libraries too. But I hate more the ones that just hide problems away from me :). The ideal thing was if Lucene would use a logging mechanism (I once started it on LUCENE-1482) so that you could include the stacktrace print if logging is enabled. But currently the code just hides the problem away ... and I'd hate to debug such thing, not realizing an IO exception is thrown from close().

So unless #2556 springs back to life again, what do you suggest we do? Suppressing the exceptions seems wrong to me.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I just wanted to avoid converting arrays to a Collection, just so that they can be iterated on.

Sorry, for the dumb question: In which JDK do arrays implement Iterable? From my knowledge and a quick check with Java 5, it does not. Passing an array to a method taking Iterable does not work. Arrays only work in extended for statement, but not because they are Itearble. The generated code by javac is also totally different (and more effective than creating an iterator, it just uses the conventional for(i=0; i<length; i++) approach - try it out with javac and decompiling with jad or whatever)! Also arrays of native types can hardly implement Iterable without autoboxing.

See:

And where is the waste of calling Arrays.asList()? This is exactly the same overhead like creating an iterator() if arrays were Iterable, both are just "views" on the array, so no copy involved.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Sorry ... I was confused w/ the for loop of Java 5 :). Let's keep it Collection then. Sorry for the hassle.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I love CloseSafely! We do that in a number of places and should simply call it, instead. But can we change it to throw the first exception it encounters?

I also prefer Arrays.asList to be explicit.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

So unless #2556 springs back to life again, what do you suggest we do? Suppressing the exceptions seems wrong to me.

bq. But can we change it to throw the first exception it encounters? That's exactly what most of lucene is doing when closing something. If you can't log, you either suppress, or mask the previous exception. Let's mask it? That way the user may get the wrong exception, but he's not getting a situation when something failed but looks okay on the surface.

I love CloseSafely! We do that in a number of places and should simply call it, instead.

I did this for readers in my reopen patch, except new utility method does decRef.

I also prefer Arrays.asList to be explicit

ok :/

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

I don't want to block the issue. If #2556 will advance somewhere, we'll log a message in closeSafely. Otherwise between suppressing to always printing I agree we should suppress. If someone does not want to suppress he should call close(). Which makes me think we should call this method closeNoException because closeSafely is not exactly what it does :).

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Let's mask it? That way the user may get the wrong exception, but he's not getting a situation when something failed but looks okay on the surface.

By "mask it" you mean hold onto the first exception you hit, continue closing & ignoring any further exceptions, then throw that first exception, right?

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

Not right. Imagine exception is thrown when copying, then I try to close the channels. If that close throws another exception, I either has to suppress it, or to throw and thus hide initial exception.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Urgh... can we add a boolean arg (suppressExceptions) to control that? Because, if you did not hit an exception when copying, but then hit one when closing, we want to throw it in that case...

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Mike, that's what I wrote above "if someone does not want to suppress, he should call close". I think that closeSafely (or as I prefer it - closeNoException) should be closed only when you know you've hit an exception and you want to close the stream suppressing any exceptions. Otherwise call close().

can we add a boolean arg (suppressExceptions) to control that?

That would beat the purpose of the method no? I mean, currently it does not throw any exception, not even declaring one, and if we add that boolean it will need to declare "throws IOException", which will force the caller to try-catch that exception and ... suppress it or document "// cannot happen because I've passed false"?

So how about we call it closeNoException, document that it does not throw any exception and intentionally suppresses them, and if you don't want them to be suppressed, you can call io.close() yourself?

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

So how about we call it closeNoException, document that it does not throw any exception and intentionally suppresses them, and if you don't want them to be suppressed, you can call io.close() yourself?

But there is still a need to "close everything, but do throw the 1st exception you hit". We do this in a number of places in Lucene, ad-hoc today.

However, that need is different from what we're doing here, so I agree, let's postpone it and have this issue only create the "closeNoException" method.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

But there is still a need to "close everything, but do throw the 1st exception you hit".

Ohh I see what you mean. My assumption is that when you call closeNoException you already know that you've hit an exception and just want to close the stream w/o getting more exceptions. If you don't know that, don't call closeNoException?

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

My assumption is that when you call closeNoException you already know that you've hit an exception and just want to close the stream w/o getting more exceptions. If you don't know that, don't call closeNoException?

Right, for this issue, let's do that.

At some point in the future I'd like a "closeAllAndThrowFirstExceptionYouHit" :)

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Ok that's indeed different :). I guess we can introduce it now, in this issue (it's tiny and simple). A closeAll which documents it throws the first exception it hits.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

I'll get back to the issue in N hours and code something neat. : )

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

'ere we go! Read javadocs for closeSafely, it mimics the way we handle exceptions all over lucene, but is a single method call. Okay, you should still keep track of prior exception by hand, no going around it.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

Moved default Dir.copyTo to new close/exception handling method too.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Oooh! I like the priorException solution :)

I think this is ready to commit? I'll add a CHANGES entry, and I added missing copyright to IOUtils.java.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Do we want to suppress only IOExceptions? What about any RuntimeExceptions - upon hitting any of them the code will fly away? Not saying it's a bad thing, but pointing it out.

Other than that, the patch looks good. closeSafely is not exactly what I had in mind about closeNoException because it forces you to catch the IOE if you don't declare you throw it, or you need to move on, discarding it. But I guess this is a matter for another issue.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

closeSafely is not exactly what I had in mind about closeNoException because it forces you to catch the IOE if you don't declare you throw it

closeSafely wraps Closeable.close(), which declares IOException. Thus, you don't have to declare or discard anything new. As a matter of fact it encapsulates exact same code we're seeing copypasted (with varying degrees of success) all around Lucene.

As for RuntimeExceptions, I had ones in my Directory implementation, and it flew right through Lucene. When I asked our brainiacs if that is okay, they said that anything expected should be wrapped with IOException, as the operation is zero-cost (for no-exception-happened case), and anything unexpected should kill your app in a blaze of glory. I see the point in that, if I squint hard enough.

Mike, please mark that closeSagely with whatever.experimental? I think the signature may change, as it gets used in more places, like that type parameter may be an overkill.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I'll mark it @lucene.internal.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks Earwin!