Open asfimport opened 3 years ago
Mike Drob (@madrob) (migrated from JIRA)
@uschindler - is there some historical subtlety that I am not aware of that should prevent us from doing this?
Uwe Schindler (@uschindler) (migrated from JIRA)
Not sure why you ask me, but the Timsort implementation is not only used for standard sorting of lists and arrays. Lucene has its own sorting infrastructure to allow sorting multiple arrays and lists in parallel (the first array is sorted and the parallel ones are getting same element swaps like the first one). With plain Java apis that's not possible.
In addition: Arrays/Collections.sort() makes clones of the objects instead of sorting in place.
Adrien Grand (@jpountz) (migrated from JIRA)
+1 to drop the ArrayUtil/CollectionUtil tim sorts, but like Uwe said we still need TimSorter.
In addition: Arrays/Collections.sort() makes clones of the objects instead of sorting in place.
I know it used to be true for Collections.sort() but I don't think this is the case anymore, at least not with ArrayList.
Uwe Schindler (@uschindler) (migrated from JIRA)
Actually the ArrayUtils and CollectionUtils timSort methods are not special, they just use the Sorter interface implementation TimSorter to sort the array. So removing them is not removing any real code, just some getters/setters for array elements :-) If we remove them, we should also remove all other sorting methods like introSort or others!? We won't do this because every sort algorithm is used for different purposes in Lucene!
So I tend to keep those utility methods, as we exactly know what sort algorithm is used and the code is tested. -1 to remove.
Uwe Schindler (@uschindler) (migrated from JIRA)
Actually the availability of these methods are good to test the TimSorter implementation of the Sorter interface in Lucene for correctness and compare them to JDK.
Michael Sokolov (@msokolov) (migrated from JIRA)
I just found a new use for IntroSorter - please don't remove! As for the util methods, they're not doing any harm? I'd be -0 to migrate JDK impl for those cases where we can; I mean sure, why not, but on the other hand if it's not broke, why fix it? The test coverage is a benefit, although we could always find another way, but like Uwe points out, we wouldn't really be saving any maintenance costs, since we still need to maintain the core algorithm for the parallel sorting case.
Michael McCandless (@mikemccand) (migrated from JIRA)
We should at least run luceneutil
to check the performance impact here before switching? Sorting is such a heavily used and finely tuned part of Lucene that it could have a needle moving impact, even if the core big-oh is the same since it is the same fundamental algorithm.
David Smiley (@dsmiley) (migrated from JIRA)
Last I saw, we have these sort implementations but not enough javadocs on the circumstances in which they are best used (or not used), and furthermore why we even have them at all (relative to JDK impls).
JDK 11 has a TimSort with very similar lineage to our TimSort (written by Josh Bloch, based on Tim Peter's python list sort), we should consider replacing our fork with a version that we don't have to maintain.
Migrated from LUCENE-9632 by Mike Drob (@madrob), updated Jan 23 2021