Closed asfimport closed 11 years ago
Robert Muir (@rmuir) (migrated from JIRA)
Patch fixing the tests so they no longer suppress whole codecs.
Instead, testSortedSet() has an assume (and is ignored for ancient codecs).
In the case of offsets, ancient codecs just index and test docs/freqs/positions without offsets.
Adrien Grand (@jpountz) (migrated from JIRA)
I use two parallel arrays to sort the documents (docs and values)
I updated the patch to use doc IDs as ords so that values are never swapped (only doc IDs) and the numeric doc values don't need to be all loaded in memory.
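For illustration, a minimal sketch of the idea of sorting doc IDs only and looking values up on demand, in plain Java with no Lucene dependency. The names DocIdSortSketch, LongValues and sortByValue are hypothetical and are not taken from the patch; the actual implementation may differ.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper, not part of the patch: sorts doc IDs by a per-document value. */
final class DocIdSortSketch {

  /** Stand-in (assumption) for a numeric doc values source, read by doc ID on demand. */
  interface LongValues {
    long get(int docID);
  }

  /** Returns old2new, where old2new[oldDocID] is the document's ID after sorting. */
  static int[] sortByValue(int maxDoc, LongValues values) {
    Integer[] docs = new Integer[maxDoc];
    for (int i = 0; i < maxDoc; i++) {
      docs[i] = i;
    }
    // Only doc IDs are moved; values are never copied or swapped.
    // Arrays.sort on objects is stable, so ties keep their original order.
    Arrays.sort(docs, Comparator.comparingLong((Integer d) -> values.get(d)));
    int[] old2new = new int[maxDoc];
    for (int newID = 0; newID < maxDoc; newID++) {
      old2new[docs[newID]] = newID;
    }
    return old2new;
  }

  public static void main(String[] args) {
    long[] perDoc = {30L, 10L, 20L};
    int[] old2new = sortByValue(perDoc.length, d -> perDoc[d]);
    System.out.println(Arrays.toString(old2new)); // prints [2, 0, 1]
  }
}
```

The comparator calls values.get for every comparison, so a disk-backed source would be read repeatedly during the sort.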
> So one option is to remove the class, but still keep a test around which does the addIndexes to make sure it works.
+1
> I don't want however to add a main that is limited to NumericDV ... and I do think that stored fields / payload value are viable options.
I still don't get why someone would use stored fields rather than doc values (either binary, sorted or numeric) to sort his index. I think it's important to make users understand that stored fields are only useful to display results?
Shai Erera (@shaie) (migrated from JIRA)
Thanks Rob - I didn't know we can check these things :). Certainly better than suppressing the entire Codec.
Adrien, thanks for the update as well. So if someone loads NumericDV (default), indeed there's no need to copy the values again into an array. If someone uses DiskDVFormat though, list.get(i) will access the disk on every call ... but I guess that's fine since if someone wanted to save RAM, he should be ready to pay the price, and we should respect him.
> I still don't get why someone would use stored fields rather than doc values (either binary, sorted or numeric) to sort his index. I think it's important to make users understand that stored fields are only useful to display results?
Someone might have an existing index without DV. Also, who said that a stored field used for display cannot be used to sort the index? But, since it's quite trivial to implement, I'll remove both Payload and StoredFields. I'll also make Reverse and Numeric sorters inner classes (though public) of Sorter.
I added a check in the SortingAtomicReader ctor that old2new.length == reader.maxDoc(), to ensure that sorters provide a mapping for every document in the index. I'll get rid of IndexSorter, but keep a test around and add a code example to the SortingAR javadocs showing how to use it for addIndexes.
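A rough sketch of what such a check could look like, in plain Java. The class and method names are hypothetical. The length comparison mirrors the check described above; the additional permutation check is an assumption added for illustration and is not claimed to be in the patch.

```java
/** Hypothetical validation helper, not the actual SortingAtomicReader code. */
final class MappingCheckSketch {

  /** Throws if old2new does not cover every document or is not a permutation. */
  static void checkMapping(int[] old2new, int maxDoc) {
    // The check described in the comment: one entry per document in the index.
    if (old2new.length != maxDoc) {
      throw new IllegalArgumentException(
          "old2new.length=" + old2new.length + " but maxDoc=" + maxDoc);
    }
    // Extra check (assumption): every new ID is in range and used exactly once.
    boolean[] seen = new boolean[maxDoc];
    for (int newID : old2new) {
      if (newID < 0 || newID >= maxDoc || seen[newID]) {
        throw new IllegalArgumentException("old2new is not a permutation: " + newID);
      }
      seen[newID] = true;
    }
  }

  public static void main(String[] args) {
    checkMapping(new int[] {2, 0, 1}, 3); // passes silently
  }
}
```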
Will upload a new patch later.
Andrzej Bialecki (@sigram) (migrated from JIRA)
> I still don't get why someone would use stored fields rather than doc values (either binary, sorted or numeric) to sort his index. I think it's important to make users understand that stored fields are only useful to display results?
This is a legacy of the original usage of this tool in Nutch - indexes would use a PageRank value as a document boost, and that was the value to be used for sorting - but since the doc boost is not recoverable from an existing index the value itself was stored in a stored field.
And definitely DV didn't exist yet at that time :)
Shai Erera (@shaie) (migrated from JIRA)
Patch removes IndexSorter (but keeps IndexSortingTest). I also:
Moved ReverseDocIDSorter to a singleton on Sorter, and made IndexSortingTest randomly pick it or NumericDVSorter.
Removed the Payload and StoredFields sorters. As a consequence, removed SorterTest (sorters are covered by IndexSortingTest).
Added example code to the SortingAtomicReader javadocs.
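As an illustration of the simplest sorter mentioned above, a reverse doc ID mapping can be sketched in plain Java as follows. ReverseMappingSketch and reverse are hypothetical names; this is not the ReverseDocIDSorter code from the patch.

```java
/** Hypothetical sketch of a reverse doc ID mapping. */
final class ReverseMappingSketch {

  /** Returns old2new such that the last document becomes the first, and so on. */
  static int[] reverse(int maxDoc) {
    int[] old2new = new int[maxDoc];
    for (int oldID = 0; oldID < maxDoc; oldID++) {
      old2new[oldID] = maxDoc - 1 - oldID;
    }
    return old2new;
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(reverse(3))); // prints [2, 1, 0]
  }
}
```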
Shai Erera (@shaie) (migrated from JIRA)
I think it's ready. If there are no objections, I'd like to commit it later today.
Shai Erera (@shaie) (migrated from JIRA)
Patch avoids encoding offsets in memory when offsets are not indexed. This saves 10 bytes per position in most cases (offsets are not indexed by default, even for positions-enabled fields, e.g. TextField).
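A minimal sketch of the general technique, assuming an in-memory byte buffer with variable-length ints. PositionBufferSketch and its methods are hypothetical, and the encoding is not claimed to match the patch, so the byte counts here are illustrative only.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical buffer that skips offsets entirely when they are not indexed. */
final class PositionBufferSketch {
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();
  private final boolean storeOffsets;

  PositionBufferSketch(boolean storeOffsets) {
    this.storeOffsets = storeOffsets;
  }

  /** Variable-length int: 7 bits per byte, high bit set on all but the last byte. */
  private void writeVInt(int i) {
    while ((i & ~0x7F) != 0) {
      out.write((i & 0x7F) | 0x80);
      i >>>= 7;
    }
    out.write(i);
  }

  /** Always writes the position; writes offsets only if they are indexed. */
  void addPosition(int position, int startOffset, int endOffset) {
    writeVInt(position);
    if (storeOffsets) {
      writeVInt(startOffset);
      writeVInt(endOffset);
    }
  }

  /** Number of bytes buffered so far. */
  int size() {
    return out.size();
  }

  public static void main(String[] args) {
    PositionBufferSketch without = new PositionBufferSketch(false);
    PositionBufferSketch with = new PositionBufferSketch(true);
    without.addPosition(5, 0, 3);
    with.addPosition(5, 0, 3);
    System.out.println(without.size() + " vs " + with.size()); // prints 1 vs 3
  }
}
```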
Shai Erera (@shaie) (migrated from JIRA)
Optimization: freqs are no longer encoded in memory if they were not indexed (even if they are requested in flags).
Commit Tag Bot (migrated from JIRA)
[trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1454801
LUCENE-3918: port IndexSorter to trunk API
Shai Erera (@shaie) (migrated from JIRA)
Committed to trunk and 4x. Thanks Anat, your work has re-ignited this issue!
Commit Tag Bot (migrated from JIRA)
[branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1454804
LUCENE-3918: port IndexSorter to trunk API
Uwe Schindler (@uschindler) (migrated from JIRA)
Closed after release.
3556 added an IndexSorter to 3.x, but we need to port this functionality to the 4.0 APIs.
Migrated from LUCENE-3918 by Robert Muir (@rmuir), 2 votes, resolved Mar 10 2013
Attachments: LUCENE-3918.patch (versions: 16)
Linked issues:
5899
5817
5904
5912
5895
5936