apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.69k stars 1.04k forks source link

BaseDocIdSetTestCase [LUCENE-5100] #6164

Closed asfimport closed 11 years ago

asfimport commented 11 years ago

As Robert said on #6145, we would benefit from having common testing infrastructure for our DocIdSet implementations.


Migrated from LUCENE-5100 by Adrien Grand (@jpountz), resolved Jul 12 2013 Attachments: LUCENE-5100.patch (versions: 2)

asfimport commented 11 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Patch. This is mostly a factorization of the tests of EliasFanoDocIdSet and WAH8DocIdSet. It is hard to factor the tests of FixedBitSet and OpenBitSet since they don't share a common interface for prevSetBit and nextSetBit for instance. Nothing that we can't improve but I am wondering why we need OpenBitSet? Are there any cases where we need the bit set to grow on demand? I have checked a few places where it was used (it is used all over the code base) and every time it seemed to me that it could have been advantageously replaced with a FixedBitSet?

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I know of a few places where I used this in lucene-core where FixedBitSet cannot currently be used: Anything having to do with SortedSetDocValues, where ordinals can exceed > 2B, because OpenBitSet can do that. FixedBitSet is limited to int.

However use for example, in SloppyPhraseScorer seems unnecessary.

asfimport commented 11 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Thanks for the explanation, Robert. I tried to factorize some code between TestFixedBitSet and TestOpenBitSet by adding an abstraction level on top of both FixedBitSet and OpenBitSet but its complexity made the tests even harder to read, so I think I won't touch the prevSetBit/nextSetBit/flip/... tests and just add the tests from BaseDcIdSetTestCase.

Updated patch. The modification in EliasFanoEncoder is here to always be able to pass maxDoc - 1 as an upper bound even when the set is empty (an assertion would trip otherwise). I think it is ready?

asfimport commented 11 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Patch looks good. I would also keep the non-DocIdSet testing separate: prevSetBit/nextSetBit/flip/... are not part of the main DocIdSet API, so they should not be in the base test case. But you can still use methods from the base class to check what happend after you did a flip(), so implementing a test for flip() should be easy.

asfimport commented 11 years ago

Paul Elschot (migrated from JIRA)

The patch provides independent tests for EliasFanoDocIdSet and does not change the existing TestEliasFanoSequence for the Elias-Fano special cases, great.

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1502448 from @jpountz https://svn.apache.org/r1502448

LUCENE-5100: BaseDocIdSetTestCase.

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1502450 from @jpountz https://svn.apache.org/r1502450

LUCENE-5100: BaseDocIdSetTestCase (merged from r1502448).

asfimport commented 11 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

4.5 release -> bulk close