Closed asfimport closed 8 years ago
Michael McCandless (@mikemccand) (migrated from JIRA)
Wow, nice catch! Thank you.
Our default codec postings & doc values format doesn't override the merge method, so I guess the impact here on a default Lucene usage is minor. But if you customize your codec then it could matter if you have a better merge impl than the naive default.
I'll review the patch.
Michael McCandless (@mikemccand) (migrated from JIRA)
It's a little spooky how sneaky this patch needs to be, temporarily overwriting MergeState members, adding a FilterFieldsProducer, FilterFieldsInfos.
Can we improve those latter two classes to e.g. reject a field that
was not in the restricted set, if you call
FilterFieldInfos.fieldInfo
or FilterFieldsProducer.terms
on an
invalid field name?
The FilterFieldsInfos is also in a precarious state, having to include all incoming FieldInfo instances so the numbers are consistent, yet only overriding the iterator.
Methods like FilterFieldInfos.hasProx
, etc., are also wrong, which can
result in sneaky future bugs for codecs that rely on this.
I don't really like the complexity in this patch: I think this is a little too much sneakiness. Yet, I don't know of a cleaner way to fix the bug.
Stepping back a bit, can you describe the use case motivating allowing your custom codec to override the default merging for doc values / postings?
Julien Massenet (migrated from JIRA)
I agree that being this sneaky is not ideal, but the only alternative I see would be change the Postings API.
In this patch, I tried to keep modifications constrained to the org.apache.lucene.codecs.perfield
package, but I can give a shot at trying to a come up with a cleaner implementation that updates the other APIs if you want.
I've uploaded an updated version of the patch which does not remove the sneakiness but makes the PerFieldMergeState
more robust:
FilterFieldInfos
class now correctly computes and exposes the hasXXX
properties.FilterFieldInfos.fieldInfo()
and FilterFieldsProducer.terms()
now throw IllegalArgumentException
if called with fields outside if the current merge contextLegacy*
field types are not accessible in this module anymore.As for why we're overriding merge() calls in our codec:
Michael McCandless (@mikemccand) (migrated from JIRA)
Thanks Julien Massenet, I think this patch is a good step forwards, and we can try to simplify the approach in future issues (progress not perfection!).
I'll fixup the minor failures from ant precommit
and push.
ASF subversion and git services (migrated from JIRA)
Commit a6a8032c7f079ea59daea0c95e48f69b2986d918 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a6a8032
LUCENE-7456: PerFieldPostings/DocValuesFormat was failing to delegate the merge method
ASF subversion and git services (migrated from JIRA)
Commit 796ed508f39683c626d4870a7ab583a222b2c64c in lucene-solr's branch refs/heads/master from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=796ed50
LUCENE-7456: PerFieldPostings/DocValuesFormat was failing to delegate the merge method
Michael McCandless (@mikemccand) (migrated from JIRA)
Thank you Julien Massenet!
While porting some old codec code from Lucene 4.3.1, I couldn't get the per-field formats to call upon the per-field merge methods; the default merge method was always being called.
I think this is a side-effect of #6956.
Attached is a patch with a test that reproduces the error and an associated fix that pass the unit tests.
Migrated from LUCENE-7456 by Julien Massenet, resolved Oct 04 2016 Attachments: LUCENE-7456.patch, LUCENE-7456-v2.patch Linked issues:
9098