Open asfimport opened 7 years ago
Adrien Grand (@jpountz) (migrated from JIRA)
This particular error means that there is a problem in the way your index is structured: you had at least one segment whose last document was not a parent doc. This is wrong because block joins work on blocks of documents that contain 0-n children followed by one parent, so the last document in a segment is necessarily a parent document.
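The invariant Adrien describes can be illustrated with a small toy checker (plain Python, not Lucene code; the function name and doc-flag encoding are hypothetical): in a segment modeled as a list of child/parent flags in index order, every block is 0-n children followed by their parent, so the segment's last document must be a parent.

```python
def check_block_structure(docs):
    """Toy model of the block-join invariant: docs is a list of 'c' (child)
    or 'p' (parent) flags in segment index order. Blocks are 0-n children
    followed by one parent, so the last document must be a parent."""
    if docs and docs[-1] != 'p':
        raise ValueError("last document is not a parent: broken block structure")
    return True

print(check_block_structure(['c', 'c', 'p', 'p']))  # True: both blocks end with a parent
```

A segment like `['c', 'p', 'c']` would fail this check, which is the shape of index the error message above is complaining about.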
Tim Underwood (@tpunder) (migrated from JIRA)
Thanks @jpountz! I'm trying to figure out if this is an issue on my side (very possible) or if it's a Solr or Lucene issue.
All my indexing goes through Solr (via SolrJ) and as far as I can tell I'm not attempting to index any child documents without a corresponding parent document. I'm not even sure if Solr or SolrJ would allow me to do that.
Does it make sense that optimizing the index would cause the problem to go away?
I think I was able to snag a copy of the index that was causing problems before the optimized version was able to replicate. Any suggestions/pointers for trying to track down whatever docs are problematic? Will running CheckIndex on it tell me anything useful?
Mikhail Khludnev (@mkhludnev) (migrated from JIRA)
@tpunder, it usually happens when a uniqueKey is duplicated, which causes the former parent doc to be deleted.
It can be verified with org.apache.lucene.search.join.CheckJoinIndex, although it doesn't have a main() method.
@jpountz, what if we invoke the CheckJoinIndex logic lazily somewhere in org.apache.lucene.search.join.QueryBitSetProducer.getBitSet(LeafReaderContext)? It won't cost much since it's lazy, and it would provide more predictable behaviour for users.
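Mikhail's suggestion can be sketched outside Lucene (Python sketch; the class and parameter names are hypothetical, not the actual Java API): run the CheckJoinIndex-style validation lazily, the first time a segment's parent bitset is requested, and cache the result so the check is paid at most once per segment.

```python
class CheckingBitSetProducer:
    """Hypothetical sketch of a lazily-validating bitset producer:
    the parent bitset for a segment is computed and validated on first
    request, then served from a per-segment cache on later requests."""

    def __init__(self, parent_filter, validate):
        self.parent_filter = parent_filter  # callable: segment -> parent bitset
        self.validate = validate            # callable: (segment, bitset), raises on a broken index
        self._cache = {}

    def get_bit_set(self, segment_id, segment):
        if segment_id not in self._cache:
            bits = self.parent_filter(segment)
            self.validate(segment, bits)    # fail fast on a broken block structure
            self._cache[segment_id] = bits  # validation cost is paid once per segment
        return self._cache[segment_id]
```

Because the check runs only when a segment's bitset is first needed and the result is cached, repeated queries against the same segment pay nothing extra.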
Tim Underwood (@tpunder) (migrated from JIRA)
Thanks @mkhludnev! Running CheckJoinIndex on my bad index (assuming I got my parentsFilter right) says:
java.lang.IllegalStateException: Parent doc 3324040 of segment _vfo(6.3.0):C28035360/10475131:delGen=86 is deleted but has a live child document 3323449
Running CheckJoinIndex on the optimized version of the index doesn't complain.
So... that leaves me wondering where the bug is. I am frequently (via Solr) re-indexing parent/child documents that duplicate existing documents based on my unique key field, but my understanding is that Solr should automatically delete the old parent and child documents for me. Maybe that's a bad assumption.
It looks like maybe I'm running into one or more of these issues: SOLR-5211, SOLR-5772, SOLR-6096, SOLR-6596, SOLR-6700
Sounds like I should probably just make sure I explicitly delete any old parent/child documents that I'm replacing to be on the safe side.
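The failure mode being discussed can be simulated with a toy model (plain Python, not Solr/Lucene code; names are hypothetical): re-indexing a parent by uniqueKey deletes only the old parent document, leaving its old children live, which is exactly the "Parent doc ... is deleted but has a live child document" state that CheckJoinIndex reports above.

```python
def find_orphan_children(docs, deleted):
    """Toy orphan detector: docs is a list of 'c'/'p' flags in segment
    order, deleted is a set of deleted doc ids. Returns live child doc
    ids whose block's parent has been deleted."""
    orphans, block = [], []
    for doc_id, kind in enumerate(docs):
        if kind == 'c':
            block.append(doc_id)
        else:  # a parent closes the current block
            if doc_id in deleted:
                orphans.extend(c for c in block if c not in deleted)
            block = []
    return orphans

# Re-indexing the parent (doc 2) by uniqueKey deleted only the parent,
# not its children (docs 0 and 1), leaving them orphaned:
print(find_orphan_children(['c', 'c', 'p', 'c', 'p'], deleted={2}))  # [0, 1]
```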
Tim Underwood (@tpunder) (migrated from JIRA)
I also noticed that I have some deleteByQuery calls that target parent documents but not their children (my assumption being that Solr or Lucene would also delete the corresponding child documents). Perhaps that is what is causing the orphan child documents. I'll be sure to explicitly delete those also.
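The safe pattern Tim settles on, deleting the whole block rather than just the parent, can be sketched with the same toy model (hypothetical helper, not a Solr API): expand every parent deletion to cover the children in its block.

```python
def expand_delete_to_block(docs, parents_to_delete):
    """Toy cascade delete: docs is a list of 'c'/'p' flags in segment
    order. Given a set of parent doc ids to delete, return the full set
    of doc ids to delete, including each parent's children."""
    to_delete, block = set(), []
    for doc_id, kind in enumerate(docs):
        if kind == 'c':
            block.append(doc_id)
        else:  # a parent closes the current block
            if doc_id in parents_to_delete:
                to_delete.update(block)   # delete the children too,
                to_delete.add(doc_id)     # not just the parent
            block = []
    return to_delete

# Deleting parent doc 2 also removes its children, docs 0 and 1:
print(sorted(expand_delete_to_block(['c', 'c', 'p', 'c', 'p'], {2})))  # [0, 1, 2]
```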
Mikhail Khludnev (@mkhludnev) (migrated from JIRA)
LUCENE-7674.patch introduces CheckingQueryBitSetProducer, which checks the parent segment's bitset before caching, and switches {!parent} and {!child} to use it. It fits in well, except for one interesting case: BJQParserTest.testGrandChildren(). When we have three levels (parent, child, grand-child) and search for children (the 2nd level), the bitset is required to include all ascendant levels (parents). This will break existing queries for those who run blocks of more than two levels. But such explicitly strict behavior solves problems for those who try to retrieve intermediate levels via [child]; I remember a couple of such threads on the list.
What do you think?
Mikhail Khludnev (@mkhludnev) (migrated from JIRA)
@tpunder, you've got everything right! Thanks for gathering those pet peeves into a list. Here is one more, SOLR-7606 - it's my favorite one. I need to tackle them sooner or later.
Mikhail Khludnev (@mkhludnev) (migrated from JIRA)
@jpountz, @uschindler, what's your opinion about CheckingQueryBitsetProducer and restricting multilevel blocks?
Adrien Grand (@jpountz) (migrated from JIRA)
It feels wrong to me that we enforce these rules at search time when they should be enforced at index time. I think the true fix to all these block join issues would be to make Solr aware of the queries that describe the parent and child spaces, rather than expecting users to provide them at search time. Once it knows that, it could reject update/delete operations that would break the block structure, fail queries that use a parent query that is not one of the expected ones, maybe add a FILTER clause to the child query to restrict it to the child space in case some fields are used at multiple levels, etc.
Uwe Schindler (@uschindler) (migrated from JIRA)
I agree with Adrien. The current block join support in Solr is a disaster, because it was released too early. Just nuke the broken APIs and create a new one, so that Solr internally knows from the schema/mapping how to block join and can also prevent malformed updates. This is worth a backwards compatibility break! Doing expensive runtime checks on every query just to keep a broken API/implementation is not a good idea. Break hard and come back with a better API; the users will still be happier, trust me. I know so many users who f*ck up block joins, as Solr does not enforce them correctly. Do the following:
Mikhail Khludnev (@mkhludnev) (migrated from JIRA)
Oh.. I've got your point, guys. Thanks. I'll probably raise a GSoC ticket and try to scratch the backlog.
David Smiley (@dsmiley) (migrated from JIRA)
+1 to Adrien's and Uwe's remarks. It was released too early.
Mikhail Khludnev (@mkhludnev) (migrated from JIRA)
Ok. I've started to scratch out the spec at SOLR-10144. Everybody is welcome. Meanwhile, I tried to reproduce this exact failure to come up with a more informative message. But it seems to be impossible: the recently redesigned BlockJoinQuery ignores children behind the last parent in a segment.
Started seeing this error message on a production Solr 6.3.0 system today making use of parent/child documents:
The "docId=2147483647" part seems suspicious since that corresponds to Integer.MAX_VALUE and my index only has 102,013,289 docs in it. According to the Solr searcher stats page I have:
numDocs: 71,870,998
maxDocs: 102,013,289
deletedDocs: 30,142,291
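The suspicious docId is explainable: Java's Integer.MAX_VALUE is 2^31 - 1 = 2147483647, and Lucene uses that value as the DocIdSetIterator.NO_MORE_DOCS sentinel for an exhausted iterator. A "docId=2147483647" in an error message therefore usually means the sentinel leaked into real doc-id handling, not that a document with that id exists. A quick check of the arithmetic:

```python
# Java's Integer.MAX_VALUE is 2^31 - 1, the same value Lucene uses
# as the DocIdSetIterator.NO_MORE_DOCS sentinel.
INTEGER_MAX_VALUE = 2**31 - 1
print(INTEGER_MAX_VALUE)  # 2147483647
```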
I took the query that was failing and attempted to intersect my parent query with the child query to find any problem docs but that came back with 0 results.
After performing an optimize (via the Solr UI) on the index the problem has gone away and the query that previously triggered this error works as it should.
Migrated from LUCENE-7674 by Tim Underwood (@tpunder), updated Feb 16 2017 Attachments: LUCENE-7674.patch, LUCENE-7674-attempt-to-reproduce.patch Linked issues: