Open rampeni opened 6 years ago
Extra strangeness/hints: when we add the fields we search on to the query, we get different results: vat_search and fuzzy_company_search: 50; exact_company_search: 0; company_search_prio1: 9, ...prio2: 41, ...prio3: 42...
So the combination field / query type probably has an influence. I realise it is hard to comment, but we've gone a long way trying to provide a reproducible test case and have failed...
We are facing a strange problem while testing the upgrade of our applications from Cassandra + Lucene plugin 2.2.4 to the latest Cassandra 3.11.2 + plugin 3.11.1.0.
We get a difference when querying a certain Lucene index: when selecting specific fields we get no result (0 rows found), but when performing the exact same query with select * we get the expected rows.
Extra difficulty: the problem is hard to reproduce. Once it occurs for a certain table it is reproducible. However, dropping, recreating and refilling the same table makes the issue disappear.
We tried to recreate the issue on a sample table, but failed, probably due to the non-deterministic nature of the occurrence. I've attached a sample statement (admittedly, the query can be improved, but we want to validate the upgrade first and touch the apps later). Replacing the field list by a * makes it work (all via cqlsh).
We added some debug statements, and the issue seems to be caused by "something" between searching and topping of the results in IndexPostProcessor.scala. After the collect, 50 rows are returned in both cases. After the top, in the case with fields, we have no rows left.
Does this ring a bell with anyone? We've debugged further into the Lucene core but that's costing a lot of time. So far, our investigation has reached the point where we think the difference is caused by a difference in the return value of the getTermsEnum method of org.apache.lucene.search.TermQuery. In the working case, it returns a TermsEnum, which is then used to define a scorer and obtain results. In the faulty case, it returns null, no scorer is assigned and the story ends. The org.apache.lucene.index.TermContext's internal TermState[] has been initialised, but nothing has been registered in this case, so it only contains null.
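To make the suspected path easier to discuss, here is a minimal Python model of it. This is not Lucene code: it only mirrors what we think we saw in the debugger, namely a per-segment state array whose slot is still null, which leads to a null terms enum and therefore no scorer. All names (`TermStates`, `terms_enum_for`, `scorer_for`) are hypothetical.

```python
from typing import List, Optional


class TermStates:
    """Simplified stand-in for a per-segment term state array (one slot per segment)."""

    def __init__(self, segment_count: int) -> None:
        # Slots start out empty, like an initialised but unpopulated array.
        self.states: List[Optional[str]] = [None] * segment_count

    def register(self, segment_ord: int, state: str) -> None:
        self.states[segment_ord] = state


def terms_enum_for(states: TermStates, segment_ord: int) -> Optional[str]:
    """Return a placeholder terms enum, or None if no state was registered."""
    state = states.states[segment_ord]
    if state is None:
        return None
    return f"enum({state})"


def scorer_for(states: TermStates, segment_ord: int) -> Optional[str]:
    """Return a placeholder scorer, or None when there is no terms enum."""
    terms_enum = terms_enum_for(states, segment_ord)
    if terms_enum is None:
        return None  # no scorer, so this segment contributes no hits
    return f"scorer({terms_enum})"


working = TermStates(segment_count=1)
working.register(0, "state-for-term")
faulty = TermStates(segment_count=1)  # initialised, but nothing registered

print(scorer_for(working, 0))
print(scorer_for(faulty, 0))
```

If this model matches the real behaviour, the open question is why nothing gets registered for the term when a field list is selected.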
Additional hint: the above actually causes us to end up in the topDocs method of org.apache.lucene.search.TopDocsCollector, where the authors add an if with the comment "Don't bother to throw an exception, just return an empty TopDocs in case the parameters are invalid or out of range. TODO: shouldn't we throw IAE if apps give bad params here so they dont have sneaky silent bugs?"...would have been nice, because in the faulty case the if is true and an empty collection is returned...for sure sneaky and silent :)
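The guard described above can be sketched roughly as follows. This is a simplified Python model, not the Lucene source; `top_docs` and its `strict` flag are hypothetical and only illustrate the difference between silently returning nothing and raising an error, as the TODO suggests.

```python
from typing import List


def top_docs(collected: List[int], start: int, how_many: int,
             strict: bool = False) -> List[int]:
    """Return a page of already-collected doc ids.

    With strict=False, invalid or out-of-range parameters silently give an
    empty result. With strict=True they raise instead.
    """
    size = len(collected)
    if start < 0 or start >= size or how_many <= 0:
        if strict:
            raise ValueError(
                f"bad paging parameters: start={start}, "
                f"how_many={how_many}, size={size}"
            )
        return []
    return collected[start:start + how_many]


# Nothing collected: start=0 is already out of range, so the result is empty.
print(top_docs([], 0, 50))
# Same call in strict mode surfaces the problem instead of hiding it.
try:
    top_docs([], 0, 50, strict=True)
except ValueError as exc:
    print("error:", exc)
```

Note that in this model an empty collector is enough to hit the guard even with sane paging parameters, since a start of 0 is not smaller than a size of 0.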
Any ideas are helpful, either to solve the issue or to confirm that the "workaround" of using select * always returns the correct result. broken.txt
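One way we could check the workaround is to run both variants of a query and compare the returned primary keys. The helper below is only a sketch of that comparison: it works on rows that have already been fetched (as dicts), and the names `key_set` and `diff_results`, the `id` key column and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


def key_set(rows: Iterable[Row], key_columns: List[str]) -> Set[Key]:
    """Collect the primary-key tuples of the given rows."""
    return {tuple(row[col] for col in key_columns) for row in rows}


def diff_results(star_rows: Iterable[Row], field_rows: Iterable[Row],
                 key_columns: List[str]) -> Tuple[Set[Key], Set[Key]]:
    """Return (keys only in the select * result, keys only in the field-list result)."""
    star = key_set(star_rows, key_columns)
    fields = key_set(field_rows, key_columns)
    return star - fields, fields - star


# Hypothetical sample data: select * found two rows, the field list found none.
star_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
field_rows: List[Row] = []

only_star, only_fields = diff_results(star_rows, field_rows, ["id"])
print(only_star, only_fields)
```

Two empty sets would mean both variants agree on which rows match. It would not prove that select * is correct, only that the two variants are consistent.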