Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

Altering UDTs causes ArrayIndexOutOfBoundsException in ColumnsMapper #395

Open jgerew opened 6 years ago

jgerew commented 6 years ago

Apache Cassandra Version: 3.11.2

Stratio Cassandra Lucene Index Version: 3.11.1.0

Reproduction steps:

  1. Create a UDT

    CREATE TYPE test_udt (name text, type text);

  2. Create a stratio lucene index using the UDT

    CREATE CUSTOM INDEX test_index ON test_table ()
    USING 'com.stratio.cassandra.lucene.Index'
    WITH OPTIONS = {
    'refresh_seconds': '1',
    'schema': '{
      fields: {
         "raw_data.test.name": {type: "text"}
      }
    }'
    };
  3. Insert data into the test_table

  4. Execute a query using a Stratio Lucene expression to verify results

    SELECT * FROM test_table WHERE expr(test_index, '{query:[{type:"boolean","should":[{type:"wildcard",field:"raw_data.test.name",value:"*"}]}]}');

  5. Add a field to the UDT

    ALTER TYPE test_udt ADD test_code int;

  6. Execute a query again

ERROR   [Native-Transport-Requests-1]   2018-07-09  18:37:55,141    QueryMessage.java:129   -   Unexpected  error   during  query
    java.lang.ArrayIndexOutOfBoundsException:   2
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.$anonfun$columns$8(ColumnsMapper.scala:215) ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.$anonfun$columns$8$adapted(ColumnsMapper.scala:214) ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractIterator.foldRight(Iterator.scala:1409)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractIterable.foldRight(Iterable.scala:54)  ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractTraversable.$colon$bslash(Traversable.scala:104)   ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:214)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:173)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.$anonfun$columns$8(ColumnsMapper.scala:222) ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.$anonfun$columns$8$adapted(ColumnsMapper.scala:214) ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractIterator.foldRight(Iterator.scala:1409)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractIterable.foldRight(Iterable.scala:54)  ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractTraversable.$colon$bslash(Traversable.scala:104)   ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:214)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:173)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:156)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:119) ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper.$anonfun$columns$3(ColumnsMapper.scala:91)   ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractIterator.foldRight(Iterator.scala:1409)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractIterable.foldRight(Iterable.scala:54)  ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.AbstractTraversable.$colon$bslash(Traversable.scala:104)   ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:87)  ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:56)  ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.IndexPostProcessor.document(IndexPostProcessor.scala:141)  ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.IndexPostProcessor.$anonfun$top$1(IndexPostProcessor.scala:106)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:156)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.IndexPostProcessor.top(IndexPostProcessor.scala:103)   ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.IndexPostProcessor.process(IndexPostProcessor.scala:57)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.ReadCommandPostProcessor.apply(IndexPostProcessor.scala:168)   ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.ReadCommandPostProcessor.apply(IndexPostProcessor.scala:161)   ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  org.apache.cassandra.db.PartitionRangeReadCommand.postReconciliationProcessing(PartitionRangeReadCommand.java:408)  ~[apache-cassandra-3.11.2.jar:3.11.2]
        at  org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:2288) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at  org.apache.cassandra.db.PartitionRangeReadCommand.execute(PartitionRangeReadCommand.java:263)   ~[apache-cassandra-3.11.2.jar:3.11.2]
        at  com.stratio.cassandra.lucene.IndexQueryHandler.executeSortedLuceneQuery(IndexQueryHandler.scala:226)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.IndexQueryHandler.executeLuceneQuery(IndexQueryHandler.scala:193)  ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.IndexQueryHandler.processStatement(IndexQueryHandler.scala:122)    ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  com.stratio.cassandra.lucene.IndexQueryHandler.process(IndexQueryHandler.scala:101) ~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
        at  org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at  org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517)    [apache-cassandra-3.11.2.jar:3.11.2]
        at  org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410)    [apache-cassandra-3.11.2.jar:3.11.2]
        at  io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)  [netty-all-4.0.44.Final.jar:4.0.44.Final]
        at  io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)    [netty-all-4.0.44.Final.jar:4.0.44.Final]
        at  io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)    [netty-all-4.0.44.Final.jar:4.0.44.Final]
        at  io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348)    [netty-all-4.0.44.Final.jar:4.0.44.Final]
        at  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_171]
        at  org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)    [apache-cassandra-3.11.2.jar:3.11.2]
        at  org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)   [apache-cassandra-3.11.2.jar:3.11.2]
        at  java.lang.Thread.run(Thread.java:748)   [na:1.8.0_171]
jgerew commented 6 years ago

The problem appears to lie in how the value is split into UDT fields. All of the data saved prior to the ALTER statement has only 2 values (index 0 and 1), but the UDT now has 3 fields (index 0, 1, and 2). The code iterates over the UDT field names and expects the split value to contain as many entries as there are field names, which causes the exception. See the current code below:

  private[mapping] def columns(column: Column, udt: UserType, value: ByteBuffer): Columns = {
    val itemValues = udt.split(value)
    ((0 until udt.fieldNames.size) :\ Columns()) ((i, columns) => {
      val itemValue = itemValues(i) // causes ArrayIndexOutOfBoundsException
      if (itemValue == null) {
        columns
      } else {
        val itemName = udt.fieldNameAsString(i)
        val itemType = udt.fieldType(i)
        val itemColumn = column.withUDTName(itemName)
        this.columns(itemColumn, itemType, itemValue) ++ columns
      }
    })
  }

If we change the code to allow for an index mismatch, we can resolve the issue:

  private[mapping] def columns(column: Column, udt: UserType, value: ByteBuffer): Columns = {
    val itemValues = udt.split(value)
    ((0 until udt.fieldNames.size) :\ Columns()) ((i, columns) => {
      val itemValue = if (i < itemValues.length) itemValues(i) else null // see here
      if (itemValue == null) {
        columns
      } else {
        val itemName = udt.fieldNameAsString(i)
        val itemType = udt.fieldType(i)
        val itemColumn = column.withUDTName(itemName)
        this.columns(itemColumn, itemType, itemValue) ++ columns
      }
    })
  }
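
To illustrate the guard in isolation, here is a minimal, self-contained sketch that does not depend on Cassandra or the plugin. The object `UdtPaddingSketch`, the method `presentFields`, and the use of plain strings in place of `ByteBuffer` values are all assumptions made for illustration; this is not the plugin's actual code.

```scala
// Hypothetical stand-in for the UDT decoding loop; all names are illustrative only.
object UdtPaddingSketch {

  /** Pairs each declared field name with its stored value, if any.
    * Values written before a field was added are shorter than `fieldNames`,
    * so missing trailing positions are treated like null (absent) values.
    */
  def presentFields(fieldNames: Seq[String], storedValues: Array[String]): Seq[(String, String)] =
    fieldNames.indices.flatMap { i =>
      // Guard the index instead of assuming one stored value per declared field.
      val value = if (i < storedValues.length) storedValues(i) else null
      Option(value).map(v => fieldNames(i) -> v)
    }

  def main(args: Array[String]): Unit = {
    val fieldsAfterAlter = Seq("name", "type", "test_code")
    val rowWrittenBeforeAlter = Array("foo", "bar") // only two values were stored
    println(presentFields(fieldsAfterAlter, rowWrittenBeforeAlter))
  }
}
```

With the guard in place, a row stored before the ALTER simply yields no entry for the new field instead of throwing.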
smiklosovic commented 6 years ago

Hi @jgerew

We are hitting the very same issue.

Our "workflow" is like this:

We start with a completely empty DB, create the schema, insert data, and then create the index so everything is indexed. All works. After that we drop the index and recreate the very same index, and all queries give us this exception.

I went through the very same code as you did, and yes, it failed on that line.

It is worth saying that if we don't use "expr" queries but "where lucene = query", it all works.

Why?

Could you look into this please?

We are using 3.7.2 Cassandra with 3.7.6 plugin.

@adelapena

smiklosovic commented 6 years ago

I am taking back my point about expr vs lucene; it fails either way.

smiklosovic commented 6 years ago

What is even more strange is that we cannot use sorting after we drop and recreate an index, but we can continue to use queries without sorting. Both lucene and expr queries work without sorting, even after we drop and create the index again.

jgerew commented 6 years ago

Hi @smiklosovic,

I wish I could help, but it doesn't sound like the same thing we were facing. Our problem was due to modifying the UDT type. I don't think it being an index had anything to do with it, we just ran into issues with the stratio plugin when it was formulating the result set. In our case, stratio was expecting the same number of values as there were fields in the UDT and since we had added a new field we got the ArrayIndexOutOfBoundsException.

That being said, we did run into some issues with dropping/re-adding indexes and getting ArrayIndexOutOfBoundsExceptions. We didn't dig too far into it, but we figured it may have been due to schema replication. You may want to try to run your index alterations with consistency set to ALL.

Good luck!

Joe

smiklosovic commented 6 years ago

@jgerew

We were also thinking this is due to altering a UDT. We have our migration scripts, and we did indeed alter a UDT in the last script by adding a field.

But when we set out to replicate this "from scratch", we ran "describe keyspace" on the DB, where everything was consolidated into a flat schema, so there was no altering at all. We are facing this issue with index drop and sorting anyway.

We wanted to work around it by making the timestamp part of the primary key as a clustering column so we could use "order by", but

InvalidRequest: code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."

jgerew commented 6 years ago

Interesting, do you have data stored for all of the UDT fields? If not, I wonder if that could be causing you to have the same issue I did. For us, the index had nothing to do with the problem. We got the ArrayIndexOutOfBoundsException whenever we queried the DB (using 'expr') where the result set would have returned records containing the UDT data with missing pieces.

Try inserting a new (and fully populated) record into your table and querying for just that one record (using 'expr', not the primary key). Does that query cause an ArrayIndexOutOfBoundsException?

smiklosovic commented 6 years ago

There were some null fields in UDTs for sure. The field we were sorting by was not part of that UDT; it was a regular column of type date. We are going to try using a timeuuid in that field instead and sort by that one; there is already a use case like this in our app. I'll update you with the results.

smiklosovic commented 6 years ago

It doesn't work at all. Whenever we drop the index we cannot do sort queries. I ensured all fields in the UDT are non-null and we are sorting by a timeuuid field in that table. Now I am getting this:

    java.nio.BufferUnderflowException: null
        at java.nio.Buffer.nextGetIndex(Buffer.java:506) ~[na:1.8.0_181]
        at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:361) ~[na:1.8.0_181]
        at org.apache.cassandra.serializers.CollectionSerializer.readCollectionSize(CollectionSerializer.java:79) ~[apache-cassandra-3.7.2.jar:3.7.2]
        at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.frozenCollectionSize(ColumnsMapper.scala:272) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
        at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:202) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
        at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:182) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
        at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:175) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
        at com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:133) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
        at com.stratio.cassandra.lucene.mapping.ColumnsMapper.$anonfun$columns$4(ColumnsMapper.scala:105) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
        at scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]

smiklosovic commented 6 years ago

So what is really odd is that it all works, even with sorts, but only when we use "select * from" instead of "select column1,column2 from".

It seems that selecting specific columns doesn't work but selecting * does. This is the weirdest bug I have ever seen.

rampeni commented 6 years ago

@smiklosovic that's exactly what we ran into in #394

jgerew commented 6 years ago

@smiklosovic @rampeni Just curious, does the fix I posted resolve the issues you guys are facing? I have yet to hear from anyone on stratio concerning this ticket unfortunately. We've been running with a forked branch with the code change above.

smiklosovic commented 6 years ago

I have a feeling this project is dead.

rampeni commented 6 years ago

Same feeling here. We will probably move on rather than spend more time trying to fix it.