jgerew opened this issue 6 years ago
The problem appears to lie with how the serialized value is split into UDT fields. All of the data saved prior to the ALTER statement has only two values (indexes 0 and 1), but the UDT now has three fields (indexes 0, 1, and 2). The code iterates over the UDT field names and expects the value to contain as many entries as there are field names, which causes the exception. See the current code below:
```scala
private[mapping] def columns(column: Column, udt: UserType, value: ByteBuffer): Columns = {
  val itemValues = udt.split(value)
  ((0 until udt.fieldNames.size) :\ Columns()) ((i, columns) => {
    val itemValue = itemValues(i) // throws ArrayIndexOutOfBoundsException for values written before the ALTER
    if (itemValue == null) {
      columns
    } else {
      val itemName = udt.fieldNameAsString(i)
      val itemType = udt.fieldType(i)
      val itemColumn = column.withUDTName(itemName)
      this.columns(itemColumn, itemType, itemValue) ++ columns
    }
  })
}
```
If we change the code to account for the possibility of an index mismatch, we can resolve the issue:
```scala
private[mapping] def columns(column: Column, udt: UserType, value: ByteBuffer): Columns = {
  val itemValues = udt.split(value)
  ((0 until udt.fieldNames.size) :\ Columns()) ((i, columns) => {
    val itemValue = if (i < itemValues.length) itemValues(i) else null // guard against values with fewer entries than the UDT has fields
    if (itemValue == null) {
      columns
    } else {
      val itemName = udt.fieldNameAsString(i)
      val itemType = udt.fieldType(i)
      val itemColumn = column.withUDTName(itemName)
      this.columns(itemColumn, itemType, itemValue) ++ columns
    }
  })
}
```
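To see the failure mode in isolation, here is a minimal, self-contained sketch (plain Scala collections, not the plugin's actual API) of a value written under the old two-field UDT being walked with the new three-field definition:

```scala
object UdtMismatchSketch extends App {
  // UDT definition after ALTER TYPE test_udt ADD test_code int
  val fieldNames = Seq("name", "type", "test_code")
  // A row written before the ALTER deserializes into only two values
  val itemValues = Array("alice", "person")

  val columns = fieldNames.indices.flatMap { i =>
    // Unguarded itemValues(i) would throw ArrayIndexOutOfBoundsException at i = 2;
    // the guard treats the missing trailing field as null, like the fix above.
    val itemValue = if (i < itemValues.length) itemValues(i) else null
    Option(itemValue).map(v => s"${fieldNames(i)} = $v")
  }

  println(columns.mkString(", ")) // prints: name = alice, type = person
}
```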
Hi @jgerew,
We are hitting the very same issue. Our workflow is as follows: we start with a completely empty database, create the schema, insert data, and then create the index, so everything gets indexed. All works. After that we drop the index and recreate the very same index, and all queries start throwing this exception.
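Roughly, the drop-and-recreate sequence looks like this (a sketch in Scala with the DataStax Java driver 3.x; the keyspace, table, index name, and schema options are placeholders, not our real setup):

```scala
import com.datastax.driver.core.Cluster

object DropRecreateRepro extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect("test_ks")

  // Up to this point the index exists and all queries work.
  session.execute("DROP INDEX test_index")
  session.execute(
    """CREATE CUSTOM INDEX test_index ON test_table ()
      |USING 'com.stratio.cassandra.lucene.Index'
      |WITH OPTIONS = {'refresh_seconds': '1',
      |                'schema': '{fields: {name: {type: "string"}}}'}""".stripMargin)

  // The very same query that worked before the drop now fails on the server.
  session.execute(
    """SELECT * FROM test_table
      |WHERE expr(test_index, '{query: {type: "match", field: "name", value: "alice"}}')""".stripMargin)

  cluster.close()
}
```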
I was going through the very same code as you did, and yes, it fails on that line.
It is worth noting that if we don't use "expr" queries but "where lucene = query" instead, it all works.
Why?
Could you look into this please?
We are using Cassandra 3.7.2 with plugin version 3.7.6.
@adelapena
I am taking back my point about expr vs lucene; it fails either way.
What is even stranger is that we cannot use sorting after we drop and recreate an index, but we can continue to use queries without sorting: both lucene and expr queries work without sorting, even after we drop and recreate the index again.
Hi @smiklosovic,
I wish I could help, but it doesn't sound like the same thing we were facing. Our problem was due to modifying the UDT; I don't think the index itself had anything to do with it. We just ran into issues with the stratio plugin when it was formulating the result set: in our case, stratio expected the same number of values as there were fields in the UDT, and since we had added a new field we got the ArrayIndexOutOfBoundsException.
That being said, we did run into some issues with dropping/re-adding indexes and getting ArrayIndexOutOfBoundsExceptions. We didn't dig too far into it, but we figured it may have been due to schema replication. You may want to try running your index alterations with consistency set to ALL.
Good luck!
Joe
@jgerew
We were also thinking this was due to altering a UDT; we have migration scripts, and the last one did indeed alter the UDT by adding a field.
But when we set out to replicate this from scratch, we ran DESCRIBE KEYSPACE on the database, which gave us everything consolidated into a flat schema with no altering at all, and we still hit this issue with index drop and sorting.
We wanted to work around it by making the timestamp part of the primary key as a clustering column so we could ORDER BY it, but:

```
InvalidRequest: code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."
```
Interesting. Do you have data stored for all of the UDT fields? If not, I wonder if that could be causing you to hit the same issue I did. For us, the index had nothing to do with the problem; we got the ArrayIndexOutOfBoundsException whenever we queried the DB (using 'expr') and the result set would have returned records containing UDT data with missing pieces.
Try inserting a new (and fully populated) record into your table and querying for just that one record (using 'expr', not the primary key). Does that query cause an ArrayIndexOutOfBoundsException?
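Something along these lines; every name here is a placeholder for whatever your schema actually uses (sketched in Scala with the DataStax 3.x driver):

```scala
import com.datastax.driver.core.Cluster

object FullyPopulatedCheck extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect("test_ks")

  // Insert one record with every UDT field populated.
  session.execute(
    """INSERT INTO test_table (id, raw_data)
      |VALUES (uuid(), {test: {name: 'fully', type: 'populated', test_code: 1}})""".stripMargin)

  // Query for just that record through the Lucene index, not the primary key.
  session.execute(
    """SELECT * FROM test_table
      |WHERE expr(test_index, '{query: {type: "match", field: "raw_data.test.name", value: "fully"}}')""".stripMargin)

  cluster.close()
}
```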
There were some null fields in UDTs for sure. The field we were sorting by was not part of that UDT, though; it was a regular column of type date. We are going to try using timeuuid in that field instead and sort by that one; there is already a use case like this in our app. I'll update you with the results.
It doesn't work at all. Whenever we drop the index, we cannot do sorted queries. I ensured all fields in the UDT are non-null, and we are sorting by a timeuuid field in that table. Now I am getting this:
```
java.nio.BufferUnderflowException: null
	at java.nio.Buffer.nextGetIndex(Buffer.java:506) ~[na:1.8.0_181]
	at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:361) ~[na:1.8.0_181]
	at org.apache.cassandra.serializers.CollectionSerializer.readCollectionSize(CollectionSerializer.java:79) ~[apache-cassandra-3.7.2.jar:3.7.2]
	at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.frozenCollectionSize(ColumnsMapper.scala:272) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
	at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:202) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
	at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:182) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
	at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:175) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
	at com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:133) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
	at com.stratio.cassandra.lucene.mapping.ColumnsMapper.$anonfun$columns$4(ColumnsMapper.scala:105) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
	at scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
```
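For what it's worth, the bottom of that trace is `CollectionSerializer.readCollectionSize` calling `ByteBuffer.getInt` on a buffer with no bytes remaining, which is trivially reproduced in isolation:

```scala
import java.nio.ByteBuffer

object UnderflowSketch extends App {
  // readCollectionSize expects a 4-byte size prefix; if the serialized value
  // is shorter than expected, getInt underflows exactly like the trace above.
  val truncated = ByteBuffer.allocate(0)
  truncated.getInt() // throws java.nio.BufferUnderflowException
}
```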
So what is totally awkward is that it all works, even with sorts, but only if we use "select * from" instead of "select column1, column2 from".
It seems that selecting specific columns doesn't work but doing * does. This is the weirdest bug I have ever seen.
@smiklosovic that's exactly what we ran into in #394
@smiklosovic @rampeni Just curious, does the fix I posted resolve the issues you guys are facing? I have yet to hear from anyone at stratio concerning this ticket, unfortunately. We've been running a forked branch with the code change above.
I have a feeling this project is dead.
Same feeling here; we will probably move on rather than spend more time trying to fix it.
Apache Cassandra Version: 3.11.2
Stratio Cassandra Lucene Index Version: 3.11.1.0
Reproduction steps:

1. Create a UDT:
   ```
   CREATE TYPE test_udt (name text, type text);
   ```
2. Create a stratio lucene index using the UDT (sketched in Scala below this list).
3. Insert data into test_table (also sketched below).
4. Execute a query using a stratio lucene expression to verify results:
   ```
   SELECT * FROM test_table WHERE expr(test_index, '{query:[{type:"boolean","should":[{type:"wildcard",field:"raw_data.test.name",value:"*"}]}]}');
   ```
5. Add a field to the UDT:
   ```
   ALTER TYPE test_udt ADD test_code int;
   ```
6. Execute the query again.
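Steps 2 and 3 are not spelled out above. A sketch of what they could look like, assuming a hypothetical test_table whose raw_data column nests test_udt under a field named test (to match the raw_data.test.name path in the query):

```scala
import com.datastax.driver.core.Cluster

object ReproSetup extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect("test_ks")

  // Hypothetical wrapper UDT and table matching the raw_data.test.name path.
  session.execute("CREATE TYPE IF NOT EXISTS raw_data_udt (test frozen<test_udt>)")
  session.execute(
    """CREATE TABLE IF NOT EXISTS test_table (
      |  id uuid PRIMARY KEY,
      |  raw_data frozen<raw_data_udt>)""".stripMargin)

  // Step 2: create the stratio lucene index over the nested UDT field.
  session.execute(
    """CREATE CUSTOM INDEX test_index ON test_table ()
      |USING 'com.stratio.cassandra.lucene.Index'
      |WITH OPTIONS = {'refresh_seconds': '1',
      |                'schema': '{fields: {"raw_data.test.name": {type: "string"}}}'}""".stripMargin)

  // Step 3: insert a row under the original two-field test_udt.
  session.execute(
    """INSERT INTO test_table (id, raw_data)
      |VALUES (uuid(), {test: {name: 'alice', type: 'person'}})""".stripMargin)

  cluster.close()
}
```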