Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

Lucene Index Empty after upgrade #412

Open hagir7 opened 5 years ago

hagir7 commented 5 years ago

I would appreciate some help. I am trying to upgrade Cassandra from 2.1.11 (plugin version 2.1.11.2) to 2.1.19 (plugin version 2.1.19.1), and I have a Lucene index that needs to be carried over in this upgrade. I couldn't find compatibility information between these two versions, and I was losing the index on upgrade, so I resorted to dropping the index, upgrading, and then recreating the index. However, the index is always empty after the upgrade.

This is my keyspace:

CREATE KEYSPACE mwl WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '1'}  AND durable_writes = true;
CREATE TABLE mwl.mwl (
    anum text,
    anum_universal_id text,
    iid text,
    id_universal_id text,
    version int,
    current_version int static,
    event text,
    event_type text,
    fully_qualified_anum text,
    fully_qualified_id text,
    isr text,
    lucene text,
    birth_date text,
    name text,
    requested_procedure_ids set<text>,
    version_uuid timeuuid,
    PRIMARY KEY ((anum, anum_universal_id, iid, id_universal_id), version)
) WITH CLUSTERING ORDER BY (version DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE CUSTOM INDEX mwl_lucene_idx ON mwl.mwl (lucene) USING 'com.stratio.cassandra.lucene.Index';

After upgrading to 2.1.19, I recreate the Lucene index with the CREATE CUSTOM INDEX statement shown in step 4 below.

I tested this with a single Cassandra node, as follows:

  1. In 2.1.11: I created the keyspace and index and inserted some data. The index is populated. I check with SELECT count(*) FROM mwl.mwl WHERE lucene='{filter:{type:"wildcard",field:"fully_qualified_anum",value:"acn**"},refresh:true}'; and a non-zero count is returned.

  2. I drop the Lucene index in 2.1.11 by running: drop index mwl.mwl_lucene_idx;

  3. I upgrade my single node: run nodetool upgradesstables, then nodetool drain, stop Cassandra, replace it with the new version and plugin, start Cassandra, run nodetool upgradesstables again, and finally run nodetool version and nodetool status to verify that everything looks good.

  4. Once my single node is up and running, I verify that the table is populated. Then I run:

    CREATE CUSTOM INDEX IF NOT EXISTS ON mwl.mwl (lucene) USING 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = 
    {'refresh_seconds':'60',
    'indexing_threads':'0',
    'indexing_queues_size':'50',
    'schema':'{fields:{anum:{type:"string",indexed:true,case_sensitive:false},
    anum_universal_id:{type:"string",indexed:true,case_sensitive:false},
    iid:{type:"string",indexed:true,case_sensitive:false},
    id_universal_id:{type:"string",indexed:true,case_sensitive:false},
    name:{type:"string",indexed:true,case_sensitive:false},
    fully_qualified_id:{type:"string",indexed:true,case_sensitive:false},
    fully_qualified_anum:{type:"string",indexed:true,case_sensitive:false},
    event_type:{type:"string",indexed:true,case_sensitive:false},
    event:{type:"string",indexed:true,case_sensitive:false},
    requested_procedure_ids:{type:"string",indexed:true,case_sensitive:false},
    birth_date:{type:"string",indexed:true}}}'};

  5. Now the index is empty: SELECT count(*) FROM mwl.mwl WHERE lucene='{filter:{type:"wildcard",field:"fully_qualified_anum",value:"acn**"},refresh:true}'; returns 0.
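
For reference, the node-level sequence in step 3 can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the `run` and `upgrade_node` helpers and the `DRY_RUN` variable are hypothetical names, and stopping and starting Cassandra through `sudo service cassandra ...` assumes a service-based install, which the steps above do not specify. Only the `nodetool` subcommands come from the steps themselves.

```shell
#!/bin/sh
# Sketch of the single-node upgrade sequence described in step 3.
# DRY_RUN=1 (the default here) only prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: the ordered steps for one node.
upgrade_node() {
    run nodetool upgradesstables          # rewrite SSTables before stopping
    run nodetool drain                    # flush memtables, stop accepting writes
    run sudo service cassandra stop       # assumption: service-based install
    # Manual step: replace the Cassandra binaries and the Lucene plugin jar here.
    run sudo service cassandra start      # assumption: service-based install
    run nodetool upgradesstables          # rewrite SSTables on the new version
    run nodetool version                  # confirm the running version
    run nodetool status                   # confirm the node is up
}

upgrade_node
```

With the default `DRY_RUN=1` the script only echoes the seven commands in order, which makes it easy to compare against the steps actually performed before running anything for real.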

I enabled debug logging with <logger name="com.stratio.cassandra" level="DEBUG"/> and I see no errors there. In fact, I see rows being added, yet the index is still empty at the end. I am not sure when or why the data gets lost. I also have some other regular Cassandra indexes, and they are not affected. It seems that the only data affected is the Lucene index.

Any help is much appreciated.