Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

Indexing time for data already present in C* #372

Closed: junaidnasir closed this issue 6 years ago

junaidnasir commented 6 years ago

Using Cassandra: 3.11.0, Lucene: 3.11.0. C* is a single node (test) running on GCE with 8 CPUs, 30 GB RAM, and a 100 GB disk.

I have a C* deployment with data already present in it, around 88M records in one table. I created the index using:

CREATE CUSTOM INDEX dciindex ON dci_rtu ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
    'refresh_seconds': '1',
    'schema': '{
        fields: {
            rtu: {type: "string"},
            day: {type: "date", pattern: "yyyy-MM-dd"},
            datetime: {type: "date"},
            value: {type: "integer"}
        }
    }'
};

But after many hours, if I try to run the query below, it still fails:

select * from dci_rtu where expr(dciindex, '{filter: {type: "range", field: "day", lower: "2017-04-25", upper: "2017-05-01"}}');
ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

The system logs say the index is not ready. How long does indexing take? Are there any benchmarks, or a method I can use to estimate it? The production server has much more data.

ealonsodb commented 6 years ago

Hi @junaidnasir:

As stated in the documentation, two index options can affect indexing throughput: indexing_threads and indexing_queues_size.
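As a rough sketch of where those two options go, they sit alongside refresh_seconds in the index options map. The values below ('8' threads to match the 8 CPUs mentioned above, '200' for the queue size) are illustrative assumptions, not tested recommendations:

```sql
-- Sketch only: same index definition as in the question, with the two
-- throughput-related options added. The values '8' and '200' are
-- assumptions chosen for illustration; tune them for your hardware.
-- Assumes any existing index named dciindex has been dropped first.
CREATE CUSTOM INDEX dciindex ON dci_rtu ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
    'refresh_seconds': '1',
    'indexing_threads': '8',
    'indexing_queues_size': '200',
    'schema': '{
        fields: {
            rtu: {type: "string"},
            day: {type: "date", pattern: "yyyy-MM-dd"},
            datetime: {type: "date"},
            value: {type: "integer"}
        }
    }'
};
```

Recreating the index restarts indexing of the existing data, so this is best tried on the test node before production.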

The expected indexing speed is on the order of thousands of rows per second.
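For a back-of-the-envelope estimate, the indexing time is roughly the row count divided by the sustained rate. The rates used here (1,000 and 5,000 rows per second) are assumed values within "thousands of rows/sec", not measurements; the real rate depends on hardware, schema, and compaction settings:

```latex
% N = number of rows to index, r = assumed sustained indexing rate
t \approx \frac{N}{r}

% With N = 88 \times 10^{6} rows (the table from the question):
% r = 1000 rows/s:
t \approx \frac{88 \times 10^{6}}{1000} = 88\,000 \text{ s} \approx 24.4 \text{ h}

% r = 5000 rows/s:
t \approx \frac{88 \times 10^{6}}{5000} = 17\,600 \text{ s} \approx 4.9 \text{ h}
```

The same formula scales linearly to the production row count, provided the sustained rate is first measured on comparable hardware.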

You should read #237 and #361, and increase concurrent_compactors.

Hope this helps

junaidnasir commented 6 years ago

@ealonsodb thank you so much for the reply. I just came back from vacation.

#237 and #361 are very insightful. I will update this thread if any problem arises.