Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

Indexing is taking too long for 2 GB of data. Can anything be done? #391

Open nirmalsinghkps opened 6 years ago

nirmalsinghkps commented 6 years ago

I am trying to index 3 columns of a table holding 2 GB of data. The index build has been running in the background for more than 6 hours, and I still don't see an entry in system."IndexInfo". I am quite confused about what is happening in the background, and about whether this plugin is the right candidate for heavy tables with huge amounts of data.

  1. How can I track the progress of index creation?
  2. How frequently will the index be updated after its first build?
  3. Is this plugin an ideal candidate for indexing a table with more than 250 GB of data?

phambryan commented 6 years ago
  1. You can watch the progress by changing the trace statements to INFO level and recompiling the plugin. See https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.14/plugin/src/main/scala/com/stratio/cassandra/lucene/IndexWriter.scala

  2. This depends on your settings. By default, a refresh is triggered every 60 s and scans for updates.

  3. This is a partitioning question. Keep your partition size no larger than 10 GB (C* 3.11), with strong CPU and NVMe storage. 250 GB is a lot of data: if each row holds about 128 KB of column data, that is still roughly 2M rows. So index only what you need, and use filters to narrow the data set.
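
To make points 2 and 3 concrete, here is a minimal Python sketch, not the plugin's own tooling. It does the row-count arithmetic (reading "128k" as roughly 128 KB per row, which is an assumption) and builds a `CREATE CUSTOM INDEX` statement that sets `refresh_seconds` and maps only the columns you actually search on. The keyspace, table, index, and column names are hypothetical, and the helper functions are made up for illustration; check the option names against the documentation for your plugin version before using the statement.

```python
import json


def estimate_rows(total_bytes: int, bytes_per_row: int) -> int:
    """Rough row count for a table of total_bytes at bytes_per_row each."""
    return total_bytes // bytes_per_row


def build_index_cql(keyspace, table, index_name, fields, refresh_seconds=60):
    """Build a CREATE CUSTOM INDEX statement for the Lucene index class.

    `fields` maps column names to mapper definitions. Only the columns
    listed here are indexed, so keep the mapping as small as possible.
    """
    schema = json.dumps({"fields": fields})
    return (
        f"CREATE CUSTOM INDEX {index_name} ON {keyspace}.{table} ()\n"
        f"USING 'com.stratio.cassandra.lucene.Index'\n"
        f"WITH OPTIONS = {{\n"
        f"   'refresh_seconds': '{refresh_seconds}',\n"
        f"   'schema': '{schema}'\n"
        f"}};"
    )


# Assumption: 250 GiB of data at about 128 KiB per row.
rows = estimate_rows(250 * 1024**3, 128 * 1024)
print(rows)  # 2048000, i.e. about 2M rows

# Hypothetical table with two searchable columns.
print(build_index_cql(
    "demo_ks", "events", "events_lucene_idx",
    {"user": {"type": "string"}, "created": {"type": "integer"}},
))
```

The resulting statement can be pasted into `cqlsh`. The point of the sketch is that the cost of the index scales with the number of rows and mapped fields, so a smaller field mapping is the first thing to try.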