Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

Very slow process of compaction after index setup #390

Open karpa13a opened 6 years ago

karpa13a commented 6 years ago

Good day. Cassandra (C*) is 3.11, with the matching plugin version. Ubuntu 16.04, Java 1.8 (latest version). One DC, 3 nodes, keyspace with RF=3, on EC2 instances with 2 CPUs and 4 GB of memory each.

The cluster works well: data is inserted in batches every 15 minutes, with no compaction or performance problems, and the data size is around 15M rows. But I'm seeing strange behavior after creating a Lucene index. I created the index like this:

CREATE CUSTOM INDEX gsm_index ON gsm ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         sid: {type: "string"},
         timestamp: {type: "date", pattern: "yyyy/MM/dd"},
         place: {type: "geo_point", latitude: "latitude", longitude: "longitude"}
      }
   }',
   'indexing_threads': '4'
};

The index was created and works well. The next day I saw a load average (LA) above 3 on each node, with a queue of 8 compactions. I dropped the index and all compactions finished within 15 minutes. I recreated the index and got the same result the next day. The table is simple, as follows:

CREATE TABLE gsm (
   sid text,
   timestamp timestamp,
   latitude double,
   longitude double,
   -- other column definitions omitted
   PRIMARY KEY (sid, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

Do I need to upgrade the EC2 instances to something more powerful, or have I hit a bug?

FourSeventy commented 6 years ago

What type of disks are you using? I alleviated similar compaction problems by switching to solid state drives.

karpa13a commented 6 years ago

@FourSeventy Unfortunately, it's not an I/O bottleneck; the tasks are CPU-bound :(

karpa13a commented 6 years ago

Unfortunately, upgrading the nodes from t2.medium (2 CPUs) to t2.xlarge (4 CPUs) didn't help. It just eats 350% CPU.

This makes Lucene indexes totally unusable :(

Is there some kind of debugging I can do?

By the way, is it OK that MemtableFlushWriter writes to the log file roughly every 2 minutes, even when there are no reads or updates?

INFO  [MemtableFlushWriter:372] 2018-05-18 07:24:56,673 Index.scala:127 - Flushing Lucene index  /gsm_index/
INFO  [MemtableFlushWriter:373] 2018-05-18 07:26:00,154 Index.scala:127 - Flushing Lucene index /gsm_index/
INFO  [MemtableFlushWriter:374] 2018-05-18 07:27:57,105 Index.scala:127 - Flushing Lucene index /gsm_index/
INFO  [MemtableFlushWriter:375] 2018-05-18 07:29:52,975 Index.scala:127 - Flushing Lucene index /gsm_index/

karpa13a commented 6 years ago

OK, I created the index without the "place: {type: "geo_point", latitude: "latitude", longitude: "longitude"}" part, and now compactions no longer get stuck.

What was wrong with geo_point? Currently the index is flushed once every 3 hours:

INFO [MemtableFlushWriter:508] 2018-05-20 12:00:02,154 Index.scala:127 - Flushing Lucene index ...
INFO [MemtableFlushWriter:515] 2018-05-20 15:00:02,968 Index.scala:127 - Flushing Lucene index ...
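
For anyone comparing flush frequency with and without the geo_point field, the intervals between the "Flushing Lucene index" lines quoted in this thread can be measured with a short script. This is only a sketch: the function name `flush_intervals` and the `TIMESTAMP` pattern are made up for illustration and are not part of Cassandra or the plugin. It assumes the log timestamp format shown in the excerpts above.

```python
import re
from datetime import datetime

# Assumption: timestamps look like the ones quoted in this thread,
# e.g. "2018-05-18 07:24:56,673".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def flush_intervals(log_lines):
    """Return the seconds elapsed between consecutive Lucene index flushes."""
    stamps = []
    for line in log_lines:
        # Skip anything that is not a flush message.
        if "Flushing Lucene index" not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            stamps.append(
                datetime.strptime(match.group(), "%Y-%m-%d %H:%M:%S,%f")
            )
    # Pair each flush with the one that follows it.
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Log lines copied from the comments above (index with geo_point).
WITH_GEO_POINT = [
    "INFO  [MemtableFlushWriter:372] 2018-05-18 07:24:56,673 Index.scala:127 - Flushing Lucene index  /gsm_index/",
    "INFO  [MemtableFlushWriter:373] 2018-05-18 07:26:00,154 Index.scala:127 - Flushing Lucene index /gsm_index/",
    "INFO  [MemtableFlushWriter:374] 2018-05-18 07:27:57,105 Index.scala:127 - Flushing Lucene index /gsm_index/",
    "INFO  [MemtableFlushWriter:375] 2018-05-18 07:29:52,975 Index.scala:127 - Flushing Lucene index /gsm_index/",
]

# Log lines copied from the comment above (index without geo_point).
WITHOUT_GEO_POINT = [
    "INFO [MemtableFlushWriter:508] 2018-05-20 12:00:02,154 Index.scala:127 - Flushing Lucene index ...",
    "INFO [MemtableFlushWriter:515] 2018-05-20 15:00:02,968 Index.scala:127 - Flushing Lucene index ...",
]

print([round(s) for s in flush_intervals(WITH_GEO_POINT)])     # [63, 117, 116]
print([round(s) for s in flush_intervals(WITHOUT_GEO_POINT)])  # [10801]
```

On the quoted excerpts this gives flushes one to two minutes apart with geo_point and about three hours apart without it. It only measures the symptom and says nothing about why geo_point triggers the extra flushes.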

nirmalsinghkps commented 6 years ago

So which Cassandra version and which plugin version should we use to avoid compatibility issues? Any suggestions?