Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

Compaction cannot use the CPU efficiently #361

Closed kamijin-fanta closed 7 years ago

kamijin-fanta commented 7 years ago

I was investigating why compaction is very slow after I create a Lucene index.

# nodetool compactionstats -H

id                                   compaction type       keyspace   table   completed total    unit  progress
c0560fd0-a43f-11e7-bb0a-99ea385ac574 Secondary index build ex         message 1.32 MiB  1.76 MiB bytes 74.80%
Active compaction remaining time :        n/a

After 30 seconds:

# nodetool compactionstats -H

id                                   compaction type       keyspace   table   completed total    unit  progress
c0560fd0-a43f-11e7-bb0a-99ea385ac574 Secondary index build ex         message 1.37 MiB  1.76 MiB bytes 77.60%
Active compaction remaining time :        n/a
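To put a number on "very slow", the two samples above can be turned into a throughput and a rough time to completion. This is only a back-of-the-envelope sketch: it assumes the build rate stays constant and that the two samples were taken exactly 30 seconds apart. The function names are made up for illustration.

```python
MIB = 1024 * 1024  # bytes per MiB, matching the units printed by `nodetool compactionstats -H`


def build_rate(completed_before_mib, completed_after_mib, interval_s):
    """Average index build rate in bytes per second between two samples."""
    return (completed_after_mib - completed_before_mib) * MIB / interval_s


def remaining_seconds(completed_mib, total_mib, rate_bytes_per_s):
    """Estimated seconds left, assuming the rate does not change."""
    return (total_mib - completed_mib) * MIB / rate_bytes_per_s


# Values copied from the two compactionstats samples above.
rate = build_rate(1.32, 1.37, 30)
eta = remaining_seconds(1.37, 1.76, rate)

print(f"{rate / 1024:.2f} KiB/s, about {eta:.0f} s remaining")
```

With these samples the rate comes out at under 2 KiB/s, so even the remaining 0.39 MiB would take several minutes.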

Since indexing_threads is not specified, I assume it defaults to the number of CPU cores, which is 6 on this machine. However, CPU usage never exceeds 100% out of a possible 600%, that is, about one core.

That seemed suspicious, so I looked at what the threads were doing:

image

Tool used: https://visualvm.github.io/

Only one thread is running at any given time...

Does the Lucene indexer need certain conditions to be met before it uses multiple threads concurrently?


Memo

Enable the RMI server:

echo 'JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.10.181"' >> /etc/cassandra-env.sh

spec

kamijin-fanta commented 7 years ago

When multiple compactions ran at the same time, CPU utilization improved slightly. However, it still cannot be said that the available resources are fully used.

image

It seems pointless to set indexing_threads higher than concurrent_compactors. In Cassandra's configuration, concurrent_compactors is generally 2. Are you setting it to a larger number?

https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__concurrent_compactors

kamijin-fanta commented 7 years ago

I had misunderstood. I assumed the indexer was the bottleneck, but it was actually the CompactionExecutor.

This answers my own question.
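One way to picture this conclusion is as a two-stage pipeline: compaction threads feed rows to indexing threads, and the slower stage sets the overall pace. The sketch below is a deliberately simplified model, not the plugin's or Cassandra's actual implementation. The function name and all rates are hypothetical numbers chosen only to show the effect.

```python
def pipeline_throughput(compactors, rows_per_compactor, indexing_threads, rows_per_indexer):
    """Rows per second for a two-stage pipeline, limited by the slower stage.

    Simplified model: each stage scales linearly with its thread count.
    """
    feed_rate = compactors * rows_per_compactor        # rows/s produced by compaction
    index_rate = indexing_threads * rows_per_indexer   # rows/s the indexer could absorb
    return min(feed_rate, index_rate)


# Hypothetical rates: one compactor feeding 1000 rows/s,
# each indexing thread able to handle 800 rows/s.
for threads in (1, 2, 6):
    print(threads, pipeline_throughput(1, 1000, threads, 800))

# Adding a second compactor raises the ceiling again.
print(pipeline_throughput(2, 1000, 6, 800))
```

In this model, once the indexing stage is faster than the compaction stage, extra indexing threads change nothing. Only more concurrent compactions raise the total, which matches the slight improvement seen when several compactions ran at once.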

I hope this will be helpful to developers. Finally, thanks for the wonderful plugin!