Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

SCLI does not remove index data when Cassandra removes rows using primary key #359

Open zengzh opened 7 years ago

zengzh commented 7 years ago

Hi,

I am using SCLI 3.0.10.3.

Using the official script test-users-create.cql, I created the table test.users, built the SCLI index, and populated the table with some records. Then I deleted the records by specifying the primary key columns:

delete from test.users where name='XXX' and gender='XXX' and animal='XXX' and age='XXX';

Surprisingly, SCLI does not delete the corresponding index data after Cassandra removes the rows.

As a result, the index data may keep growing after repeated insert and delete operations on Cassandra.

Does anybody know the solution? Thanks.

karankap commented 6 years ago

Is this really an issue? I came across another ticket which says that TTL rows are fully supported:

https://github.com/Stratio/cassandra-lucene-index/issues/321

karankap commented 6 years ago

@johnyannj - We have a heavy data use case: ~500M records are added to Cassandra each day with a TTL of 60 days. We need Lucene indexing capability, but the decision is stuck until we can confirm whether the records will be deleted from the index once the TTL expires.

There are 2 other issues (apart from this one) that look related, so the expected behaviour is not clear: https://github.com/Stratio/cassandra-lucene-index/issues/321 https://github.com/Stratio/cassandra-lucene-index/issues/365

Could you please confirm whether rows that Cassandra deletes when their TTL expires are deleted from the Lucene index as well? Thanks.

Tagging @jpgilaberte-stratio

karankap commented 6 years ago

@johnyannj - any updates on this, please? I have looked at the plugin's source code, and according to the implementation the TTL rows should be deleted from the index; however, the behaviour I see is not what I expect. A confirmation would really help!

johnyannj commented 6 years ago

@karankap I tried modifying src/main/scala/com/stratio/cassandra/lucene/IndexWriterWide.scala#commit()

I added one piece of logic: any clusterings in "clusterings" that are not found in Cassandra are deleted from the Lucene index.

I did not validate it with TTL.

You can try it.
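
The reconciliation idea described above boils down to a set difference: anything the index knows about that Cassandra no longer returns is stale and can be removed from Lucene. Below is a minimal, standalone sketch of that step only. It is not the actual change to IndexWriterWide.scala; the class name StaleClusterings and the method findStale are hypothetical, clustering keys are modelled as plain strings, and the sketch is in Java rather than Scala to keep it self-contained.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the plugin. */
public final class StaleClusterings {

    private StaleClusterings() {}

    /**
     * Returns the keys that are present in the index but were not found
     * in Cassandra, i.e. the candidates for deletion from the Lucene index.
     * Neither input set is modified.
     */
    public static <K> Set<K> findStale(Set<K> indexed, Set<K> foundInCassandra) {
        Set<K> stale = new LinkedHashSet<>(indexed); // copy, keeps insertion order
        stale.removeAll(foundInCassandra);           // set difference
        return stale;
    }

    public static void main(String[] args) {
        // Assumed example data: three keys are indexed, one row was deleted.
        Set<String> indexed = new LinkedHashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> foundInCassandra = new LinkedHashSet<>(Arrays.asList("a", "c"));

        // Each stale key would then be passed to the index's delete operation.
        System.out.println(findStale(indexed, foundInCassandra)); // prints [b]
    }
}
```

Whether this is safe for rows that expire by TTL depends on when the read against Cassandra happens relative to expiry, which this sketch does not address.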

ealonsodb commented 6 years ago

Hi @zengzh: We have discovered a bug, and I think it is solved in #375. TTL deletions are resolved during compactions.

karankap commented 6 years ago

@ealonsodb - I looked at the PR, and it uses Cassandra 3.0.14. I am using Cassandra 3.10. I tried the build from branch issue_359 on Cassandra 3.10, but Cassandra fails to start with "java.lang.NoSuchMethodException: org.apache.cassandra.cql3.statements.SelectStatement.getPageSize(org.apache.cassandra.cql3.QueryOptions)"

ealonsodb commented 6 years ago

Hi @karankap:

I have also committed to branch-3.11.0; please update Cassandra and use that version.

skyline1688 commented 6 years ago

Hi @ealonsodb-stratio

The bug might not be fixed; please refer to my comment:

#365