Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

When a column from the index is used in a predicate, Cassandra always returns 0 records. #399

Open romulogoncalves opened 6 years ago

romulogoncalves commented 6 years ago

When we add an extra predicate on a column used in the index, Cassandra returns 0 records even though the predicate evaluates to true for the matching rows.

We see the issue with spark-cassandra-connector:2.3.1-s_2.11, Spark 2.2.0, and cassandra-lucene-index-plugin-3.11.1.0. To reproduce it, use earthquakes.csv from: https://docs.datastax.com/en/tutorials/gis.zip

Then run:

cqlsh> CREATE KEYSPACE gis WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2 };

cqlsh> USE gis;

cqlsh:gis> CREATE TABLE earthquakes ( 
             datetime timestamp, 
             latitude double, 
             longitude double, 
             depth double, 
             magnitude double, 
             magtype text, 
             nbstations int, 
             gap double, 
             distance double, 
             rms double, 
             source text, 
             eventid int,
             PRIMARY KEY (datetime, latitude, longitude)
           );

cqlsh:gis> COPY earthquakes (datetime, latitude, longitude, depth, magnitude, magtype, nbstations, gap, distance, rms, source, eventid) FROM '<path>/earthquakes.csv' WITH HEADER = 'true';

To create the index:

cqlsh:gis> ALTER TABLE earthquakes add lucene text;

cqlsh:gis> CREATE CUSTOM INDEX earthquakes_index ON earthquakes(lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         geo_point: {
             type: "geo_point",
             validated: true,
             latitude: "latitude",
             longitude: "longitude",
             max_levels: 15
          }
       }
   }'
};

Query 1 (returns 28 records):

cqlsh:gis> SELECT * FROM earthquakes WHERE lucene ='{  filter: {     type: "geo_bbox",     field: "geo_point",     min_latitude: 40.0,     max_latitude: 50.0,     min_longitude: 50.0,     max_longitude: 60.0  } }';

Query 2 (should also return 28 records, but returns 0 records):

cqlsh:gis> SELECT * FROM earthquakes WHERE lucene = '{  filter: {     type: "geo_bbox",     field: "geo_point",     min_latitude: 40.0,     max_latitude: 50.0,     min_longitude: 50.0,     max_longitude: 60.0  } }' and latitude > 0.0  ALLOW FILTERING;

The extra predicate "and latitude > 0.0" is true for every row inside the bounding box (their latitudes all lie between 40.0 and 50.0), so we do not understand why Query 2 returns 0 records.
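
To make the expectation concrete, here is a minimal Python sketch of the logic we expect the two queries to follow. It does not talk to Cassandra or the Lucene plugin. The function names and the sample points are hypothetical and are not taken from earthquakes.csv. The sketch also assumes the bounding box is inclusive at its edges, which may differ from how geo_bbox treats boundaries.

```python
# Bounding box copied from Query 1 and Query 2 in this report.
BBOX = {
    "min_latitude": 40.0,
    "max_latitude": 50.0,
    "min_longitude": 50.0,
    "max_longitude": 60.0,
}


def in_bbox(latitude, longitude, bbox=BBOX):
    """Model of the geo_bbox filter (inclusive edges are an assumption)."""
    return (
        bbox["min_latitude"] <= latitude <= bbox["max_latitude"]
        and bbox["min_longitude"] <= longitude <= bbox["max_longitude"]
    )


def extra_predicate(latitude):
    """The additional CQL predicate from Query 2: latitude > 0.0."""
    return latitude > 0.0


def query1(rows):
    """Rows matching only the bounding box filter."""
    return [row for row in rows if in_bbox(*row)]


def query2(rows):
    """Rows matching the bounding box filter AND latitude > 0.0."""
    return [row for row in rows if in_bbox(*row) and extra_predicate(row[0])]


# Hypothetical (latitude, longitude) points, for illustration only.
SAMPLE_ROWS = [
    (45.0, 55.0),   # inside the box
    (40.0, 50.0),   # on the lower corner of the box
    (50.0, 60.0),   # on the upper corner of the box
    (-10.0, 55.0),  # outside: latitude too low
    (45.0, 10.0),   # outside: longitude too low
]

if __name__ == "__main__":
    print(len(query1(SAMPLE_ROWS)), len(query2(SAMPLE_ROWS)))
```

Because min_latitude is 40.0, every row that passes the bounding box already has latitude > 0.0, so in this model both functions return the same rows. That is the behaviour we expected from Query 2.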