basho / riak

Riak is a decentralized datastore from Basho Technologies.
http://docs.basho.com
Apache License 2.0

Paginated query over 2i with range returns non-existent data #498

Open peczenyj opened 10 years ago

peczenyj commented 10 years ago

Hi all

I am using Riak 1.4.2 + LevelDB and I have found something strange.

We have one index called 'expiration_epoch_int'; it works like a TTL for a particular key in this bucket. Finding expired data to delete is just a query over expiration_epoch_int between 0 and 'now'. For a small amount of data it works really well.
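To make the query concrete, here is a minimal sketch of that paginated range query. It assumes Riak's HTTP 2i endpoint (`/buckets/<bucket>/index/<index>/<start>/<end>` with `max_results` and `continuation`) rather than the PBC interface I actually use, and the names `index_range_url`, `http_get_json` and `expired_keys` are hypothetical helpers, not part of any Riak client.

```python
import json
import time
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def index_range_url(base, bucket, index, start, end,
                    max_results=1000, continuation=None):
    """Build the URL for one page of a 2i range query (HTTP API assumed)."""
    path = "/buckets/{}/index/{}/{}/{}".format(
        quote(bucket, safe=""), quote(index, safe=""), start, end)
    params = {"max_results": max_results}
    if continuation:
        # Opaque token returned by the previous page.
        params["continuation"] = continuation
    return base.rstrip("/") + path + "?" + urlencode(params)


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def expired_keys(base, bucket, now=None, fetch=http_get_json, page_size=1000):
    """Yield every key whose expiration_epoch_int lies between 0 and now."""
    now = int(time.time()) if now is None else now
    continuation = None
    while True:
        url = index_range_url(base, bucket, "expiration_epoch_int",
                              0, now, page_size, continuation)
        page = fetch(url)
        for key in page.get("keys", []):
            yield key
        continuation = page.get("continuation")
        if not continuation:
            # No continuation token means this was the last page.
            break
```

Something like `expired_keys("http://localhost:8098", "sessions")` would then walk the whole range page by page; the base URL and the bucket name `sessions` are placeholders, not our real setup.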

But today I found this: the first 10 results from this query are non-existent keys. They were already deleted. I receive an 'HTTP/1.1 404 Object Not Found' if I try to inspect them.

If I use a small range, like around +/- 1 second from now, I get good results (keys that exist in Riak), but if I start from 0 (or 1), at least the beginning of the result set consists of keys that do not exist.

If I use return_terms=true I can see the expiration_epoch_int values too (they are from between 10 and 20 days ago). I am using the PBC interface to query and delete.

So, my question: why does this happen? Could it be related to pagination (maybe some cache)? When we run our cleanup process, we process, for example, ~7 x 10^6 keys.

To control the expiration of a huge amount of data, is it safe to use only one secondary index? Is there some limit for a huge number of keys? I have no idea where to start investigating this.

I will try to run a more complete test to find the % of deleted keys returned by Riak.
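A rough sketch of how that measurement could look, again assuming the HTTP interface instead of PBC. `key_exists` and `stale_percentage` are hypothetical helper names; the check treats a 404 as "deleted" and a 300 (siblings) as "exists".

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen


def key_exists(base, bucket, key):
    """Return True if a HEAD request for the object does not answer 404."""
    url = "{}/buckets/{}/keys/{}".format(
        base.rstrip("/"), quote(bucket, safe=""), quote(key, safe=""))
    try:
        with urlopen(Request(url, method="HEAD")):
            return True
    except HTTPError as err:
        if err.code == 404:
            return False
        if err.code == 300:
            # Multiple Choices: the object has siblings, so it exists.
            return True
        raise  # anything else is a real error, do not hide it


def stale_percentage(keys, exists):
    """Percentage of keys for which exists(key) is False (0.0 if no keys)."""
    total = 0
    missing = 0
    for key in keys:
        total += 1
        if not exists(key):
            missing += 1
    return 100.0 * missing / total if total else 0.0
```

Feeding the keys returned by the index query into `stale_percentage`, with `exists` bound to the real lookup, would give the share of results that point at already deleted objects.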

jaredmorrow commented 10 years ago

Also /cc'ing @engelsanchez and setting the milestone to 2.0.1, since I don't know if this was already fixed in the 1.4.x series.

agnibha92 commented 5 years ago

Is anyone looking into this issue, by any chance? We are also facing the same issue in Riak 2.2.3.

peczenyj commented 5 years ago

> Is anyone looking into this issue, by any chance? We are also facing the same issue in Riak 2.2.3.

The last commit in this repository was on "Aug 25, 2017"

I think you should consider an alternative to Riak (Couchbase?)

martincox commented 5 years ago

@peczenyj develop branch isn't being actively maintained (it should perhaps be archived and the default branch changed), but there is still plenty of ongoing development around riak. Take a look at develop-2.2 and develop-3.0. Riak 2.9 has recently been released, with Riak 3.0 being worked on currently and expected later in the year.

peczenyj commented 5 years ago

> @peczenyj develop branch isn't being actively maintained (it should perhaps be archived and the default branch changed), but there is still plenty of ongoing development around riak. Take a look at develop-2.2 and develop-3.0. Riak 2.9 has recently been released, with Riak 3.0 being worked on currently and expected later in the year.

But is this "official/Basho" or is it "community"?

martincox commented 5 years ago

Basho shut up shop a while ago. @bet365 bought everything to preserve it and open sourced the proprietary stuff (DC replication). There's a few companies now actively contributing (NHS, bet365, TI-Tokyo), alongside the community.

peczenyj commented 5 years ago

@martincox but the Basho website is still active, and the Riak Wikipedia page has no mention of @bet365. To be honest, this is the first time I have heard about this. So, is this still the official repository?

Glad to know this project is still alive. I spent some good times writing a Perl client for the protobuf interface years ago.

martincox commented 5 years ago

@peczenyj Yep, the website is still up, and the basho "brand" is still alive, in name only, though. There needs to be a move away from that (probably by simply rebranding to "riak"); it's just that nobody has had the resource to do it.

This is the official repo, yes. And it is still very much alive and kicking. Any community effort is welcomed. :)