Open tbg opened 8 years ago
@petermattis @RaduBerinde what do you suggest we do about this?
Will this be added to CockroachDB, and if so, when? It's been 3 years since the issue was reported, so I'm just wondering.
deferring to @RaduBerinde @rytaft for comments
I am not aware of this feature being on the roadmap (cc @awoods187), but it wouldn't be very hard to implement the BERNOULLI method described above, given that we're already doing something very similar for table statistics collection. Adding a REPEATABLE option should be relatively easy, but I'm not sure it would be that useful, since any change in how the data is distributed could change the result even when the data itself has not changed.
Implementing the BERNOULLI method would actually be simpler than what we're already doing for CREATE STATISTICS, because we wouldn't need to maintain a sample reservoir (that is only necessary to collect a pre-defined number of samples). We'd need to write a new DistSQL processor, but it would be VERY simple, consisting of a single random number generator to decide whether to keep or discard each row.
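To make the keep-or-discard idea above concrete, here is a minimal sketch in Go. It is not CockroachDB's DistSQL processor interface: the function name `bernoulliSample`, the `[]int` stand-in for rows, and the `seed` parameter (playing the role a REPEATABLE option might play) are all assumptions made for illustration.

```go
package main

import "math/rand"

// bernoulliSample is a hypothetical helper, not a CockroachDB API.
// It keeps each row independently with probability pct/100, so the
// size of the result is random rather than fixed. No reservoir is
// needed because no pre-defined sample count has to be met.
//
// seed stands in for a REPEATABLE(seed) option: the same seed over
// the same rows in the same order yields the same sample.
func bernoulliSample(rows []int, pct float64, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	var kept []int
	for _, row := range rows {
		// Float64 returns a value in [0.0, 1.0), so pct = 0 keeps
		// nothing and pct = 100 keeps every row.
		if rng.Float64()*100 < pct {
			kept = append(kept, row)
		}
	}
	return kept
}
```

Calling `bernoulliSample(rows, 10, 42)` on 1000 rows would return roughly, but not exactly, 100 of them. Because the random draws are consumed in input order, the result is repeatable only if the rows arrive in the same order, which is the caveat about REPEATABLE raised earlier in the thread.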
Adding a SYSTEM sampling method would require a different approach that is aware of how data is stored in RocksDB.
@Kumamon38 could you tell me a little bit more about how and why you'd like to use this potential feature?
Hi, and thanks to everyone who answered and explained! I am currently using Nakama for my game, and I was looking for the best way to pick random rows from the database. I saw several ways to do that in different posts on Stack Overflow, and for Postgres the best one so far seemed to be TABLESAMPLE SYSTEM, available since Postgres 9.5, so I had a look to see whether you had done something similar and found this issue. For now I have managed to simplify how I pick my players so that it doesn't need randomness, so it's fine. I was just wondering if you will add it some day, not asking you to add it :) Thanks anyway!
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
Enterprise customer here, would like to throw our hat in this ring, we would be interested in this feature. Thank you.
cc @vy-ton for visibility
still relevant
Still relevant, I would benefit from this feature.
Postgres in 9.5 introduced the TABLESAMPLE clause.
Prior to 9.5, similar things could be done manually:
https://www.periscopedata.com/blog/how-to-sample-rows-in-sql-273x-faster.html
https://stackoverflow.com/questions/8674718/best-way-to-select-random-rows-postgresql
Implementing something like TABLESAMPLE is likely relatively difficult, but we could check that a manual query which performs something similar is available and gets a somewhat decent query plan.
Opened this issue because I was asked about it in a recent tech talk.
Jira issue: CRDB-6181