cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com
Other
29.58k stars 3.71k forks source link

sql: warn users away from updated at indexes #41058

Open awoods187 opened 4 years ago

awoods187 commented 4 years ago

One thing we noticed when attempting to debug a recent transaction retry is that code ported over from Oracle can use “updated at indexes” to power efficient queries. In CockroachDB, these queries are messy as they can cause hotspots. This is true not just in CRDB but in all distributed systems. Updated at indexes cause multiple updates for the same range of data. Customers might like these in Oracle but we provide better alternatives that pair well with our underlying architecture. Oftentimes, we end up asking developers to remove these indexes from their application. We should provide a warning to users not use updated at indexes or perhaps even forbid them behind sql safe updates.


Update from @rafiss:

Let's start by adding a notice when creating a table or creating an index, and the first column of any index (primary or secondary) contains a column with DEFAULT current_timestamp().

The notice should suggest using a hash-sharded index and point to the docs: https://www.cockroachlabs.com/docs/stable/hash-sharded-indexes.html

We'll probably want this in 22.1, and not in 21.2, since hash-sharded indexes will still be "experimental" in 21.2.

See https://github.com/cockroachdb/cockroach/pull/68964 for an example of another change to add a notice.

Epic CRDB-7363

Jira issue: CRDB-5488

solongordon commented 4 years ago

Can you be more specific about what an "updated at" index is? Does this just mean an index on an updated_at TIMESTAMPTZ column?

rkruze commented 4 years ago

I've seen schemas like this in the past:

CREATE TABLE t1 (id UUID PRIMARY KEY, updated_at TIMESTAMPTZ, INDEX updated_at_index (updated_at));

We should provide a warning or make it harder to create an index where the first part of the index is a timestamp as these always tend to be a hotspot or point of contention.

rafiss commented 3 years ago

cc @vy-ton -- should we just log a warning/notice if we ever see an index where the first column is a timestamp?

a little section in the web UI that shows potentially problematic schema definitions?

michae2 commented 2 years ago

If the timestamp column has DEFAULT current_timestamp() it seems like even more of a red flag. (And I wonder if DEFAULT unique_rowid() is also a red flag, as this uses the current time internally.)

Maybe the user should be prompted to use a hash-sharded index for these cases?

michae2 commented 2 years ago

(And I wonder if DEFAULT unique_rowid() is also a red flag, as this uses the current time internally.)

Also see https://github.com/cockroachdb/cockroach/pull/54675 for unordered_unique_rowid.

rafiss commented 2 years ago

I agree to starting with adding this warning when a column has DEFAULT current_timestamp() and it appears as the first column in an index (either primary or secondary). I'll update the issue description.

XiongKezhi commented 2 years ago

Hi, I think this is a good start for me as a newcomer so I wonder if this is still an issue? Also, I'm not sure If I understand the context here: is it because of locality (the most recently updated ones are updated more often and new data is always written in the end) that causes a hot spot?

vy-ton commented 2 years ago

fyi @devadvocado @nickvigilante this was the issue that overlaps with index recommendations for hash-sharded indexes

michae2 commented 2 years ago

Maybe a way to approach this would be via the new index recommendation engine. We could recommend a hash-sharded index when the key seems like something sequential.

michae2 commented 1 year ago

Adding SQL Queries team so that we can consider doing something with index recommendations.

bodagovsky commented 2 months ago

Hi everyone! Just wanted to notify that I have started working on that issue

bodagovsky commented 2 months ago

What about DATE and TIME types? Do we want to shoot a notice on these as well?

bodagovsky commented 2 months ago

I will be ready to push a PR soon, so I would like to clarify:

michae2 commented 2 months ago

@bodagovsky good point, I think it would make sense to check for these specific functions in the column's DEFAULT expression rather than checking the column type:

bodagovsky commented 2 months ago

Sorry, if this one is messy, my first PR in open source Hope I will manage to fix all your comments soon!

bodagovsky commented 2 months ago

cc @awoods187 @michae2