cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.12k stars 3.81k forks source link

sql: make pg_catalog OIDs stable and refer to the descriptor IDs directly #32940

Closed knz closed 5 years ago

knz commented 5 years ago

Is your feature request related to a problem? Please describe.

Today OIDs in pg_catalog are based on the hash of database ID and table ID. This makes it hard / impossible to join between vtables in pg_catalog and crdb_internal.

Describe the solution you'd like

Everywhere when it's reasonable use something that can be decoded into a table ID as pg OID. For example we can have a composite integer format with a type tag in the MSB and when the type is "table" put the descriptor ID in the LSB.

hueypark commented 5 years ago

@knz

I am asking for feedback.

How about changing tree.ID to a signed integer? I think it is reasonable because the vtable's ID is negative.

As it is now, it seems that a hash collision will occur while using a hash rather than creating a unique oid. Is this acceptable? (IndexOid, ColumnOid, ...)

knz commented 5 years ago

How about changing tree.ID to a signed integer? I think it is reasonable because the vtable's ID is negative.

This sentence is like a tail wagging its dog. The choice to make vtable IDs negative is arbitrary and should not be used to drive decisions.

I think the better approach is to think of a numbering scheme where we can look at an ID value and quickly decide whether it's a database, table or vtable (and in the future, schema).

The proposal I made above is to store the type of ID in the most significant bits, then the value of the ID in the lower significant bit, to produce an OID suitable for pg_catalog.

Let us name such a combined ID a "universal object ID" (UOID)

The next step after that would be to adapt the crdb_internal tables to index records using these new UOIDs., then evolve other places inside cockroachdb (for example select * from [123]) to recognize these UOIDs too.

After we do this, it does not matter whether the value is signed or unsigned.

hueypark commented 5 years ago

Thank you for your feedback.

So what do you think about this solution?

Databases and tables OID use descriptor id directly. It's fine because we create the id with sql.GenerateUniqueDescID.

33420 is a prototype. If you agree, I will further develop this and let you know.

knz commented 5 years ago

Databases and tables OID use descriptor id directly. It's fine because we create the id with sql.GenerateUniqueDescID.

This is only valid if the virtual tables also all get different descriptor IDs. Currently they all have ID -1. They must be separate for pg_catalog to be correct.

hueypark commented 5 years ago

@knz Hi, I have prepared a different version of the solution.

Now oid separated by three parts.

I am concerned that the limit on ColumnID is getting smaller.

What do you think? #33697