Tencent / TBase

TBase is an enterprise-level distributed HTAP database. Through a single database cluster to provide users with highly consistent distributed database services and high-performance data warehouse services, a set of integrated enterprise-level solutions is formed.
Other
1.38k stars 261 forks source link

Foreign Key on replicated tables #123

Closed yazun closed 2 weeks ago

yazun commented 2 years ago

In the past we patched XL to enable FKs between the replicated tables. We see there's a check preventing FKs on replicated tables based on _PGREGRESS define.

https://github.com/Tencent/TBase/blob/master/src/backend/commands/tablecmds.c#L8841

Could you explain if we could change this check to allow FKs between replicated tables only (we never had problem with them in the past, also combined with recursive CTE). Thank you

JennyJennyChen commented 2 years ago

Other colleagues may add this code just to want to limit the data writing of the replicated table to only consider this node, and not care about other nodes to enhance the performance of data writing.

yazun commented 2 years ago

Could this be enabled with the config/compile switch?

JennyJennyChen commented 2 years ago

Regarding the reason for prohibiting the foreign key on the replication table, I consulted the relevant personnel. The main reason is that there can be no triggers on the replication table, and the foreign keys in the PG are also implemented by triggers, so the foreign keys are also prohibited. If you are sure that you are not using triggers on the replicated table, you can remove this if statement in the source code and recompile the new bin to solve:

image
yazun commented 2 years ago

So the question emerges why we could not have triggers on the replicated table, granted such triggers would not use anything distributed? It seems it would be consistent with other limitations coming from distribution, maybe a GUC allowing for this would make sense?

JennyJennyChen commented 2 years ago

The replication table is that all DN nodes have full data about this table, and all DN nodes will perform operations on this data. Then suppose that there is a trigger on this replicated table: when inserting data, add the inserted data to another non-replicated table, then this non-replicated table will have N pieces of data (N is the number of DN nodes).

Triggers can be used in distributed systems, but replication tables cannot be used, shard tables can be used.

yazun commented 2 years ago

Sure, there's limitation that a trigger cannot touch sharded tables. The same caveat comes already i.e. with functions that could be executed on DNs only if used in the from part of the query - if they have DML inside such function would be possibly only executed on a single DN, corrupting the schema. I think it is fine if the risk is well understood and a GUC allowing for it would be most welcome.