Tencent / TBase

TBase is an enterprise-level distributed HTAP database. Through a single database cluster to provide users with highly consistent distributed database services and high-performance data warehouse services, a set of integrated enterprise-level solutions is formed.
Other
1.38k stars 262 forks source link

execute direct with DML does not work anymore. #54

Closed yazun closed 3 years ago

yazun commented 4 years ago

While normally not needed nor encouraged, we had some important corner cases when issuing DML directly on a datanode was the only option to get the performance we need.

In XL it was possible to issue: execute direct on (datanodex) $$ insert into table select...

in Tbase the logic of txid serving has changed and while it is still possible to do the results are violating MVCC, i.e. we see disappearing tuples when we do insert and subsequent selects!

Can you shed some light if proper txid serving while in execute direct could be brought back, with MVCC working properly?

Thanks

yazun commented 4 years ago

@SuCuicui Could you please let us know if any of the reported issues by us has been followed? Thank you

q2683252 commented 4 years ago

In what circumstance,do you need execute direct?please let use know.

yazun commented 4 years ago

The plans with 2+ joins over big (tens of billions in total) partitioned tables have been very bad as there was materialization happening without a full push-down, even when a distribution/shard key was the PK. The only viable way until the PG11-12 partition pruning gets into TBase/XL was to use execute direct to have a decent performance without partial results materialization obviously...

yazun commented 4 years ago

Let me know if you'd like to see an example plan. When used in a standard way, the whole processing was 10-50x slower. Fro already long-runnning task it was prohibitively slow for us (weeks vs days).

yazun commented 4 years ago

Also, we relied on the CTE with DML, which with a simple patch could be safely re-enabled on datanodes. The logic we have executes joins and inserts into a multiple tables in a single query. Otherwise we too drastically lose performance..

Jasonysli commented 4 years ago

1,If you need run DML on datanode ,you can simple enable allow_dml_on_datanode on datanode in TBase. Then you can connect to DN to run DML. 2,ABout CTE on datanode, you can do in the similar way.

JennyJennyChen commented 3 years ago

run DML on datanode eg: $ psql -h ${datanode_ip} -p ${datanode_port} -U ${user_name} -d postgres postgres=# set allow_dml_on_datanode = on; postgres=# insert into table...

If you want to view all the data (Contains data from other shards that do not belong to this datanode) of the currently connected datanode, you can set shard_visible_mode=allin. eg: postgres=# set shard_visible_mode=allin; postgres=# select...

yazun commented 3 years ago

Good hint about shard_visible_mode, thanks!