Boost ShardingSphere Proxy by Vert.x

TeslaCN commented 2 years ago

Concepts

We have worked hard on tuning the performance of ShardingSphere over the past few months, and we found that the synchronous threading model may be a major bottleneck. JDBC, the backend of ShardingSphere Proxy, is inherently synchronous, which limits the Proxy's concurrent performance. The Proxy communicates with clients through the database protocol, so users do not need to care how the Proxy interacts with the database. This leaves us free to look for another solution that avoids the JDBC bottleneck.

Scheme comparison

I did some comparison and made the following table:

	JDBC	Implements Protocols by Netty on our own	MySQL X (ProtoBuf)	Vert.x	R2DBC
Technical Maturity	Needless to say	Netty is mature	Not sure (used by PolarDB-X)	Relatively mature	Not sure
Community vibrancy	-	-	Seems not so active	Very active	Average, varies between drivers
Relative Concurrent Performance	Low	Depends, can be very high	Lower than JDBC in the blocking model; not investigated further	High (ranked #1 in the TechEmpower Benchmark Round 15 single query benchmark)	Not investigated; maybe a little lower than Vert.x
Database Supports	Needless to say	Depends	MySQL only	Officially supported: MySQL, PostgreSQL, Wrapped JDBC (theoretically any JDBC-compliant database), DB2, Oracle (Preview), MSSQL (Preview), MongoDB, Cassandra, Redis. Refer to: Documentation | Eclipse Vert.x	MySQL (seems like a personal project), jasync-sql MySQL (another MySQL driver), Postgres (hosted in github.com/pgjdbc), Oracle, MariaDB, MSSQL, H2. Refer to: R2DBC
Connection Pool	Third party pool (Such as HikariCP)	Custom	Depends	Built-in (Wrapped JDBC requires a third party pool like HikariCP)	Built-in
Observability and Metrics	Depends	Custom	Not investigated	Supports Micrometer	Metrics: Spring Boot Actuator (metrics of the connection pool)
License	Various	Apache-2.0	Depends	EPL-2.0 & Apache-2.0 (see vert.x/LICENSE.md at master · eclipse-vertx/vert.x)	Oracle: Apache-2.0 & UPL-1.0; others: Apache-2.0
Workload	Already have	Heavy	A little heavy	More than integrating JDBC	Maybe a little more than integrating Vert.x

The current backend of ShardingSphere Proxy is JDBC, which is inherently synchronous and performs poorly in high-concurrency scenarios.

Why not implement the protocols on our own?

Implementing the databases' protocols and doing packet forwarding ourselves may minimize performance loss. But there is too much we would need to do, which may take a very long time. For example:

Maintaining connection pools on our own
Implementing protocols for each type of database on our own (Really heavy work)
Concurrent programming
Managing distributed transactions on our own

Why not MySQL X Protocol?

For MySQL only
Maintaining connection pools on our own
Implementing X Protocol (Message definition codes can be generated by ProtoBuf)
Managing distributed transactions on our own

Why not R2DBC?

Community is less active than Vert.x's
Fewer references than Vert.x
Missing openGauss support

Why Vert.x?

Native driver for MySQL and PostgreSQL
Supports JDBC-compliant databases (this can be used in openGauss Proxy)
High performance (Ranked #1 in the TechEmpower Benchmark Round 15 Single query benchmark)
Production-ready

However, we still need to manage distributed transactions on our own.

Connection Pool

The built-in Vert.x connection pool has the following parameters:

maxSize: int
maxWaitQueueSize: int
idleTimeout: int
idleTimeoutUnit: TimeUnit
poolCleanerPeriod: int
connectionTimeout: int
connectionTimeoutUnit: TimeUnit
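
To make the roles of maxSize and maxWaitQueueSize more concrete, here is a minimal sketch in plain Java of how a pool with these two limits is usually understood to behave: up to maxSize requests get a connection immediately, up to maxWaitQueueSize further requests wait, and anything beyond that fails. This is not Vert.x code. The class SketchPool and its methods are hypothetical, connections are faked as strings, and the timeout and cleaner parameters are left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; this is not the Vert.x pool implementation.
final class SketchPool {
    private final int maxSize;
    private final int maxWaitQueueSize;
    private int inUse;
    private final Queue<CompletableFuture<String>> waiters = new ArrayDeque<>();

    SketchPool(int maxSize, int maxWaitQueueSize) {
        this.maxSize = maxSize;
        this.maxWaitQueueSize = maxWaitQueueSize;
    }

    // Returns a future that completes once a (fake) connection is available.
    synchronized CompletableFuture<String> acquire() {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (inUse < maxSize) {
            // A slot is free: hand out a connection right away.
            inUse++;
            result.complete("connection");
        } else if (waiters.size() < maxWaitQueueSize) {
            // Pool exhausted: park the request until someone releases.
            waiters.add(result);
        } else {
            // Pool and wait queue are both full: fail the request.
            result.completeExceptionally(new IllegalStateException("wait queue is full"));
        }
        return result;
    }

    // Hands the released connection to the oldest waiter, if there is one.
    synchronized void release() {
        CompletableFuture<String> next = waiters.poll();
        if (next != null) {
            next.complete("connection");
        } else {
            inUse--;
        }
    }
}
```

With `new SketchPool(1, 1)`, the first `acquire()` completes immediately, the second stays pending until `release()` is called, and a third issued in between fails.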

Transaction

Local Transaction

We can control the local transaction just like how we did in LocalTransactionManager.
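
As a rough illustration of what controlling a local transaction looks like in an asynchronous style, the sketch below assumes it amounts to sending BEGIN, then COMMIT or ROLLBACK, to every backend connection held by the session. The interface AsyncConnection and the class LocalTransactionSketch are hypothetical and use only the JDK; they are not taken from LocalTransactionManager or from the Vert.x API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical asynchronous connection; stands in for whatever the reactive backend exposes.
interface AsyncConnection {
    CompletableFuture<Void> execute(String sql);
}

final class LocalTransactionSketch {

    // Sends the same statement to every connection and completes when all have finished.
    static CompletableFuture<Void> onAll(List<AsyncConnection> connections, String sql) {
        CompletableFuture<?>[] futures = connections.stream()
                .map(each -> each.execute(sql))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(futures);
    }

    // BEGIN on all connections, run the work, then COMMIT.
    // If any step fails, ROLLBACK on all connections and propagate the failure.
    static CompletableFuture<Void> inTransaction(List<AsyncConnection> connections,
                                                 Supplier<CompletableFuture<Void>> work) {
        CompletableFuture<Void> attempt = onAll(connections, "BEGIN")
                .thenCompose(ignored -> work.get())
                .thenCompose(ignored -> onAll(connections, "COMMIT"));
        return attempt.handle((ok, failure) -> failure)
                .thenCompose(failure -> {
                    if (failure == null) {
                        return CompletableFuture.<Void>completedFuture(null);
                    }
                    CompletableFuture<Void> rolledBack = onAll(connections, "ROLLBACK");
                    return rolledBack.thenCompose(ignored -> failed(failure));
                });
    }

    // Builds a future that is already completed with the given failure.
    private static CompletableFuture<Void> failed(Throwable cause) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        result.completeExceptionally(cause);
        return result;
    }
}
```

The sketch does not handle a failing ROLLBACK specially, and it assumes the work runs on the same connections that received BEGIN.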

Distributed Transaction

According to an issue opened 2 years ago, the conclusion is that we need to manage distributed transactions on our own: https://github.com/eclipse-vertx/vert.x/issues/2939

XA

The JTA and XA support in JDBC cannot be used with Vert.x. Transaction managers such as Atomikos cannot be reused.

BASE

Seata may not be reused.

Integrating Vert.x into ShardingSphere

Phase 1 Vert.x coexists with JDBC (simple, but inelegant and inextensible)

This is what we did in the preliminary performance research. The JDBC backend and the Vert.x backend coexist until the Vert.x backend becomes mature. For each DataSource, we maintain a corresponding Vert.x pool in the Proxy backend, which means each database has two connection pools (HikariCP and the Vert.x pool). Loading metadata and using native privileges go through the JDBC DataSource. Executing CRUD SQL goes through the Vert.x pool asynchronously.

[x] Decouple BackendConnection in Proxy Backend. My idea is to extract an interface, ConnectionSession. The current BackendConnection is renamed to JDBCBackendConnection, and a VertxBackendConnection is added.
[ ] Implement the Vert.x executor and callback in shardingsphere-infra-executor and shardingsphere-proxy-backend
[ ] Add new modules for reactive in Proxy

shardingsphere-proxy
├── shardingsphere-proxy-backend
├── shardingsphere-proxy-bootstrap
├── shardingsphere-proxy-frontend
│   ├── shardingsphere-proxy-frontend-core
│   ├── shardingsphere-proxy-frontend-mysql
│   ├── shardingsphere-proxy-frontend-opengauss
│   ├── shardingsphere-proxy-frontend-postgresql
│   ├── shardingsphere-proxy-frontend-spi
New modules:
│   ├── shardingsphere-proxy-frontend-reactive-spi
│   ├── shardingsphere-proxy-frontend-reactive-core
│   ├── shardingsphere-proxy-frontend-reactive-mysql
│   ├── shardingsphere-proxy-frontend-reactive-postgresql

For example, executors in frontend-mysql that do not interact with the database can be reused by frontend-reactive-mysql.

[ ] Support LOCAL transactions in Vert.x Proxy. There is no out-of-the-box distributed transaction manager for reactive frameworks for now, so the first step is implementing LOCAL transactions.
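
The decoupling described in the first task of this phase could look roughly like the sketch below. Only the names JDBCBackendConnection and VertxBackendConnection come from the proposal. The shape of the shared interface, its single method, and the function-typed stand-ins for the drivers are assumptions made so that the example runs on the JDK alone.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Shared abstraction; the single method shown here is an assumption for illustration.
interface BackendConnection {
    // Runs a statement and eventually yields the number of affected rows.
    CompletableFuture<Integer> executeUpdate(String sql);
}

// Blocking variant: the calling thread waits inside the driver call.
final class JDBCBackendConnection implements BackendConnection {
    // Stand-in for a blocking call such as java.sql.Statement#executeUpdate.
    private final Function<String, Integer> blockingCall;

    JDBCBackendConnection(Function<String, Integer> blockingCall) {
        this.blockingCall = blockingCall;
    }

    @Override
    public CompletableFuture<Integer> executeUpdate(String sql) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        try {
            // The future is already complete when it is returned.
            result.complete(blockingCall.apply(sql));
        } catch (RuntimeException ex) {
            result.completeExceptionally(ex);
        }
        return result;
    }
}

// Non-blocking variant: delegates to a client that already returns a future.
final class VertxBackendConnection implements BackendConnection {
    // Stand-in for an asynchronous SQL client call.
    private final Function<String, CompletableFuture<Integer>> asyncCall;

    VertxBackendConnection(Function<String, CompletableFuture<Integer>> asyncCall) {
        this.asyncCall = asyncCall;
    }

    @Override
    public CompletableFuture<Integer> executeUpdate(String sql) {
        return asyncCall.apply(sql);
    }
}
```

Code written against BackendConnection does not need to know which variant it holds, which is what would allow executors that never touch the database to be shared between the two frontends.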

Phase 2 Decoupling JDBC from ShardingSphere (heavy work, but elegant and extensible)

1 Define Configuration API

Vert.x connection pools can be created from a URI:

vertx-sql-client/MySQLConnectionUriParser.java at master · eclipse-vertx/vertx-sql-client
vertx-sql-client/PgConnectionUriParser.java at master · eclipse-vertx/vertx-sql-client

So we may keep the current API, only changing the prefix of the url from jdbc to vertx:

schemaName: sharding_db

dataSources:
  ds_0:
    url: vertx:mysql://127.0.0.1:3306/some_schema
    username: root
    password: root
    connectionTimeoutMilliseconds: 3000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 192
    minPoolSize: 0

rules: []
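
One way the Proxy could tell the two kinds of url apart is to check the prefix and pass the remainder on as an ordinary URI. The sketch below uses only java.net.URI. The class VertxUrlSketch and its methods are hypothetical, and how the resulting URI is handed to the Vert.x client is left open.

```java
import java.net.URI;

// Hypothetical helper; not part of ShardingSphere or Vert.x.
final class VertxUrlSketch {
    static final String PREFIX = "vertx:";

    // True when the configured url selects the Vert.x backend rather than JDBC.
    static boolean isVertxUrl(String url) {
        return url.startsWith(PREFIX);
    }

    // Strips the "vertx:" prefix and parses the remainder as a regular URI.
    static URI toDriverUri(String url) {
        if (!isVertxUrl(url)) {
            throw new IllegalArgumentException("not a vertx url: " + url);
        }
        return URI.create(url.substring(PREFIX.length()));
    }
}
```

For the ds_0 entry above, `toDriverUri` yields a URI with scheme `mysql`, host `127.0.0.1`, port `3306` and path `/some_schema`.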

2 Decoupling JDBC from ShardingSphere (heavy work)

Many modules are coupled with JDBC. We may need to decouple JDBC from every module except ShardingSphere JDBC.

Infra

For example, the class ShardingSphereResource holds a DataSource map. We may decouple DataSource from ShardingSphereResource by defining an interface Resource. The implementations may be JDBCResource, VertxResource or MySQLXResource in the future.
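
A minimal sketch of that Resource interface is shown below. The names Resource, JDBCResource and VertxResource come from the paragraph above. The generic parameter, the method names and the use of a placeholder type P for the Vert.x pool are assumptions for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.sql.DataSource;

// Hypothetical driver-neutral view of the configured backends, keyed by data source name.
interface Resource<T> {
    Map<String, T> getBackends();

    // Callers that only need the names do not depend on the backend type.
    default Set<String> getNames() {
        return getBackends().keySet();
    }
}

// Keeps today's behaviour: the map values are JDBC DataSource instances.
final class JDBCResource implements Resource<DataSource> {
    private final Map<String, DataSource> dataSources;

    JDBCResource(Map<String, DataSource> dataSources) {
        this.dataSources = Collections.unmodifiableMap(new LinkedHashMap<>(dataSources));
    }

    @Override
    public Map<String, DataSource> getBackends() {
        return dataSources;
    }
}

// P is a placeholder for whatever pool type the Vert.x client provides.
final class VertxResource<P> implements Resource<P> {
    private final Map<String, P> pools;

    VertxResource(Map<String, P> pools) {
        this.pools = Collections.unmodifiableMap(new LinkedHashMap<>(pools));
    }

    @Override
    public Map<String, P> getBackends() {
        return pools;
    }
}
```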

MetaDataLoader

Much of the code in MetaDataLoader is coupled with JDBC.

Mode

Much of the code in the mode modules is coupled with JDBC. We also need to consider (de)serialization.

Phase 3 Removing JDBC from ShardingSphere Proxy

Other Reference

https://discourse.world/h/2020/05/12/Two-alternatives-to-JDBC

CarlKong commented 2 years ago

Looking forward to seeing this implemented soon.

TeslaCN commented 1 year ago

Since we have discussed the difficulty of developing with Vert.x in ShardingSphere, I'm going to remove the Vert.x driver from ShardingSphere-Proxy soon.

The discussion can be found at https://lists.apache.org/thread/0vd7h44bjjszc5fs2hpftktt4oh4hhw5

The following is the proposal from that discussion.

Split Vert.x code from ShardingSphere into separate branch

Currently ShardingSphere integrates Vert.x as the database driver of ShardingSphere-Proxy. ShardingSphere-Proxy MySQL using Vert.x as the database driver does have a certain performance improvement compared to using JDBC, but the improvement is not as large as expected. During the actual development and use of Vert.x-based ShardingSphere-Proxy, we encountered many problems:

Vert.x-based asynchronous code increases coding complexity and debugging costs.

The existing metadata loading logic is built on JDBC (blocking I/O model), and refactoring it into asynchronous Vert.x code would be a very heavy workload. Therefore, ShardingSphere-Proxy with the Vert.x database driver cannot use cluster mode.

The metadata code is coupled with JDBC, and requires some refactoring before working with Vert.x to decouple the code from JDBC.

Vert.x does not have a mature solution for distributed transactions, and transactions have not reached a production-ready state.

The ShardingSphere team doesn't have much energy to put into Vert.x driver.

JDBC is the standard for Java, whereas Vert.x is not.

Java 19 introduced virtual threads (as a preview feature) to improve performance without changing Java's multithreaded programming model. Although the performance of virtual threads has not yet reached the ideal state, they may help ShardingSphere improve performance in the future without a lot of code modification.

Therefore, we intend to move the current Vert.x code in ShardingSphere into a separate branch for maintenance, to reduce the cost of understanding and maintaining the main code.
