apache / shardingsphere

Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.
Apache License 2.0
19.78k stars 6.7k forks source link

sharding-proxy insert blob size is not correct #21627

Open yoloz opened 1 year ago

yoloz commented 1 year ago

Bug Report

Which version of ShardingSphere did you use?

  1. apache-shardingsphere-5.0.0-shardingsphere-proxy-bin
  2. apache-shardingsphere-5.2.0-shardingsphere-proxy-bin

Which project did you use? ShardingSphere-JDBC or ShardingSphere-Proxy?

ShardingSphere-Proxy

Behavior

image

above it, id==1 insert by proxy and id==2 insert by direct,maybe you think this tool(dbeaver) not support fine, i read this record and write to file,record(id==1 )open fail

Reason analyze (If you can)

config(seriver.yml) show sql, can see insert into blob_clob(id,b_blob,c_clob) values(1,_binary'......','........'),maybe get sql from packet ?

Steps to reproduce the behavior, such as: SQL to execute, sharding rule configuration, when exception occur etc.

props:

max-connections-size-per-query: 1

kernel-executor-size: 16 # Infinite by default.

proxy-frontend-flush-threshold: 128 # The default value is 128.

proxy-hint-enabled: false

sql-show: true

check-table-metadata-enabled: false

...

* **${shardingsphere-proxy}/bin/start.sh**
### Example codes for reproduce this issue (such as a github link).
```java
public class BlobClobTest {

    private String url;
    private Properties properties;

    public static void main(String[] args) throws Exception {
        BlobClobTest test = new BlobClobTest();
        test.setUp();
//        test.createTable();
        test.writeStream();
    }

    public void setUp() throws ClassNotFoundException {
        Class.forName("com.mysql.cj.jdbc.Driver");
        url = "jdbc:mysql://localhost:3307/shadow_db?useUnicode=true&characterEncoding=UTF-8";
        properties = new Properties();
        properties.put("user", "root");
        properties.put("password", "root");
    }

    public void createTable() {
        String sql = "CREATE TABLE `blob_clob` (\n" +
                "  `id` int(11) DEFAULT NULL,\n" +
                "  `b_blob` mediumblob,\n" +
                "  `c_clob` mediumtext\n" +
                ") ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;"; //mysql
        try (Connection conn = DriverManager.getConnection(url, properties);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            System.out.println(stmt.executeUpdate());
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public void writeStream() {
        String sql = "insert into blob_clob(id,b_blob,c_clob) values(?,?,?)";
        try (Connection conn = DriverManager.getConnection(url, properties);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setInt(1, 1);
            stmt.setBlob(2,
                    Files.newInputStream(Paths.get(System.getProperty("user.home"), "test", "test.png")));
            stmt.setClob(3,
                    Files.newBufferedReader(Paths.get(System.getProperty("user.home"), "test", "test.txt")));
            System.out.println(stmt.executeUpdate());
        } catch (SQLException | IOException e) {
            e.printStackTrace();
        }
    }

}
TeslaCN commented 1 year ago

Please try adding useServerPrepStmts=true to JDBC URL. Such as

jdbc:mysql://localhost:3307/shadow_db?useUnicode=true&characterEncoding=UTF-8&useServerPrepStmts=true
yoloz commented 1 year ago

Please try adding useServerPrepStmts=true to JDBC URL. Such as

jdbc:mysql://localhost:3307/shadow_db?useUnicode=true&characterEncoding=UTF-8&useServerPrepStmts=true

thanks a lot

yoloz commented 1 year ago

maybe can help #3024 #21518 @jitawangzi @xriqyu https://github.com/yoloz/shardingsphere/tree/supportJdbcBlob

TeslaCN commented 1 year ago

image

TeslaCN commented 1 year ago

This is how Connector/J convert InputStream to _binary literal. https://github.com/mysql/mysql-connector-j/blob/27603148f10a5f47467bec7ad26a5ca28da63c72/src/main/protocol-impl/java/com/mysql/cj/protocol/a/InputStreamValueEncoder.java#L71-L137

yoloz commented 1 year ago

This is how Connector/J convert InputStream to _binary literal. https://github.com/mysql/mysql-connector-j/blob/27603148f10a5f47467bec7ad26a5ca28da63c72/src/main/protocol-impl/java/com/mysql/cj/protocol/a/InputStreamValueEncoder.java#L71-L137

yeah, if use hex nothing to do(type bytes use it), but escap will change byte

TeslaCN commented 1 year ago

There are 2 reasons that the binary value is damaged:

  1. Proxy reads COM_QUERY SQL and convert it with charset (which usually UTF-8 in Proxy). The byte[] to UTF-8 String conversion may damage values of literal prefixed with _binary. https://github.com/apache/shardingsphere/blob/e5c8dc7778f79d544935803a7093aee53ea91bf2/db-protocol/mysql/src/main/java/org/apache/shardingsphere/db/protocol/mysql/packet/command/query/text/query/MySQLComQueryPacket.java#L44 https://github.com/apache/shardingsphere/blob/e5c8dc7778f79d544935803a7093aee53ea91bf2/db-protocol/mysql/src/main/java/org/apache/shardingsphere/db/protocol/mysql/payload/MySQLPacketPayload.java#L436-L440

  2. SQL String to bytes conversion in MySQL Connector/J damaged the _binary literal in SQL. This happened at ShardingSphere-Proxy sending SQL to MySQL server. serverEncoding is usually utf8mb4 or utf8. https://github.com/mysql/mysql-connector-j/blob/ad86f36e100e104cd926c6b81c8cab9565750116/src/com/mysql/jdbc/Buffer.java#L670 image

To fix this issue, my idea is:

  1. Implement visitBlobValue in org.apache.shardingsphere.sql.parser.mysql.visitor.statement.impl.MySQLDMLStatementSQLVisitor and record all the _binary literal when parsing SQL statement. https://github.com/apache/shardingsphere/blob/e5c8dc7778f79d544935803a7093aee53ea91bf2/sql-parser/dialect/mysql/src/main/antlr4/imports/mysql/DMLStatement.g4#L103-L105

  2. Read the SQL from COM_QUERY by new String(sql, StandardCharsets.ISO_8859_1). Parse SQL and check whether the SQL statement is allowed to prepare. org.apache.shardingsphere.proxy.frontend.mysql.command.query.binary.prepare.MySQLComStmtPrepareChecker

  3. If the SQL is allowed to be prepared, extract _binary literal from SQL and rewrite _binary literal to placeholder ?. Executing SQL as prepared statement.

TeslaCN commented 1 year ago

Hi @yoloz Your PR #21922 did the step 3 of my idea. But checking raw bytes may not a elegant way. What do you think of my idea? Are you interested in fixing this issue?

yoloz commented 1 year ago

Hi @yoloz Your PR #21922 did the step 3 of my idea. But checking raw bytes may not a elegant way. What do you think of my idea? Are you interested in fixing this issue?

@TeslaCN you're right, checking raw bytes not the best way, like MySQL Connector/Python doesn't work,It would be better based on global to provide an api impl

TeslaCN commented 1 year ago

@TeslaCN you're right, checking raw bytes not the best way, like MySQL Connector/Python doesn't work,It would be better based on global to provide an api impl

@yoloz So would you like to fix this? Or I will fix it later.

yoloz commented 1 year ago

@TeslaCN

thanks a lot,

  1. i don't konw how to Implementshardingsphere/sql-parser/dialect/mysql/src/main/antlr4/imports/mysql/DMLStatement.g4;
  2. base on _binary literal just can solve jdbc, but don't work for other, like MySQL Connector/Python doesn't contain any literal

@TeslaCN you're right, checking raw bytes not the best way, like MySQL Connector/Python doesn't work,It would be better based on global to provide an api impl

@yoloz So would you like to fix this? Or I will fix it later.

TeslaCN commented 1 year ago

OK, I will fix this issue. @yoloz Could you describe some details about MySQL Connector/Python? If the Python driver using binary protocol like COM_SEND_LONG_DATA, the Proxy can already handle it. Thanks!

yoloz commented 1 year ago

OK, I will fix this issue. @yoloz Could you describe some details about MySQL Connector/Python? If the Python driver using binary protocol like COM_SEND_LONG_DATA, the Proxy can already handle it. Thanks!

@TeslaCN you can use table structure above and code follows:

import mysql.connector

def convertToBinaryData(filename):
    # Convert digital data to binary format
    with open(filename, 'rb') as file:
        binaryData = file.read()
    return binaryData

def convertToStringData(filename):
    # Convert digital data to string
    with open(filename, "r", encoding="utf-8") as file:
        stringData = file.read()
    return stringData

def insertBLOB(id, bfile, cfile):
    print("inserting bytes to table")
    try:
        connection = mysql.connector.connect(host='127.0.0.1',
                                             port='3307',
                                             database='shadow_db',
                                             user='root',
                                             password='root')

        cursor = connection.cursor()
        sql_insert_blob_query = """ insert into blob_clob(id,b_blob,c_clob) VALUES (%s,%s,%s)"""

        b_data = convertToBinaryData(bfile)
        c_data = convertToStringData(cfile)

        # Convert data into tuple format
        insert_blob_tuple = (id, b_data, c_data)
        result = cursor.executemany(sql_insert_blob_query, insert_blob_tuple)
        connection.commit()
        print("inserted successfully to table", result)

    except mysql.connector.Error as error:
        print("Failed inserting to table {}".format(error))

    finally:
        if connection.is_connected():
            cursor.close()
            connection.close()
            print("MySQL connection is closed")

insertBLOB(8, "/home/xxx/test/test.png", "/home/xxx/test/test.txt")

Proxy output sql like insert into blob_clob(id,b_blob,c_clob) values(1,'......','........')

TeslaCN commented 1 year ago

Thanks @yoloz ! Your input reminds me that the Python driver does not support server prepared statement. I will take it consideration. Thanks again.

yoloz commented 1 year ago

Thanks @yoloz ! Your input reminds me that the Python driver does not support server prepared statement. I will take it consideration. Thanks again.

welcome,waiting for upgrade 🤝

TeslaCN commented 1 year ago

MySQL Connector/Python invoke API of MySQL Connector/C to escape string.

https://github.com/mysql/mysql-connector-python/blob/6dac476b6cc39dd6b8ad0be142128778b7937b8e/src/mysql_capi.c#L1822-L1940

https://github.com/mysql/mysql-server/blob/a246bad76b9271cb4333634e954040a970222e0a/mysys/charset.cc#L720-L820

TeslaCN commented 1 year ago

So maybe we have to de escape all the string literal in SQL.

yoloz commented 1 year ago

So maybe we have to de escape all the string literal in SQL.

may be trouble to distinguish it, so i find result from server code, a little guess, join sql parse and column type

TeslaCN commented 1 year ago

Maybe we should fix this issue by Test Driven Development. Add some integration test cases.