4paradigm / OpenMLDB

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.
https://openmldb.ai
Apache License 2.0
1.58k stars 317 forks source link

After inserting data with precompiled SQL, when queried, certain fields have additional null characters ('\x00') appended at the end #3779

Open yht520100 opened 7 months ago

yht520100 commented 7 months ago

Bug Description Service Version: Code from the main branch after version 0.8.4, compiled and deployed on macOS 13.5.1. Client Version: The current project branch python-sdk, accessing the database using the dbapi method.

After inserting data with precompiled SQL, when queried, certain fields have additional null characters ('\x00') appended at the end, as shown in the fourth column: [(1001, '2022-05-01', 'province1', 'city1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 1, 1590738991000), (1002, '2022-05-02', 'province2', 'city2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 2, 1590738992000), (1003, '2022-05-03', 'province3', 'city3\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 3, 1590738993000), (1004, '2022-05-04', 'province0', 'city4\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 4, 1590738994000)]

Expected Behavior The field contents should not have additional null characters appended and should match the inserted content.

Relation Case

Steps to Reproduce

  1. Use the python-sdk code from the main branch after version 0.8.4.
  2. Run the test cases below to reproduce the corresponding error.
import sys
import os
from openmldb.dbapi import connect
import pytest

from .case_conf import OpenMLDB_ZK_CLUSTER, OpenMLDB_ZK_PATH

class TestOpenmldbDBAPI:
    cursor_without_db = None
    cursor = None
    db_name = "dbapi_test"

    @classmethod
    def setup_class(cls):
        # connect without db to create it
        db = connect(zk=OpenMLDB_ZK_CLUSTER, zkPath=OpenMLDB_ZK_PATH)
        cls.cursor_without_db = db.cursor()
        cls.cursor_without_db.execute(
            f"create database if not exists {cls.db_name};")
        # then we can use the db to connect
        db = connect(database=cls.db_name,
                     zk=OpenMLDB_ZK_CLUSTER,
                     zkPath=OpenMLDB_ZK_PATH)
        cls.cursor = db.cursor()
        cls.cursor.execute("create database if not exists {};".format(cls.db_name))
        assert cls.db_name in cls.cursor.get_databases()

    def recreate_table(self, table, schema):
        if table in self.cursor.get_tables(self.db_name):
            self.cursor.execute("drop table {}".format(table))
        assert table not in self.cursor.get_tables(self.db_name)

        self.cursor.execute("create table {}({}) OPTIONS(partitionnum=1);".format(table, schema))
        assert table in self.cursor.get_tables(self.db_name)

    def test_more_parameterized_query(self):
        table = "test_param_query"
        schema_str = "col1 bigint, col2 date, col3 string, col4 string, col5 int, col6 timestamp, " \
                     "index(key=col3, ts=col1), index(key=col3, ts=col6)"
        self.recreate_table(table, schema_str)
        test_rows = [(1000 + i, '2022-05-0' + str(i), 'province' + str(i % 4),
                      'city' + str(i), i, (1590738990 + i) * 1000)
                     for i in range(1, 5)]
        self.cursor.executemany('insert into {} values (?, ?, ?, ?, ?, ?);'.format(table), test_rows)

        result = self.cursor.execute("select * from {};".format(table)).fetchall()
        result = sorted(list(result), key=lambda x: x[0])
        assert result == test_rows

if __name__ == "__main__":
    sys.exit(pytest.main(["-vv", os.path.abspath(__file__)]))
tobegit3hub commented 7 months ago

It does not work for parameter query like insert into {} values (?, ?, ?, ?, ?, ?). It may be the issue of this sql interface.

vagetablechicken commented 7 months ago

Only python paramterized insert fails, cuz in python sdk level, when insert parameter is tuple: https://github.com/4paradigm/OpenMLDB/blob/bb059de2868478ec056f619042aa42108b9a7a96/python/openmldb_sdk/openmldb/dbapi/dbapi.py#L310-L312 So if you pass date str to it, it'll be calculated, be treated as a string column. The test is one date col and two string, so the str_size = len(date_str) + real_str_size, add more 10 char.

The right way ref dict: https://github.com/4paradigm/OpenMLDB/blob/bb059de2868478ec056f619042aa42108b9a7a96/python/openmldb_sdk/openmldb/dbapi/dbapi.py#L316-L323

I think we should handle None in tuple.

And no paramterized tests of date type before(just insert sql, it'll be build in router native level). https://github.com/4paradigm/OpenMLDB/blob/b1318e804555c0a4ff20f2741a107c41d0995bd8/python/openmldb_sdk/tests/openmldb_client_test.py#L110-L113 We should add more tests about parameteried insert or query.