ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0
35.47k stars 6.62k forks source link

When select all columns from distributed table immediately after insertion, the result is incorrect. #66096

Open r33s3n6 opened 5 days ago

r33s3n6 commented 5 days ago

Describe what's wrong When select all columns from distributed table immediately after insertion, the result is incorrect.

How to reproduce

-- sql #47 create table __t_elmbvl__ on cluster default ( c_ld7yut2msy Int32 primary key , c_rfad4lh7 String , c_mt6h0fe Bool , c_qqdw7 Bool , c_xrtu_99k Float64 , c_jo Bool not null , );

-- sql #48 create table t_elmbvl on cluster default as t_elmbvl ENGINE = Distributed(default, ch_main, __t_elmbvl, c_ld7yut2msy);

-- sql #55 alter table t_elmbvl__ on cluster default add column c_fjago Int32;

-- sql #56 alter table __t_elmbvl__ on cluster default add column c_fjago Int32;

-- sql #63 insert into t_elmbvl__ (c_ld7yut2msy, c_rfad4lh7, c_mt6h0fe, c_qqdw7, c_xrtu_99k, c_jo, c_fjago) values (-775545040, 'u0mw', coalesce(false, true), coalesce((NOT NOT(cast( (12509309691728363 <= -22) as Nullable(Bool)))), false), 35.70, coalesce(true, false), -851431940), (1126263007, 'ccbdd', coalesce(false, false), coalesce((NOT NOT(cast( (65.88 = 1937972318395406327) as Nullable(Bool)))), false), 51.48, coalesce((NOT NOT(cast( (cast(null as Nullable(Decimal)) > cast(null as Nullable(Decimal))) as Nullable(Bool)))), false), 65535), (34, 'w', coalesce((NOT NOT(cast( (cast(null as Nullable(Int32)) > cast(null as Nullable(UInt8))) as Nullable(Bool)))), false), coalesce((true) or ((NOT NOT(cast( (2118090269594213976 <= 40.64) as Nullable(Bool))))), true), 72.96, coalesce((NOT NOT(cast( (94.43 > cast(null as Nullable(Decimal))) as Nullable(Bool)))), true), -465198836), (205931387, 'y632x', coalesce(false, true), coalesce((false) or ((false) and ((-152204195 between -360610446 and -945507412))), true), 72.73, coalesce(false, false), 1401197463);

-- sql #73 insert into t_elmbvl__ (c_ld7yut2msy, c_rfad4lh7, c_mt6h0fe, c_qqdw7, c_xrtu_99k, c_jo, c_fjago) values (2147483647, 'bnuxtpb02n', coalesce((NOT NOT(cast( (5873265620921675005 != 33.23) as Nullable(Bool)))), false), coalesce((1580699122 between 1866394741 and -2006854787), true), 30.3, coalesce((904605091 is NULL), true), -2092186939), (-2032679981, 'n05', coalesce((NOT NOT(cast( (6351600532512099348 < 256) as Nullable(Bool)))), false), coalesce((-2028540090 is not NULL), false), 10.70, coalesce((NOT NOT(cast( (cast(null as Nullable(Int16)) <= 1213897034) as Nullable(Bool)))), false), 32), (-1152318580, 'eoqjjlyl4', coalesce(false, true), coalesce((-10281420 between -983787166 and 1378995206), true), -32769.0, coalesce((NOT NOT(cast( (716263129 > -17541) as Nullable(Bool)))), false), -1711766346), (654967805, 'pgq6wiub0h', coalesce(false, false), coalesce((NOT NOT(cast( (4294967295.5 < 1137119677) as Nullable(Bool)))), false), 67.58, coalesce(false, false), -427315728);

-- sql #75 insert into t_elmbvl__ (c_ld7yut2msy, c_rfad4lh7, c_mt6h0fe, c_qqdw7, c_xrtu_99k, c_jo, cfjago) values (-2108699971, 'o', coalesce((NOT NOT(cast( (cast(null as Nullable(Int16)) != 48) as Nullable(Bool)))), false), coalesce(((NOT NOT(cast( (cast(null as Nullable(UInt16)) = 9156229030240177636) as Nullable(Bool))))) and (((NOT NOT(cast( (7844108568336033621 = cast(null as Nullable(Int8))) as Nullable(Bool))))) and ((NOT NOT(cast( (cast(null as Nullable(UInt64)) AND -7) as Nullable(Bool)))))), false), -1.6, coalesce((4 between 719103149 and 278694748), false), -1207250535), (1123238126, 'pm', coalesce((1816137027 between 1219888914 and -809535320), false), coalesce((-1766449752 is NULL), false), 58.34, coalesce(true, false), 32767), (-1261308300, 'xk', coalesce((NOT NOT(cast( (76 <= cast(null as Nullable(Int64))) as Nullable(Bool)))), true), coalesce((-1816368542 is not NULL), false), 2147483647.5, coalesce((NOT NOT(cast( (847637622 AND -786837527) as Nullable(Bool)))), true), -906918576), (-1167758264, 'ni_jim6mbk', coalesce((cast(null as Nullable(Int32)) is NULL), false), coalesce(true, true), 18446744073709551614.1, coalesce(((NOT NOT(cast( (-9062 = cast(null as Nullable(Decimal))) as Nullable(Bool))))) or ((NOT NOT(cast( (cast(null as Nullable(Decimal)) = 6518) as Nullable(Bool))))), false), 292933933);

-- unexpected result select * from t_elmbvl__;

-- after sleep select sleep(1);

-- this result is correct select * from t_elmbvl__;


previous SQL produces:

+--------------+------------+-----------+---------+----------------------+------+-------------+ | c_ld7yut2msy | c_rfad4lh7 | c_mt6h0fe | c_qqdw7 | c_xrtu_99k | c_jo | c_fjago | +--------------+------------+-----------+---------+----------------------+------+-------------+ | -1152318580 | eoqjjlyl4 | 0 | 1 | -32769 | 1 | -1711766346 | | -1261308300 | xk | 1 | 1 | 2147483647.5 | 1 | -906918576 | | -1167758264 | nijim6mbk | 1 | 1 | 18446744073709552000 | 0 | 292933933 | | 1123238126 | pm | 0 | 0 | 58.34 | 1 | 32767 | | -775545040 | u0mw | 0 | 0 | 35.7 | 1 | -851431940 | | 34 | w | 0 | 1 | 72.96 | 1 | -465198836 | | 205931387 | y632x | 0 | 0 | 72.73 | 0 | 1401197463 | | 1126263007 | ccbdd | 0 | 0 | 51.48 | 0 | 65535 | +--------------+------------+-----------+---------+----------------------+------+-------------+ 8 rows in set (0.00 sec) Read 8 rows, 299.00 B in 0.004867 sec., 1643 rows/sec., 59.99 KiB/sec.

+----------+ | sleep(1) | +----------+ | 0 | +----------+ 1 row in set (1.01 sec) Read 1 rows, 1.00 B in 1.000639 sec., 0 rows/sec., 1.00 B/sec.

+--------------+------------+-----------+---------+----------------------+------+-------------+ | c_ld7yut2msy | c_rfad4lh7 | c_mt6h0fe | c_qqdw7 | c_xrtu_99k | c_jo | c_fjago | +--------------+------------+-----------+---------+----------------------+------+-------------+ | -1152318580 | eoqjjlyl4 | 0 | 1 | -32769 | 1 | -1711766346 | | -1261308300 | xk | 1 | 1 | 2147483647.5 | 1 | -906918576 | | -1167758264 | nijim6mbk | 1 | 1 | 18446744073709552000 | 0 | 292933933 | | 1123238126 | pm | 0 | 0 | 58.34 | 1 | 32767 | | -775545040 | u0mw | 0 | 0 | 35.7 | 1 | -851431940 | | 34 | w | 0 | 1 | 72.96 | 1 | -465198836 | | -2108699971 | o | 0 | 0 | -1.6 | 0 | -1207250535 | | 205931387 | y632x | 0 | 0 | 72.73 | 0 | 1401197463 | | 1126263007 | ccbdd | 0 | 0 | 51.48 | 0 | 65535 | | -2032679981 | n05 | 0 | 1 | 10.7 | 0 | 32 | | 654967805 | pgq6wiub0h | 0 | 0 | 67.58 | 0 | -427315728 | | 2147483647 | bnuxtpb02n | 1 | 0 | 30.3 | 0 | -2092186939 | +--------------+------------+-----------+---------+----------------------+------+-------------+ 12 rows in set (0.01 sec) Read 12 rows, 465.00 B in 0.007834 sec., 1531 rows/sec., 57.97 KiB/sec.


**Expected behavior**
The first select statement should produce 12 rows just like the one after `sleep`.

**Additional context**
docker compose config:

version: '3.8' services: clickhouse-01: image: "clickhouse/clickhouse-server:${CHVER:-latest}" user: "101:101" container_name: clickhouse-01 hostname: clickhouse-01 networks: cluster_2S_1R: ipv4_address: 10.0.7.1 volumes:

networks: cluster_2S_1R: driver: bridge ipam: config:

config.xml

<clickhouse replace="true">
    <logger>
        <level>debug</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>3</count>
    </logger>
    <display_name>cluster_2S_1R node 1</display_name>
    <listen_host>0.0.0.0</listen_host>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <mysql_port>9004</mysql_port>
    <postgresql_port>9005</postgresql_port>
    <user_directories>
        <users_xml>
            <path>users.xml</path>
        </users_xml>
        <local_directory>
            <path>/var/lib/clickhouse/access/</path>
        </local_directory>
    </user_directories>
    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
    <remote_servers>
        <default>
            <shard>
                <replica>
                    <host>clickhouse-01</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>clickhouse-02</host>
                    <port>9000</port>
                </replica>
            </shard>
        </default>
    </remote_servers>
    <zookeeper>
        <node>
            <host>clickhouse-keeper-01</host>
            <port>9181</port>
        </node>
        <node>
            <host>clickhouse-keeper-02</host>
            <port>9181</port>
        </node>
        <node>
            <host>clickhouse-keeper-03</host>
            <port>9181</port>
        </node>
    </zookeeper>
    <macros>
        <shard>01</shard>
        <replica>01</replica>
    </macros>
</clickhouse>

users.xml

<?xml version="1.0"?>
<clickhouse replace="true">
    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>in_order</load_balancing>
            <log_queries>1</log_queries>
            <distributed_product_mode>allow</distributed_product_mode>
        </default>
    </profiles>
    <users>
        <default>
            <access_management>1</access_management>
            <profile>default</profile>
            <networks>
                <ip>::/0</ip>
            </networks>
            <password></password>
            <quota>default</quota>
            <access_management>1</access_management>
            <named_collection_control>1</named_collection_control>
            <show_named_collections>1</show_named_collections>
            <show_named_collections_secrets>1</show_named_collections_secrets>
        </default>
    </users>
    <quotas>
        <default>
            <interval>
                <duration>3600</duration>
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
</clickhouse>

keeper_config.xml:

<clickhouse replace="true">
    <logger>
        <level>information</level>
        <log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
        <errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
        <size>1000M</size>
        <count>3</count>
    </logger>
    <listen_host>::</listen_host>
    <max_thread_pool_size>10000</max_thread_pool_size>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>3</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>information</raft_logs_level>
        </coordination_settings>
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>clickhouse-keeper-01</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>2</id>
                <hostname>clickhouse-keeper-02</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>3</id>
                <hostname>clickhouse-keeper-03</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>

about us

We are the BASS team from the School of Cyber Science and Technology at Beihang University. Our main focus is on system software security, operating systems, and program analysis research, as well as the development of automated program testing frameworks for detecting software defects. Using our self-developed database vulnerability testing tool, we have identified the potential above-mentioned vulnerability that may lead to database logic error.

Algunenano commented 5 days ago

By default insertion on distributed tables is asynchronous, and it happens in the background. If you want a synchronous ack you can enable distributed_foreground_insert

den-crane commented 5 days ago

When select all columns from distributed table immediately after insertion, the result is incorrect.

Do you mean if you select not all columns the result is correct?

r33s3n6 commented 5 days ago

When select all columns from distributed table immediately after insertion, the result is incorrect.

Do you mean if you select not all columns the result is correct?

No, if I select one column, the result is still not correct.

By default insertion on distributed tables is asynchronous, and it happens in the background. If you want a synchronous ack you can enable distributed_foreground_insert

I think this may be the root cause. Thanks!